TextWise Booth 1517 Aug 30-Sept 2, 2011
Stop by the TextWise Booth 1517 to see our One-Click Findability App at Dreamforce 2011 in San Francisco. With TextWise semantic search technology, customers can reduce the time a call center agent spends on a call by 25 percent and increase the number of calls deflected from the call center to customer self-service.
TextWise facilitates linking of customer queries to resolutions with its patented contextual approach. Providing context to queries results in more relevant answers to customer questions. Built using Force.com, the social enterprise platform for employee apps, One-Click Findability is immediately available for a test drive and deployment on AppExchange at http://www.salesforce.com/appexchange/.
The TextWise One-Click Findability App is unique in that it supports all verticals and offers access for both call center agents and customer self-service. The app offers improved searching to quickly find resolutions for customers across repositories. It allows for the viewing of result sets from different sources of information, regardless of whether that information is contained within Salesforce knowledge bases or in other repositories. These results can be viewed as federated result sets for unified repositories, or faceted result sets for a single repository. Finally, the One-Click Findability App enables call center agents to specify and access continuously-updated external content from the web through RSS feeds.
Dreamforce is the industry’s leading global cloud computing event. The event is focused on inspiring customer, partner and developer success with social, mobile and open cloud computing. Attendees will learn how to maximize their current investments and explore new offerings across Salesforce Chatter, Database.com, Force.com, Sales Cloud, Service Cloud and more.
TextWise and Innography® have announced a strategic partnership to incorporate TextWise semantic search into Innography’s intellectual property business intelligence solution. The new functionality available with Innography® Fall ’10™ enables Innography customers to perform contextual semantic search using specific patent numbers or long blocks of text as the query.
“TextWise facilitates near effortless querying for patent searchers and we’re pleased to be working with Innography as the leader in IP business intelligence,” says Connie Kenneally, President of TextWise. “The incorporation of our technology into Innography’s latest release mitigates the need to perform repetitive searches that rely solely on identifying keywords to generate relevant search results. In contrast, our semantic search performs well with very long queries and one does not have to find the perfect keyword combinations to get the most relevant results.”
A TextWise search is performed by incorporating the full context of either a paragraph, claim, abstract or any longer piece of text to generate more relevant matches to similar information contained in the US patent database. “TextWise takes an innovative approach to facilitating rapid identification of patents for monetizing IP assets when coupled with Innography’s intellectual property business intelligence solution,” said Doug Miller, Chief Marketing Officer at Innography. “Our customers have expressed great interest in patent semantic search, so we are very pleased to be offering this new functionality as an option with their Innography subscriptions.”
The target areas where this joint offering would be most advantageous are:
• Expediting products for clearance from infringing on other offerings
• Rapid idea screening for patenting
• Lead identification for patent licensing, sales and patent acquisitions
• NPE defense / invalidity protection
A demonstration of Innography Fall ’10™, including the TextWise patent semantic search feature, can be found at: http://innography.com/assets/files/fall10demo/fall10-demo.html.
About Innography
Innography® delivers a comprehensive, online Intellectual Property Business Intelligence (IPBI) application that enables companies of all types and sizes to achieve the optimal return on their IP investments. By correlating patent and trademark data with financial, litigation and other key business information, Innography instantly generates a variety of unique visualizations to help organizations reduce the time it takes to perform IP research and reduce associated legal expenditures. This enables corporations to get products to market faster, uncover new and more lucrative revenue sources, keep better track of competitors, manage litigation claims, and stay on top of additional IP-associated functions. Visit www.innography.com to view a brief online product demo or call 1.512.306.8688 for more information.
Last month Anthony Vito presented a 5 minute ‘lightning talk’ on our implementation of Semantic Signatures using the SemanticHacker API for CRM. Specifically this was an example using Salesforce.com.
The Value of Semantic Discovery in CRM, a lightning talk presented by Anthony Vito, TextWise, at Smart Content: The Content Analytics Conference, October 19, 2010, http://smartcontentconference.com
We’ve updated our Similarity Search plugin to be fully functional with WordPress’ recent 3.0 release! If you have the plugin already installed, you can just go to your Dashboard and update it through the Plugins page. To download it fresh as a new user, visit our plugin’s homepage in the WordPress Plugin Directory.
One thing we’ve been anticipating with WordPress 3.0 is the ability to run multiple sites. This is a functionality previously covered by a separate WordPress product, called WordPress Mu. Multisite is the new functionality that takes over for Mu, and it’s integrated right into the 3.0 release. With just a few tweaks, you can be up and running with multiple blogs on one WordPress installation in no time. The REALLY cool thing is that our plugin is also compatible with the new multisite mode. That means if you administer multiple blogs using a multisite instance of WordPress 3.0, you install our plugin once on the main blog, and it’s available to all of the sub-sites. Pretty neat!
As always, we love to hear your feedback, the good or the bad. Good lets us know we’re on the right track, and bad lets us know when we’re not (or if something’s broken and we didn’t find it). Have fun with the plugin, and happy blogging!
Our API was unavailable for some users from 1:20 to 2:20 EDT today and the issues have now been fixed and access restored. We apologize if this caused any inconvenience.
Please contact us if you have any questions.
SemanticHacker.com and the API will be scheduled for maintenance on Sunday, September 20th from 2am – 4am EST.
During that time, the website and API will be unavailable. We are sorry for any inconvenience.
Please contact development@semantichacker.com if you have any questions.
I had an uncle who would coach my brothers when bringing home those less-than-spectacular report cards. “Just tell your mother and father they changed the scale” and he suggested the following: A=Awful, B=Bad, C=Could be better, D=Dandy and F=Fantastic. Alas the new scale was rejected, and the usual no TV purgatory followed for the boys.
We all have a pretty good intuitive sense of the A-F grading scale, or the happy/sad face pain scale now used in hospitals and doctors’ offices. But there is no intuitive scale in use today to determine whether something returned to us from a search on the web, or suggested as a related item to a web page or document being viewed, is actually ‘what you’re looking for’ or ‘intent’. In my iSchool (Syracuse) we investigated all the variants of ‘what you’re looking for/intent’ including relevant, pertinent, accurate, useful, etc. and of course the recall/precision tradeoff. (Wonder if it’s true 9/10 times business people want precision? That’s what a Gartner analyst claimed at SIGIR this year.) But it still isn’t easy to create a scale for match judgments even when the definition (intent) is tightened up. Pre-web, a binary scale was certainly most popular: Relevant/Not Relevant. This was the TREC scale used for years, and it worked very well for tuning systems along precision/recall lines. But this scale has always been problematic. There are so many gradations to Relevant it is very hard for humans to make a yes/no call on a match, especially when the evaluator is not the person who came up with the query.
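To make the binary Relevant/Not Relevant scale concrete, here is a minimal sketch of how precision and recall fall out of such judgments for a single query. The document IDs and the function name are made up for illustration; this is not any particular TREC tool.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query under binary relevance.

    retrieved: document IDs the system returned
    relevant:  document IDs judged Relevant by the assessor
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Returning more documents can only raise recall or leave it flat, while precision usually drops, which is the tradeoff mentioned above.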
When we tested our advertising system at TextWise, we worked long and hard to provide definitions for a 5-point scale: Extremely Relevant, Highly Relevant, Somewhat Relevant, Not Relevant, Embarrassing. Eventually Extremely Relevant was collapsed into Highly Relevant as the distinction simply required too much documentation. What counts as Embarrassing changes with the application. For advertising, Embarrassing was the car ad placed on the page containing the article about how the car resembled an accordion in an accident. For other applications, Embarrassing may simply be that there is no discernible reason why the match occurred.
At the recent SIGIR09 meeting in Boston, another scale was frequently presented for web search: Perfect, Excellent, Good, Fair, Bad. Most any system will show reasonably well with a scale like that. Is this really “Yes! Kind of, Sort of, Maybe, No”? What is Excellent v Good v Fair? These distinctions require lengthy discussions and serious documentation, and they will still yield noisy judgments.
Inter-coder reliability, or assessing how much humans actually agree on these human judgment tasks, is frequently measured by Kappa statistics. For match judgments, this is usually not a strong number.
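For readers who have not computed one, here is a minimal sketch of Cohen's kappa for two coders. It compares the agreement actually observed with the agreement expected by chance from each coder's label frequencies. The labels and ratings below are invented for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two coders who rated the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Fraction of items on which the two coders gave the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement, from each coder's own label distribution.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented judgments: R = Relevant, N = Not Relevant.
coder_a = ["R", "R", "R", "R", "R", "N", "N", "N", "N", "N"]
coder_b = ["R", "R", "R", "N", "N", "N", "N", "N", "R", "R"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa={kappa:.2f}")
```

In this made-up example the coders agree on 6 of 10 items, but half of that agreement would be expected by chance, so kappa comes out low even though raw agreement looks passable.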
There are even services now to provide these judgments, from Mechanical Turk, Dolores Labs, etc. These services use a “casual workforce” so the training/documentation can’t be too burdensome. One page max for guidelines is recommended. This means whatever scale is used has to be pretty intuitive. And there is a load of noise generated by low inter-coder reliability, which means paying for lots of judgments to account for the noise.
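One common way to use those extra judgments is a simple majority vote per item. The sketch below is an assumption about how one might do it, not a description of what any of these services does; the function name and labels are hypothetical.

```python
from collections import Counter


def majority_label(judgments):
    """Return the most frequent label, or None when the top labels tie."""
    if not judgments:
        raise ValueError("need at least one judgment")
    ranked = Counter(judgments).most_common()
    # A tie for first place means the workers did not settle the question.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical judgments from three and four casual workers.
clear_case = majority_label(["Relevant", "Relevant", "Not Relevant"])
tied_case = majority_label(["Relevant", "Not Relevant", "Relevant", "Not Relevant"])
print(clear_case, tied_case)
```

Ties are returned as None so they can be sent out for more judgments, which is where the extra cost comes from.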
Interested in hearing from others who have travelled down this road. What are you using?
SIGIR Day Three July 22, 2009
Great job by Daniel Tunkelang of Endeca putting this track together
Morning: Industry Track Speakers
“Webspam and Adversarial IR: the Road Ahead” (Google) Matt Cutts
Requirements for spammers: Content, Reputation, Opportunity for monetization. Examples of on-page and off-page spam provided. Spoke of defensive tools such as nofollow. Clear shift from devising spamming routines to outright hacking. 1) Concentrate on finding hackers - joining with spammers – malware detection key - hack sites and sell links. 2) Prevent common spam – human tests, etc – which techniques prevent it that any site pub can use (spam classification for wordpress blogs – good tool). 3) Looking for trust, identity, authentication. Warning – facebook, twitter, etc new ecosystem, new forms of spam, fake profiles abound.
“The Searchable Nature of Acts in Networked Publics” Danah Boyd (Microsoft)
Danah’s research area is social media; she’s looked at differences between myspace and facebook, etc. Her focus is communication – she is an ethnographer: “How young people use the internet.” Everything is VISIBLE. Distinction between social network sites and social networking sites. Social network sites: A/ engaging with preexisting friends (diff from social networking sites – meet new people). Profile is the digital body – misinformation is intended and everywhere: 1) meant to be funny (alter egos), 2) young people have been told to lie about who/what – keep away the predators, 3) don’t want to be searchable/found. Don’t assume there is accurate information in social network sites. Average age stats are wrong! B/ Public articulation of “friends” – assumes links are equal but relationships are not equal. Three key concepts of networks: sociological, articulated (public), behavioral (exchange content/interact). Networked Publics: Issues – Persistence, Replicability (context freq gone), Searchability (not who you want for the most part), Scalability (who is seeing your content), Invisible Audiences – who are you talking to? Leads to imaginary audiences. Collapsed Contexts (social context is constantly changing, freq misleading), New Public/Private Boundaries (getting reworked). Twitter: just this spring a Big player. Twitter is not a chat. It is constantly changing who is using it for what; celebrity cachet and mouthpiece to get back at powerful bloggers; soapbox, you choose who you follow. 5-15% accounts are protected – most accts are public. 5% contain a hashtag (almost half of these contain a URL). 22% include a URL. 36% mention another twitter user (put it at the beginning – Tweet is really directed at individual). 50 accts have over 1M followers, 350 have several hundred thousand, millions of accounts are dead. 140 characters – very difficult constraint for searchability; retweets – some attribute, some drop it.
Info Retrieval Thoughts: Social media is about conversation and contexts – tough to make sense of the social context. Danah@danah.org
“Ad Retrieval – A New Frontier of Information Retrieval” (Vanja Josifovski – Yahoo! Research)
Disclaimer – can’t expose any Yahoo! trade secrets. 40% ads textual – competing with other content on page - sponsored search and content match placement. ~30% of web users interact with ads (thinks this is because ads are not relevant). Text ads have visible/non-visible parts – landing url too big, too much info. Bid phrase (keywords) used to target ad (ads are creatives + bid phrase). Ad Retrieval: Sponsored search – keyword bid – dbase technology. Content Match –look for bid phrases, place ads – still single feature matching. New way to look at it: Treat the ad as a document in IR. Cost of serving the ad needs to be less than the revenue returned, also need to keep performance in mind.
“Corpus Linguistics and Semantic Technology at the New York Times” (Evan Sandhaus – NYT) (Semantic Technologist, NYT R&D)
NYT annotated corpus – LDC – 20 years of data (launched 10/08), obtain through LDC or NYT – 1987-2007 – 1.8M articles, 900K+ tags, 665K abstracts – NITF format, XML standard. corpus.nytimes.com. Reuters corpus came first, smaller collection, annotated. Potential uses of data: some ideas.. Location of article is implicit ranking, # of words, etc. (mm: too temporal?); Automated document summarization corpus gold. 80 users after nine months.
“Query Modeling at bing” Nick Craswell, Bing (Microsoft Research Cambridge) (Filling in for sick speaker – OCLC)
Can’t tell how everything works (trade secrets, etc) Ambiguous queries: ‘house’ Mine the logs – click logs; Session data – what query follows the query “house”?; What other queries have click on that URL (co-click data); Intent clustering – use session data string of queries, keep nodes but replace edges with co-click data; then cluster on top of this. Provides measurement poss, improve IR modeling of queries, put into UI dev. Temporal dynamics in logs: spiking/seasonal queries – feed ranking; Periodic queries (DST 2007/DST 2008); Stale anchors / trailing signals (BO -campaign page v white house) temporal query expansion – watch spikes. Table of Contents: Summary of aspects of entity – summarize mainline (results) Sticky control panel. “Bing Gets It” provide info along popularity and consistent content, summary of mainline results. V1 just out, continues development.
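As a rough illustration of the co-click idea in these notes (this is my own sketch, not how Bing does it), queries can be linked whenever they led to a click on the same URL, and the resulting edges can then be fed to whatever clustering step comes next. The log entries and URLs are invented.

```python
from collections import defaultdict
from itertools import combinations


def co_click_edges(click_log):
    """Map each query pair to the number of URLs both queries clicked on.

    click_log: iterable of (query, clicked_url) pairs
    """
    queries_by_url = defaultdict(set)
    for query, url in click_log:
        queries_by_url[url].add(query)
    edges = defaultdict(int)
    for queries in queries_by_url.values():
        # Sorting keeps each pair in one canonical order.
        for q1, q2 in combinations(sorted(queries), 2):
            edges[(q1, q2)] += 1
    return dict(edges)


# Invented click log for the ambiguous query "house".
log = [
    ("house", "realtor.example/listings"),
    ("homes for sale", "realtor.example/listings"),
    ("house", "tv.example/house"),
    ("house md", "tv.example/house"),
    ("house tv show", "tv.example/house"),
]
edges = co_click_edges(log)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

Here "house" ends up connected to both a real estate query and two TV show queries, which is the kind of signal an intent clustering step could split into separate senses.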
Industry Track: Afternoon Panels
Search Industry Analysts: Whit Andrews Gartner Sue Feldman IDC Theresa Regli (CMS Watch) Marti Hearst (Responder) Daniel Tunkelang (Moderator)
TR: ( Implementation consultant for ten years) 3 yrs as an analyst, evaluate products, what fits for your needs? CR of search products – clients have very specific needs. Sounds like she has lots of eDiscovery clients; audio/video clients;
SF: (linguistics background) Web search is not ahead of enterprise search – v interesting stuff in the enterprise search systems. Real time information – Enterprises have immediate info needs, automated online info key; Mobile work force – access to everything in company (security and access); Money – now a necessity to fund access, not a nice to have.
Trends – search based apps to solve a biz problem – borrowed from search architecture; Convergence of platforms – IM, search, etc, Unified access to info – BI tools on all data – flexible Hybrid architectures – dbase with inverted indexes / but dbase features supporting ad hoc querying and search. Search is not a goal in itself, needs to be integrated into the workflow process. UI will sell new tools to new buyers (mktg/mgrs, not it staff). Task/tools, not single search tech.
Open source embedded everywhere – collab, crm, sales, etc. Lucene, Solr, etc. (not free really)
WA: (ex-journalist) 4 trends: Federation (access without paying for it) doesn’t nec always work, but seeing improvement and will see value. Conversation – disambiguation of query – ask the user – participatory search; Transparency – what is driving results ranking. Video – growing like crazy in 2009. Real time is more important than ever. Value. “Relevance is about money”
Lively discussion throughout the session about the relevance=money statement. Big Disagreement.
QA Session – Could not hear some floor speakers, selected questions only here
?Employees want a search box – no one wants sophistication? No defense against the search box and search button
A: People are not married to Google results if you show effective use of different system.
? How do you evaluate an enterprise search system?
A: depends on the need: recall sometimes, precision sometimes A: precision. Business goals are what matter -any evaluation needs to be within business goal and most companies don’t have them coming to the table.
Marti: Spent a lot of time in this panel talking about integrity, unusual for this conference as we strive to be honest and direct in our work. How do you control when your competitor says “this is the cutting edge” how do you not go on the bandwagon. SF: Don’t read competitor reports. If buyers, first bring your requirements. WA: we have a hype cycle. Tech trigger to hype, to expected expectation, trough of no delivery, then the reality. Where it is on the hype cycle? where is it on the adoption curve? What hype are you willing to tolerate – where does your business fit on this curve?
Theresa: Tamping down the hype – be skeptical.
(Several other questions here but responses were wide ranging so I have left out)
Industry Track: Vendor Panel
Jeff Fried Sr Prod Mgr MSFT(M); Raul Valdes-Perez Co-Founder Vivisimo (V); Adam Ferrari CTO Endeca(E)
Liz (Moderator) Bruce Croft (Responder)
Liz: What areas need work/advances?
Endeca: Evaluation for interactive IR? What features to include? Efficiency of architectures
Search to find vs search to learn – > interactive search
Microsoft: 3 views – Biz Analyst, End User, Systems Guy. Where researchers and practitioners align:
Take users intent, match it to content. Most people are unhappy with enterprise search. Context matters, diff systems for diff companies; Positive feedback: what’s working; User Experience Measure: search.ui. matters.
Vivisimo: The big opportunity in enterprise search – web search history. Enterprise – lots of new companies and UIs
Single search box access to everything is the big opportunity in enterprise search.
Liz: What is the first problem we should work on? E: Test the efficacy of interactive IR. M: Holistic eval is needed. Better theory about interaction. V: Search to Problem solving systems.
Liz: What’s unique about determining relevance in your system? M: Provide controls, slide bar for precision/recall control, exploration. Use user logs to improve relevance. V:Tunability, and something else, missed it. E: need relevance, facets, may care about them differently – customizability.
Liz: What would be most appreciated by users of websearch from Enterprise search? V: Def UI. Don’t need to worry about ad real estate on the enterprise screen. E: People will enjoy better faceting, etc. but slow. Advanced features, visualization, etc. really are appreciated. M: UI, faceting, exploration. People will use these things if they work.
L: What evaluation measures does Enterprise search use/need? E: business metrics will dominate. Look at logs. What’s working, what isn’t. M: Task completion and happy users. V: we see people use # searches before/after adoption. What is the average position of the doc people click on?
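The two log-based measures Vivisimo mentioned are easy to sketch. The log layout below (session ID, query, rank of the clicked result or None) is an assumption for illustration, not any vendor's format.

```python
from collections import Counter


def searches_per_session(log):
    """Average number of searches issued per session."""
    per_session = Counter(session for session, _, _ in log)
    if not per_session:
        raise ValueError("empty log")
    return sum(per_session.values()) / len(per_session)


def mean_click_position(log):
    """Average rank of clicked results, ignoring searches with no click."""
    ranks = [rank for _, _, rank in log if rank is not None]
    if not ranks:
        raise ValueError("no clicks in log")
    return sum(ranks) / len(ranks)


# Invented log rows: (session_id, query, clicked_rank or None).
search_log = [
    ("s1", "vacation policy", 3),
    ("s1", "pto policy", 1),
    ("s2", "expense form", 2),
    ("s3", "vpn setup", None),
    ("s3", "vpn install guide", 4),
]
avg_searches = searches_per_session(search_log)
avg_position = mean_click_position(search_log)
print(f"searches/session={avg_searches:.2f} mean click position={avg_position:.2f}")
```

Running the same two functions on logs from before and after a rollout gives the before/after comparison described in the panel.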
L:What is the fundamental technical problem you are currently focused on? M: Systems scale footprint needs. Grow as much as you need it to. V: getting search to work across vast array of content types. E: Greater effectiveness with greater simplicity. M: Search connectors are always tough, would love to have research on it.
Where could academics and vendors work together? V: Companies only want people, not their research. People transfer would be the best project. Fund university – get grad students. E: Events like today. Cross pollinate. Formalize ways of opening dialogue. More openness on part of vendors for transparency. M: LDC, SIGIR Industry Track, University sponsorship needs to be easier, cheaper.
Panel had an opportunity to ask each other questions:
E? How do you reconcile single search box with enterprise search needing deep interaction with repositories of data?
High value deep problems won’t work with this… V: Do believe this is an opportunity. Not a research problem. It’s an opportunity. Facets are needed for every diff kind of data. Provide both.
? Where does Endeca see the role of federation? Coexist but infrastructure needs to be there to accommodate deep search needs.
V?: Sharepoint is wonderful. I’ve seen the videos. Can a platform serve the entire market? Underserve/overserve markets. High end medical records mgmt might be underserved. (Missed audience QA and Responder)
SIGIR 09 Highlights July 21, 2009 Jumped around to attend sessions across parallel tracks today.
Information Extraction: “Named Entity Recognition in Query” (Microsoft Research & Institute of Computing Technology CAS) Queries are very challenging for named entity extraction: few words, poorly formed, ambiguous (harry potter review – movie? book?). Research used triples from queries and used millions of queries as training data. Method outperforms the baseline model and shows promise in experimental results.
Web Retrieval 1: “Using Anchor Texts with their Hyperlink Structures” (Microsoft Research & University of Montreal) Use of anchor text works best with navigational queries (navig. query has only one satisfactory result – need to nail the one right page/site). Previous work gave each link equal and independent status; New models: web site counts for only one vote – sites are independent. Relationships of links within sites and between sites are in the model. Scale (Perfect, Excellent, Good, Fair, Bad). Combined body and anchor text performed best. Anchor text can be improved over current models and site relationships most promising for navigational searching. Future investigations will explore other anchor text relationships.
Interactive Search: “Predicting User Interests from Contextual Information” (Microsoft Research) Using what we know about you to predict future best retrieval. User interest models, personalization, IF, etc.; little is actually known about the value of different context sources. Cited Ingwersen and Järvelin – nested model of context stratification representing main contextual influences on people engaged in information behavior. Study showed different context sources should not be treated equally. Which source is most important depends on whether you are looking at the next hour, day or week in the user’s schedule.
“ A Comparison of Query and Term Suggestion Features for Interactive Searching” (UNC)
Study set out to help users stumped by system – need help to form query to find information. Used several variations of system generated terms, queries, and user generated terms, queries. Nice quantitative and qualitative analysis. Findings suggest next study should be hybrid system that lets users change terms in suggested queries.
“An Aspectual Interface for Supporting Complex Search Tasks” (Univ of Glasgow) Complex Task as defined by Campbell (1988). Used BOSS search engine, designed aspectual search interface to support subtasks. 3 Research questions: 1) Does aspectual interface help user discover information? 2) Does the aspectual interface help the user understand the task? 3) What features are used to carry out task? Best results were in broad tasks. Users worked the entire twenty minute limit with aspectual interface, not so with baseline interface.
Keynote Day Two: “From Networks to Human Behavior” Albert-László Barabási, Center for Complex Network Research, Northeastern University. Fascinating presentation on analysis of networks. 90 minute presentation covering a complex topic, so it does not lend itself to a blog summary. Excellent speaker and a great end to day two.