Watson, IBM’s Jeopardy computer, is showing everyone that it is the 900-pound gorilla of trivia and is likely to beat its human opponents. Watson could still do something stupid, but its formidable performance says much about the effectiveness of current natural language processing technology and computational resources.
Although Watson has a knowledge base of millions of documents gleaned from the Web, its weakness is that it does not really understand any of this data. It is just an extremely smart entity extraction system: Watson uses the terms of a Jeopardy clue as a basis for selecting a particular entity as an answer, which of course then has to be phrased as a question. It has to figure out what kind of entity to look for and what kind of context that entity would be found in.
In a sense, this is a simple kind of semantic search because it involves scanning Watson's entire knowledge base of documents and scoring contexts statistically. The entities of the right kind in the highest-scoring contexts are then the prime candidates for an answer, and Watson can use their statistics to derive a level of confidence that a given candidate is the right answer. This approach relies heavily on brute computational power.
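To make the idea of scoring contexts concrete, here is a minimal sketch in Python. It is not how Watson is actually implemented; the passages, the `ENTITY_TYPES` lookup, and the `rank_candidates` function are all made up for illustration. Each passage is scored by how many clue terms it contains, entities of the wanted kind inherit that score, and a candidate's confidence is simply its share of the total.

```python
from collections import defaultdict

# Hypothetical toy data: each "context" is a short passage, and ENTITY_TYPES
# is an assumed lookup of known entities and the kind of thing each one is.
CONTEXTS = [
    "Paris is the capital of France and sits on the Seine",
    "Lyon is a large city in France known for its food",
    "Berlin is the capital of Germany",
]
ENTITY_TYPES = {
    "Paris": "city", "Lyon": "city", "Berlin": "city",
    "France": "country", "Germany": "country", "Seine": "river",
}


def rank_candidates(clue_terms, wanted_type):
    """Rank entities of the wanted type by clue-term overlap with their contexts."""
    clue = {term.lower() for term in clue_terms}
    scores = defaultdict(float)
    for context in CONTEXTS:
        words = context.split()
        # Context score: number of distinct clue terms found in the passage.
        overlap = len(clue & {word.lower() for word in words})
        if overlap == 0:
            continue
        for word in words:
            # Only entities of the right kind, and not the clue terms themselves.
            if ENTITY_TYPES.get(word) == wanted_type and word.lower() not in clue:
                scores[word] += overlap
    total = sum(scores.values())
    # Confidence here is just each candidate's share of the total score.
    return sorted(((entity, score / total) for entity, score in scores.items()),
                  key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates(["capital", "France"], "city")
for entity, confidence in ranking:
    print(f"{entity}: {confidence:.2f}")
```

A real system would weigh far richer evidence than word overlap, but the shape is the same: many contexts scanned, candidates filtered by kind, and scores turned into a confidence that decides whether to buzz in.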
As can be seen in the Jeopardy competition, brute power can be quite effective. On most of the straightforward questions, the kind one might expect Google to do well on, Watson can simply outsearch its opponents. It can grab enough right answers in this way to make up for its frequent wrong answers on more subtle questions requiring a deeper understanding. This is as much gamesmanship as it is intelligence.
Now imagine how overwhelming Watson could be if it actually developed some understanding and gave far fewer wrong answers. The first step in this direction is in fact quite easy: develop a large set of semantic categories corresponding to how humans understand language. Indexing a knowledge base by such predefined categories would have the immediate effect of simplifying the search process so that documents do not always have to be analyzed at the lowest linguistic level. That should allow the searches to be broader, much like allowing a chess computer to analyze more moves ahead.
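As a rough illustration of indexing by predefined categories, the Python sketch below tags a few documents with categories from a tiny word-to-category dictionary and builds an inverted index from category to document. The dictionary entries, the category names, and the function names are all hypothetical; a real category set would be far larger and would not rely on exact word matches.

```python
from collections import defaultdict

# Hypothetical semantic dictionary mapping words to predefined categories.
SEMANTIC_DICTIONARY = {
    "river": "GEOGRAPHY", "capital": "GEOGRAPHY", "mountain": "GEOGRAPHY",
    "symphony": "MUSIC", "composer": "MUSIC", "opera": "MUSIC",
    "king": "ROYALTY", "queen": "ROYALTY",
}

# Hypothetical knowledge base: document id -> text.
DOCUMENTS = {
    1: "The composer wrote an opera for the queen",
    2: "The river runs past the capital",
    3: "The king climbed the mountain",
}


def categorize(text):
    """Return the set of categories whose dictionary words appear in the text."""
    return {SEMANTIC_DICTIONARY[word]
            for word in text.lower().split()
            if word in SEMANTIC_DICTIONARY}


def build_index(documents):
    """Build an inverted index: category -> ids of documents tagged with it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for category in categorize(text):
            index[category].add(doc_id)
    return index


def candidate_documents(index, clue):
    """Return only the documents that share every category found in the clue."""
    categories = categorize(clue)
    if not categories:
        return set()
    return set.intersection(*(index.get(c, set()) for c in categories))


INDEX = build_index(DOCUMENTS)
hits = candidate_documents(INDEX, "Which queen commissioned an opera")
print(hits)
```

The point of the index is that a clue about royalty and music narrows the search to the matching documents before any word-level analysis happens, which is what frees up effort for a broader search.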
We of course are in the business of semantic dictionaries, which provide a quick way of assigning semantic categories to text documents. Hey, Watson. If you are listening, give us a call.