What can you do with Semantic Signatures?

posted by Bill Sherwood on 08/26/09

I was asked that question quite a few times when I was at the KM World and SemTech conferences. The answer is simple: use a Semantic Signature as a query against an index of Semantic Signatures to find the most relevant content.

To illustrate what a Semantic Signature is, we provide an example of a document with 30 semantic dimensions labeled using the Open Directory Project taxonomy (www.dmoz.org). That example led people to believe that a Semantic Signature is nothing more than a multivariate categorizer for content navigation, categorization, or other forms of content bucketing. While Signatures can certainly be used that way, that is not how we use them at TextWise.

If you examine a Semantic Signature without reading the thirty labels, you’ll see that it is a 30-dimensional vector of concepts and weights. TextWise uses these concepts and weights in a simple vector math calculation to determine the similarity between two signatures. The resulting score is normalized to an integer value, and a cutoff is then applied to decide whether each signature is relevant to the query.
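The post does not publish the exact formula, so what follows is only a minimal Java sketch under stated assumptions: a signature is modeled as a sparse map from a hypothetical integer concept ID to a weight, cosine similarity stands in for the "simple vector math calculation," and `toScore` maps a similarity in [0, 1] onto an assumed 1 to 10 integer scale. The names `SignatureSimilarity`, `cosine`, and `toScore` are illustrative and are not part of any TextWise API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a signature as a sparse concept-ID -> weight map.
 * Cosine similarity is an ASSUMED stand-in for TextWise's vector calculation.
 */
public class SignatureSimilarity {

    // Cosine similarity between two sparse concept -> weight vectors.
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0;
        // Only concepts present in both signatures contribute to the dot product.
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty signature is similar to nothing
        }
        return dot / (normA * normB);
    }

    // Euclidean (L2) length of a sparse vector.
    static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    // Map a similarity in [0, 1] onto an integer score in 1..10 (assumed scale).
    static int toScore(double similarity) {
        double clamped = Math.max(0.0, Math.min(1.0, similarity));
        return 1 + (int) Math.round(clamped * 9);
    }

    public static void main(String[] args) {
        Map<Integer, Double> query = new HashMap<>();
        query.put(101, 0.8);
        query.put(202, 0.6);
        Map<Integer, Double> doc = new HashMap<>();
        doc.put(101, 0.8); // shares one concept with the query
        doc.put(303, 0.6);
        double sim = cosine(query, doc);
        System.out.println(sim + " -> score " + toScore(sim));
    }
}
```

Cosine similarity is a common choice for weighted concept vectors because it ignores overall vector length; with non-negative weights it stays between 0 and 1, which keeps the integer mapping simple.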

In a real-world application, a user-controlled sliding scale from 1 to 10 can be used within the calculation to control which content items, represented by their Semantic Signatures, are displayed: a score of 9 would instruct the application to show only highly relevant content, while a score of 2 would favor recall and show a larger set of content.
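One way to read the slider, sketched here under the assumption that each item already has an integer relevance score from 1 to 10, is as a minimum score: items at or above the user's setting are shown and the rest are hidden. `SliderFilter` and `ScoredItem` are hypothetical names for illustration only; the code uses a Java record, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: the 1..10 slider treated as a minimum relevance score. */
public class SliderFilter {

    // A content item paired with its precomputed 1..10 relevance score.
    record ScoredItem(String id, int score) {}

    // Keep only items whose score meets or exceeds the user's slider setting.
    static List<ScoredItem> filter(List<ScoredItem> items, int slider) {
        if (slider < 1 || slider > 10) {
            throw new IllegalArgumentException("slider must be in 1..10");
        }
        List<ScoredItem> kept = new ArrayList<>();
        for (ScoredItem item : items) {
            if (item.score() >= slider) {
                kept.add(item);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredItem> items = List.of(
                new ScoredItem("doc-a", 9),
                new ScoredItem("doc-b", 5),
                new ScoredItem("doc-c", 2));
        System.out.println(filter(items, 9).size()); // → 1 (precision-oriented)
        System.out.println(filter(items, 2).size()); // → 3 (recall-oriented)
    }
}
```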

Why would I use Semantic Signatures to search for content? If you have read an article on the web or on your company’s intranet and tried to find additional content related to it, you know it is a cumbersome process: identify keywords from the source, search with them, review the results, and repeat until you either find what you are looking for or give up. If you perform the same search against an index of Semantic Signatures, you simply use the document itself as the query, eliminating the keyword-guessing and review cycle inherent in today’s keyword systems.

From a developer’s perspective, the major benefits of Semantic Signatures are:

  • They are a very accurate and compact representation of a document – each Signature only consumes ~180 bytes of RAM.
  • Computing the similarity of two Signatures is a very lightweight vector calculation, and, unlike keyword matching, it requires no patterns, alias tables, synonym tables, spell correction, etc.
  • Scalability: 3 million Signatures easily fit within a 1.5 GB, 32-bit Java VM, and a full index search takes ~70 milliseconds.
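The ~180-byte figure is consistent with, though not confirmed to be, a layout that stores each of the 30 dimensions as a 2-byte concept ID plus a 4-byte float weight (30 × 6 = 180 bytes). The sketch below assumes that layout. `PackedSignatureIndex` is a hypothetical class that packs all signatures into two flat arrays to avoid per-object JVM overhead; it assumes concept IDs are sorted ascending and weights are L2-normalized, so a merge-style dot product equals cosine similarity, and it scans the whole index brute-force. At 180 bytes each, 3 million signatures need about 540 MB of raw payload, leaving headroom in a 1.5 GB heap. This is arithmetic on the stated figures, not a reproduction of TextWise's implementation or its ~70 ms timing.

```java
import java.util.Arrays;

/**
 * Hypothetical packed index: 30 dims x (2-byte short ID + 4-byte float weight)
 * = 180 bytes per signature. The layout is an ASSUMPTION, not TextWise's format.
 */
public class PackedSignatureIndex {
    static final int DIMS = 30;
    static final int BYTES_PER_SIGNATURE = DIMS * (Short.BYTES + Float.BYTES);

    private final short[] concepts; // DIMS concept IDs per signature, sorted ascending
    private final float[] weights;  // DIMS weights per signature, assumed L2-normalized
    private int count = 0;

    PackedSignatureIndex(int capacity) {
        concepts = new short[capacity * DIMS];
        weights = new float[capacity * DIMS];
    }

    // Append one signature into the flat arrays.
    void add(short[] c, float[] w) {
        if (c.length != DIMS || w.length != DIMS) {
            throw new IllegalArgumentException("expected " + DIMS + " dimensions");
        }
        if ((count + 1) * DIMS > concepts.length) {
            throw new IllegalStateException("index is full");
        }
        System.arraycopy(c, 0, concepts, count * DIMS, DIMS);
        System.arraycopy(w, 0, weights, count * DIMS, DIMS);
        count++;
    }

    // Dot product of the query with stored signature i, merging two sorted ID lists.
    double dot(short[] qc, float[] qw, int i) {
        int base = i * DIMS;
        int a = 0, b = 0;
        double sum = 0.0;
        while (a < DIMS && b < DIMS) {
            short x = qc[a];
            short y = concepts[base + b];
            if (x == y) {
                sum += qw[a] * weights[base + b];
                a++;
                b++;
            } else if (x < y) {
                a++;
            } else {
                b++;
            }
        }
        return sum;
    }

    // Brute-force full-index scan: score the query against every stored signature.
    double[] scanAll(short[] qc, float[] qw) {
        double[] scores = new double[count];
        for (int i = 0; i < count; i++) {
            scores[i] = dot(qc, qw, i);
        }
        return scores;
    }

    // Demo helper: concept IDs start, start+1, ..., start+29.
    static short[] ids(int start) {
        short[] r = new short[DIMS];
        for (int i = 0; i < DIMS; i++) {
            r[i] = (short) (start + i);
        }
        return r;
    }

    // Demo helper: equal weights of 1/sqrt(30), so the vector has unit length.
    static float[] uniformWeights() {
        float[] w = new float[DIMS];
        Arrays.fill(w, (float) (1.0 / Math.sqrt(DIMS)));
        return w;
    }

    public static void main(String[] args) {
        long payload = 3_000_000L * BYTES_PER_SIGNATURE;
        System.out.println(BYTES_PER_SIGNATURE + " bytes per signature; "
                + payload + " bytes of payload for 3 million signatures");

        PackedSignatureIndex index = new PackedSignatureIndex(2);
        index.add(ids(0), uniformWeights());  // concepts 0..29
        index.add(ids(15), uniformWeights()); // concepts 15..44, half overlap
        // Prints roughly [1.0, 0.5], up to float rounding.
        System.out.println(Arrays.toString(index.scanAll(ids(0), uniformWeights())));
    }
}
```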

If you want to learn more about Semantic Signature technology and use our free API to create Semantic Signatures for your content, visit TextWise.
