Archive for September, 2009

The TextWise Semantic Signatures® technology provides relevant data (matches, tags, etc.) for textual content. Measuring the relevance quality of our technology has presented us with some challenges: What data do we use to test relevance? Who judges the relevance quality? On what scale do they judge relevance? How do we provide those judges with meaningful instructions for making a qualitative judgment?

We quickly determined that we needed to use unbiased, external judges to test our relevance quality and, in order to measure change over time, a static set of test data. External judges were selected from candidates who demonstrated an ability to read attentively; no particular expertise in semantics was expected. The test data was collected to represent the variety of data types that someone using our API might encounter. Since we perform a variety of relevance tests, it is important to choose a judging scale that fits the purpose of the test. In a recent matching test, for example, we used a four-point scale, which allowed some distinction among degrees of relevance without being overwhelming.

The instructions for the judges were more challenging. We knew that the guidelines we gave to the external judges would have to be clear and concise. The judges would need explanations for each degree of relevance. Those explanations would have to be supported by examples from which the judges could generalize to the variety of cases they might see in the data. At the same time, the guidelines had to be sufficiently brief that judges could easily refer to them during the judging task and quickly refresh themselves on the guidelines when doing assignments that might come months apart.

Most importantly, we first needed an internal consensus on how to define our relevance scale. Getting this internal consensus required multiple rounds of reviewing drafts of the guidelines, performing judgments, and discussing our differences. Participants in this process were drawn from the science, quality assurance, and product management teams to ensure that this was a company-wide initiative. After each round of judging, we calculated the degree of agreement using a kappa coefficient, a standard measure of reliability among multiple judges.
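To make the measure concrete, here is a minimal sketch in Python of Fleiss' kappa, a common kappa variant for more than two judges (the post doesn't name the exact variant we use, and the judgment matrix below is made-up illustrative data, not our actual test results):

```python
# A minimal sketch of Fleiss' kappa for measuring agreement among
# multiple judges on a fixed relevance scale. Illustrative data only.

def fleiss_kappa(ratings):
    """ratings[i][j] = number of judges who assigned item i to category j."""
    n_items = len(ratings)
    n_judges = sum(ratings[0])          # judges per item (assumed constant)
    n_total = n_items * n_judges
    n_categories = len(ratings[0])

    # Proportion of all assignments falling in each category.
    p_j = [sum(row[j] for row in ratings) / n_total for j in range(n_categories)]

    # Per-item agreement: fraction of judge pairs that agree on the item.
    def item_agreement(row):
        return (sum(c * c for c in row) - n_judges) / (n_judges * (n_judges - 1))

    p_bar = sum(item_agreement(row) for row in ratings) / n_items
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)

# Three judges rating four items on a four-point relevance scale.
judgments = [
    [3, 0, 0, 0],   # all three judges chose category 1
    [0, 2, 1, 0],
    [0, 0, 3, 0],
    [1, 0, 0, 2],
]
print(round(fleiss_kappa(judgments), 3))   # 0.538
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so a rising kappa across rounds is the signal that guideline revisions are converging the judges.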

Once we reached internal agreement, the external judges needed to be trained using the guidelines. Our test data included a subset specifically reserved for training and our final internal judgments were retained as an answer key for that data set. Again, agreement among the judges was measured using kappa. Once the judges are trained, we continue to monitor performance using a small set of data for which all judges submit judgments. When the kappa measure shows a drift among the judges, we do a retraining exercise.

After the external judges are trained, they can perform judgments on the larger data set. Relevance judgments are done on major releases and on minor releases that include changes to any components that could impact relevance. Judgments are retained in a database from one relevance test to the next so that any given judgment only needs to be performed once. When a test is completed, we use multiple statistical measures to analyze the outcome.
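As a toy sketch of how that reuse can work, here's a Python version using SQLite; the table layout, column names, and scores are hypothetical, not our actual schema:

```python
# A minimal sketch of reusing stored judgments so each query/result
# pair only needs to be judged once. Schema and data are hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE judgments (query TEXT, result TEXT, score INTEGER, "
    "PRIMARY KEY (query, result))"
)

def get_or_request_judgment(query, result):
    row = conn.execute(
        "SELECT score FROM judgments WHERE query = ? AND result = ?",
        (query, result),
    ).fetchone()
    if row:
        return row[0]      # reuse the earlier judgment
    return None            # not judged yet: queue the pair for a judge

conn.execute("INSERT INTO judgments VALUES (?, ?, ?)", ("moon", "apollo.html", 4))
print(get_or_request_judgment("moon", "apollo.html"))   # 4, no re-judging
print(get_or_request_judgment("moon", "dance.html"))    # None -> send to judges
```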

One of the challenges with creating and maintaining applications for the web is keeping up with all of today’s different web browsers and their differing under-the-hood technologies and functionality. New versions of browsers and operating systems are released frequently, for reasons ranging from feature enhancements to security fixes. There is a wide variety of web browsers available today, each offering something a bit different from the others. Operating system vendors ship their own browsers, some browsers are cross-platform and work on other operating systems, there are plenty of third-party browsers, and we haven’t even explored the mobile browser realm yet… Creating and maintaining a set of browser and OS combinations as a company standard toward which applications can be developed and tested has become key for us.

Our standard has been created using statistics on browser and OS usage from W3Schools, broken down by brand and version. By collecting this data and observing trends over time, we can decide when it’s appropriate to start or discontinue supporting a browser, an OS, or a combination of the two. Our process is to evaluate our browser/OS support matrix each time a new major or minor version of a browser or OS is released, or at least every six months if no browser or OS updates have occurred. Evaluating the statistics is important even when nothing new has shipped, because some browsers may fall below the percentage of use needed for support, while others may have grown enough in usage or popularity to now warrant support.
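Here is a toy Python sketch of the kind of threshold rule involved; the usage figures and the 2% cutoff are invented for illustration, and the real decision also weighs trends over time rather than a single snapshot:

```python
# A minimal sketch of a usage-share cutoff for the support matrix.
# All numbers below are invented, in the style of W3Schools stats.

SUPPORT_THRESHOLD = 2.0  # hypothetical minimum usage share, in percent

usage_stats = {
    ("Firefox 3.5", "Windows XP"): 14.2,
    ("IE 6", "Windows XP"): 5.1,
    ("Safari 4", "Mac OS X"): 3.3,
    ("Opera 9", "Linux"): 0.8,
}

def evaluate_matrix(stats, threshold=SUPPORT_THRESHOLD):
    """Split browser/OS combinations into supported and dropped lists."""
    supported = [combo for combo, share in stats.items() if share >= threshold]
    dropped = [combo for combo, share in stats.items() if share < threshold]
    return supported, dropped

supported, dropped = evaluate_matrix(usage_stats)
print("Support:", supported)
print("Drop:", dropped)
```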

It’s also important to be able to test those combinations to ensure compatibility. Rather than bearing the expense of keeping every possible combination in-house, we use a web service that specializes in providing those tools. The service we use is called BrowserCam, which gives us the ability to take “snapshots” of our applications in various browser/OS combinations and remote access to those machines for interactive testing. And to answer the original question, we have no idea – PlanetWeb2.6 on Dreamcast is not one of our supported combinations.

SemanticHacker.com and the API are scheduled for maintenance on Sunday, September 20th, from 2am to 4am EST.

During that time, the website and API will be unavailable. We are sorry for any inconvenience.

Please contact development@semantichacker.com if you have any questions.

Casino Royale

14 Sep 2009

In any statistical information system, one can never achieve absolute certainty. Every result is a kind of bet with the possibility of losing. For Semantic Signatures, however, this is more like playing blackjack than like playing roulette. Whether we imagine ourselves as the house or some hotshot card counter, we try our utmost to bend the odds in our favor.

When a given term occurs in a document, we know that there is a certain probability that the document is about a given topic. For example, THRILLER may relate to Michael Jackson or to some recent summer popcorn epic. Similarly MOONWALK may refer to Apollo XI or to a dance move. We would be rash to judge content just on the basis of a single term, but when multiple terms can corroborate each other, we do have a better bet.
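As a toy illustration of that corroboration, the Python sketch below combines per-term evidence under a naive independence assumption; all probabilities are invented, and this is not the actual Semantic Signatures algorithm:

```python
# A toy sketch of corroborating terms tilting the odds toward one sense.
# Probabilities are invented for illustration only.

import math

# Hypothetical P(topic | term) entries for two ambiguous terms.
term_topic_probs = {
    "thriller": {"music": 0.6, "film": 0.4},
    "moonwalk": {"music": 0.5, "space": 0.5},
}

DEFAULT_P = 0.05  # hypothetical floor when a term doesn't support a topic

def score_topics(terms, probs):
    """Sum log-probabilities per topic, assuming terms are independent."""
    topics = {t for entry in probs.values() for t in entry}
    return {
        topic: sum(
            math.log(probs.get(term, {}).get(topic, DEFAULT_P)) for term in terms
        )
        for topic in topics
    }

# Each term alone is ambiguous, but together they favor "music":
# music ~ -1.20, film ~ -3.91, space ~ -3.69 (higher is better).
print(score_topics(["thriller", "moonwalk"], term_topic_probs))
```

Either term on its own is close to a coin flip, but together they push one topic well ahead of the alternatives, which is exactly the better bet described above.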

The trick here is to be able to set up a semantic dictionary so that we can always expect to find a reasonable number of terms in a target document that allow us to make that better bet. This requires careful balancing: we need enough semantic dimensions to distinguish the important kinds of content and enough terms for each dimension to put it into play. It is much like building a diverse portfolio of investments to weather any shift in economic climate.

Most people will probably pass on building their own semantic dictionaries. It takes a tremendous amount of work to collect and filter the requisite text data to ground our dictionary weights and to massage all those numbers to get the maximum amount of usable information. But we want to get on the right side of the odds.

I use Google Reader religiously. Google Reader is one of my first web destinations in the morning and one of the last at the end of the day. I skim titles looking for clues as to what will be of interest to me (I only view items in the ‘list’ format; I’d be dead and buried by the time I got to the end if I viewed in the ‘expanded’ format). My Reader account keeps me informed, in-the-know, and on top of the latest bit of intelligence I can hope to find. I replaced my once beloved Bloglines for this service.

Now, the bigger it gets, the more Google Reader has become my biggest nemesis, a time-waster if you will. Loosely organized and, because it’s based on RSS feeds, not real-time by any stretch of the imagination (our own blog posts here on SemanticHacker.com take hours to show up), it lets me spend literally hours perusing the daily information tsunami. Like most of the population, I have many interests, from keeping up with the latest social media marketing craze, to what’s happening in the world of semantic web applications, to finding fun toys at discount prices for my kids.

What are Google Reader’s Shortcomings?

1. Organizing feeds is a manual process.
Every time I subscribe to a new RSS feed, I need to manually place it into a folder. Many times the feed I subscribe to crosses topic lines (TechCrunch is a prime example of this).

2. The ‘starring’ option is an unusable feature.
Related to the point above: unless Google can automatically organize my starred items, starring is as pointless as starring something in Gmail. Likewise for items I’ve shared and those shared by the people I follow.

3. Search is nice, but of course, keyword-based.
Steve Rubel thinks it makes a good personal database to search using Google, but let’s face it – it doesn’t solve the overwhelming-amount-of-information problem. If I search for ‘Facebook acquisition,’ there is no context in which it searches. Essentially I’m still forced to filter through (in my case) 740 items.

How Google Reader Can Be Better

1. Make it real-time.
Go beyond RSS. Let me add my Twitter accounts (I manage more than one) and my Facebook account. Maybe Google Caffeine gets us closer.

2. Automatically organize my feeds.
Don’t make me create folders and force a feed into a single topic. Sure, it would be smart to let the user rename a folder, but the initial organization of a feed shouldn’t be so daunting or so simplistic – “marketing” is too broad, and I don’t have the time or patience to narrow things down further.

3. Fix search.
A loaded statement for sure, but show me related items in my feeds just by using a specific article as the basis for a search (see the sketch below). Blatant plug here, but if everything were indexed and tagged with a TextWise Semantic Signature, this feature would be a no-brainer. Why rely on keyword matching when I already have an idea of what more I want to see?
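For the curious, here’s a toy Python sketch of what “more like this” search over signatures could look like; the three-dimensional vectors are invented stand-ins, since real Semantic Signatures live in a much higher-dimensional space:

```python
# A toy sketch of "more like this" ranking by vector similarity
# instead of keywords. Titles and vectors are invented examples.

import math

feed_items = {
    "Facebook buys FriendFeed": [0.9, 0.1, 0.0],
    "New social media marketing tips": [0.7, 0.3, 0.1],
    "Discount toys for kids": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def related_items(seed_title, items, top_n=2):
    """Rank other items by signature similarity to the seed article."""
    seed = items[seed_title]
    others = [(t, cosine(seed, v)) for t, v in items.items() if t != seed_title]
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:top_n]

# The marketing post ranks above the toy post, no keywords required.
print(related_items("Facebook buys FriendFeed", feed_items))
```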

Recently TechCrunch had their own idea of what else needs fixing with the newer “like” feature, and I fully concur, so I don’t think I need to rehash that. These are just a few of my wish-list items for Google Reader. Google is making an effort to add functionality, so perhaps one day I will see one of the ideas mentioned above come to fruition.