Hijacked

26 May 2010

Has this ever happened to you? You are Googling for information on the Web, but your query inadvertently shares keywords with the latest cultural phenom: the next tweener heartthrob, a YouTube video suddenly gone viral, or yet another paranoid political fantasy that refuses to die.

You are a professional, however, and so you switch into Advanced Mode to reshape your query, but to no avail. Your information has been buried under pop detritus; it has been hijacked by the maximum likelihood estimate (MLE) on the Web.

At times like this, you want to grab your search engine by the neck and shout, “I am NOT a screaming twelve-year-old girl into dancing cats and fixated on the President’s birthplace!” But your search engine blithely carries on, trusting in the wisdom of the crowd.

It is a reminder that statistically grounded information systems are at the mercy of their training data. If we cede too much control of a system to its finely wrought black-box judgment, then sometimes we are going to run off the tracks. This is especially true with web semantics.

If we do in fact want to get under the hood and adjust a semantic system to go against the popular flow, then it helps tremendously if the categories underlying the representation of document content are intelligible to people. Such transparency is a prime motivation for how TextWise currently builds its semantic dictionaries.

Of course, if you care nary a lick about transparency, then may I interest you in this slightly used synthetic collateralized debt obligation….

Search engines work remarkably well when one is searching for a popular topic. Just try the query LOVATO. If you belong to the demographic that normally reads this blog, then you probably don’t know yet who she is, but Google or Bing will find her. Although she is still obscure enough that Lovato Electric, Inc. beats her out for the top spot on Bing, there is no problem in getting the goods on this latest Disney ‘tween idol.

Here is a different, more frustrating search story, however. I was at the National Gallery in Washington on Sunday and saw a remarkable series of Italian Renaissance frescos. At home afterwards, I queried on ITALIAN VILLA FRESCO NATIONAL GALLERY WASHINGTON, but found nothing recognizable on Google with either web or image search. After about an hour of trying numerous variations on the query, I gave up.

Then I went to www.nga.gov and navigated down to its 16th Century Italian art page. It offered a virtual tour of a series of frescos by Bernardino Luini on the legend of Procris and Cephalus. Bingo! According to the web site, “These nine paintings are the only examples of an Italian Renaissance fresco series in America.” Strangely enough, I had actually tried the term LUINI in one of my unsuccessful queries.

So we obviously have a failure to communicate here, and it is exactly the kind of problem that semantic search should be addressing. The relevant page was out there, and my queries should have been specific enough, but somehow a beautiful young bride being run through and killed by a magic javelin just wasn’t as sexy as Britney 4.0.