SIGIR 09 Day One http://sigir2009.org/Program SIGIR runs several parallel tracks; here are some highlights from the sessions I attended today.
Susan Dumais gave the opening keynote at SIGIR 09, “An Interdisciplinary Perspective on Information Retrieval.” Dumais was the 2009 recipient of the Gerard Salton Award for her contributions to the information retrieval field. Her work at Bell Labs/Bellcore exploring vocabulary mismatch (aka verbal disagreement) led to her LSI work. She has worked at Microsoft Research since 1997 and currently leads the Context, Learning, and User Experiences in Search team. Her talk covered her background (cognitive psychology/mathematics) and how the problems of information retrieval, together with the huge social and technical leaps in the field over the last fifteen years, have made this a very exciting time to be working in the area. However, as much as things have changed, much has stayed the same: we haven’t escaped the search box or the results list. Observed searching habits: we repeat our searches with high frequency (“re-finding”), both on the desktop and on the web, and date is the most common sort order selected when users change from the default. She called for more research on personalized search: we need models that tell us when to use personalization and when not to, since it works only some of the time. Evaluation continues to be challenging, and behavioral data, especially click data, is extremely noisy. For future research, IR solutions must acknowledge the dynamic information environment, and experiments and data must reflect that environment. She called for a ‘Living Laboratory’ made up of search engine logs, searching resources such as Wikipedia, etc. A group needs to mobilize to put this resource together; she plugged the Lemur Query Toolbar. IR research needs an interdisciplinary team to understand users, and thinking outside the box, to meet the challenges ahead in IR.
Novel Search Features Session Notes: “Web Searching for Daily Living” (NTT Comm): collecting information about everyday actions from cameras and incorporating it into web search queries, using clustering techniques to return useful information. Forward-looking research, as few of us have web browsing tools on our appliances or in our bathrooms, but it is paving the way. This is what they mean by the phrase search ubiquity! “Global Ranking by Exploiting User Clicks” (Yahoo!): collecting information about user click sequences and then using supervised learning to make predictions. Must look across results, not just within a single document after a click. Position influences clicks: the first result is often clicked on. Aggregation of data is key, since click data is very noisy. “Good Abandonment in Mobile and PC Internet Search” (Google): an investigation of when search abandonment is good (the answer is right in the results list, so there is no need to open a page). Good abandonment is much more likely to occur on a mobile device than on a PC, and it varies by locale (they looked at the US, Japan, and China) and by category of query. Research to estimate good-abandonment rates and design a first study: classification by modality, locale, and category.
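The points about position bias and aggregation in the Yahoo! talk can be illustrated with a small sketch. This is not the supervised method from the paper; it swaps in a simpler, well-known normalization, clicks over expected clicks (COEC), purely to show why raw click counts need to be aggregated and adjusted for rank. The log format, the function names, and the toy data are all assumptions made up for illustration.

```python
from collections import defaultdict

def position_ctr(log):
    """Click-through rate per rank position, aggregated over the whole log."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for _query, _url, position, was_clicked in log:
        shown[position] += 1
        clicked[position] += int(was_clicked)
    return {p: clicked[p] / shown[p] for p in shown}

def coec(log):
    """Clicks over expected clicks for each (query, url) pair.

    The expected clicks for an impression are the average click-through
    rate of the position at which the result was shown.
    """
    baseline = position_ctr(log)
    clicks = defaultdict(int)
    expected = defaultdict(float)
    for query, url, position, was_clicked in log:
        clicks[(query, url)] += int(was_clicked)
        expected[(query, url)] += baseline[position]
    return {k: clicks[k] / expected[k] for k in clicks if expected[k] > 0}

# Hypothetical log rows: (query, url, rank position, clicked?)
log = [
    ("q1", "a", 1, True),
    ("q1", "b", 2, False),
    ("q1", "a", 1, True),
    ("q1", "b", 2, True),
    ("q2", "c", 1, False),
    ("q2", "d", 2, False),
]

scores = coec(log)
```

In this toy log, result a gets two clicks at rank 1 and result b only one click at rank 2, yet both score about 1.5 once the position baseline is taken into account, while the unclicked results score 0.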
Web 2.0 Session Notes: “A Statistical Comparison of Tag and Query Logs” (Strathclyde & Lugano Universities): very cool zooming slideware used in the presentation. They found more vocabulary shared between queries and tags than in any combination of queries, tags, and content of search results. Data sets used: AOL query logs, Delicious tags, ODP categories. “Enhancing Cluster Labeling Using Wikipedia” (IBM Research): found very promising results using Wikipedia metadata to label clusters. Walked through the approach and the evaluation. The findings suggest that continued development of this work would provide better-quality labeling of clusters.
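To make the idea of shared vocabulary between queries, tags, and content concrete, here is a minimal sketch. It is not the statistical comparison used in the paper: the Jaccard measure, the helper names `vocabulary` and `jaccard`, and the toy data are all assumptions chosen for illustration.

```python
def vocabulary(texts):
    """Return the set of lowercase, whitespace-separated terms in texts."""
    return {term for text in texts for term in text.lower().split()}

def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy data, invented for illustration only.
queries = ["python tutorial", "cheap flights london"]
tags = ["python", "tutorial", "travel", "london"]
content = ["Learn the Python language step by step", "Book flights online"]

q, t, c = vocabulary(queries), vocabulary(tags), vocabulary(content)
overlap_qt = jaccard(q, t)  # queries vs. tags
overlap_qc = jaccard(q, c)  # queries vs. page content
overlap_tc = jaccard(t, c)  # tags vs. page content
```

With this toy data the query and tag vocabularies overlap more than either does with the page content, which mirrors the direction of the finding reported in the talk but says nothing about its size on real logs.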
Question Answering Session Notes:
“A Classification-based Approach to QA in Discussion Boards” (Lehigh University): options for asking questions on the web include search engines, QA portals, and discussion boards. This research focused on detecting questions and answers on discussion boards, and the talk discussed the techniques found to work best for questions and for answers. “Ranking Community Answers by Modeling Question-Answer Relationships via Analogical Reasoning” (Microsoft Research & Huazhong Science and Tech University): the presenter said search engines must deliver answers sooner rather than later. The work mines data from community forums (Yahoo! Answers Archive) to find clues for linkages between questions and answers, modeling previous knowledge. Each question had 16 answers on average in the data set. Very promising results.