Automatically generating related queries in Japanese

Cited by: 2
Authors
Jones, Rosie
Bartz, Kevin
Subasic, Pero
Rey, Benjamin
Affiliations
[1] Yahoo Inc, Sunnyvale, CA 94089 USA
[2] Harvard Univ, Ctr Sci, Cambridge, MA 02138 USA
[3] Advertising Com AOL Time Warner, Mountain View, CA 94041 USA
Keywords
Kanji in web search; Japanese web search queries; query processing; query substitution; query reformulation
DOI
10.1007/s10579-007-9021-0
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Web searchers reformulate their queries as they adapt to search engine behavior, learn more about a topic, or simply correct typing errors. Automatic query rewriting can help users search the web by augmenting a user's query or by replacing it with one likely to retrieve better results. Spelling correction is one example of query rewriting; we may also want to change words to synonyms or other related terms. For Japanese, the opportunities for improving results are greater than for languages with a single character set, since documents may be written in multiple character sets and a user may express the same meaning using different character sets. We describe the characteristics of Japanese search query logs and of the manual query reformulations carried out by Japanese web searchers. We use these characteristics to extend previous work on automatic query rewriting in English, taking the Japanese writing system into account. We introduce several new features, motivated by this difference, for building models, and discuss their impact on automatic query rewriting. We also examine enhancements in the form of rules that block conversion between some character sets, to address Japanese homophones. The precision/recall curves show significant improvement with the new feature set and blocking rules, and are often better than those obtained for English.
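The abstract does not list the new features or the exact blocking rules. The Python sketch below only illustrates, under our own assumptions, what script-aware pair features and a homophone-motivated blocking rule could look like. The function names, the choice of Unicode ranges (Hiragana U+3040-U+309F, Katakana U+30A0-U+30FF, CJK Unified Ideographs U+4E00-U+9FFF), and the specific rule "block kana-only to kanji rewrites" are hypothetical illustrations, not the authors' feature set.

```python
# Hypothetical sketch: script-aware features for a candidate query rewrite (q1 -> q2).
# These are illustrative assumptions, not the features or rules used in the paper.

def char_script(ch: str) -> str:
    """Return a coarse script label for a single character."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs (kanji)
        return "kanji"
    if ch.isascii() and ch.isalnum():
        return "latin"
    return "other"


def scripts(query: str) -> set[str]:
    """Set of scripts present in a query, ignoring spaces and punctuation."""
    return {char_script(c) for c in query if char_script(c) != "other"}


def katakana_to_hiragana(text: str) -> str:
    """Fold katakana letters U+30A1-U+30F6 onto hiragana via the fixed 0x60 offset."""
    return "".join(
        chr(ord(c) - 0x60) if 0x30A1 <= ord(c) <= 0x30F6 else c for c in text
    )


def pair_features(q1: str, q2: str) -> dict[str, bool]:
    """Hypothetical binary features describing how q2 differs from q1 in script."""
    s1, s2 = scripts(q1), scripts(q2)
    return {
        "same_script_set": s1 == s2,
        # True when the queries differ but match once katakana is folded to hiragana.
        "kana_variant": q1 != q2
        and katakana_to_hiragana(q1) == katakana_to_hiragana(q2),
        "kanji_added": "kanji" in s2 and "kanji" not in s1,
    }


def blocked(q1: str, q2: str) -> bool:
    """Hypothetical blocking rule: reject rewrites from a kana-only query to one
    containing kanji, since a single kana reading may map to several kanji homophones."""
    s1 = scripts(q1)
    return bool(s1) and s1 <= {"hiragana", "katakana"} and "kanji" in scripts(q2)


if __name__ == "__main__":
    print(pair_features("りんご", "リンゴ"))  # hiragana vs. katakana spelling of "apple"
    print(blocked("りんご", "林檎"))          # kana -> kanji rewrite is blocked
```

In this sketch a hiragana/katakana variant such as りんご/リンゴ is kept as a candidate, while a kana-to-kanji rewrite is rejected outright; a real system would more likely use such signals as model features or restrict the block to ambiguous readings.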
Pages: 219-232
Page count: 14