Deriving Query Intents from Web Search Engine Queries

Lewandowski, Dirk and Drechsler, Jessica and von Mach, Sonja. Deriving Query Intents from Web Search Engine Queries, 2012. [Preprint]

English abstract

The purpose of this paper is to test the reliability of query intents derived from queries, either by the user who entered the query or by another juror. We report the findings of three studies: First, we conducted a large-scale classification study (approximately 50,000 queries) using a crowdsourcing approach. Then, we used click-through data from a search engine log and validated the judgments given by the jurors from the crowdsourcing study. Finally, we conducted an online survey on a commercial search engine’s portal. Since we used the same queries for all three studies, we were able to compare the results and the effectiveness of the different approaches as well. We found that neither the crowdsourcing approach, using jurors who classified queries originating from other users, nor the questionnaire approach, using searchers who were asked about the query they had just entered into a web search engine, led to satisfactory results. This leads us to conclude that there is little understanding of the classification tasks, even though both groups of jurors were given detailed instructions. While we used manual classification, our research has important implications for automatic classification as well. We must question the success of approaches using automatic classification and comparing its performance to a baseline from human jurors.

Item type: Preprint
Keywords: search engines, information needs, query classification, user intent, web queries, web searching
Subjects: L. Information technology and library technology
L. Information technology and library technology > LS. Search engines.
Depositing user: Dirk Lewandowski
Date deposited: 30 Jun 2012
Last modified: 02 Oct 2014 12:22
URI: http://hdl.handle.net/10760/17245
