Investigating the Performance of Automatic New Topic Identification Across Multiple Datasets

Özmutlu, H. Cenk and Cavdur, Fatih and Spink, Amanda and Özmutlu, Seda. Investigating the Performance of Automatic New Topic Identification Across Multiple Datasets, 2006. In 69th Annual Meeting of the American Society for Information Science and Technology (ASIST), Austin (US), 3-8 November 2006. [Conference paper]



English abstract

Recent studies on automatic new topic identification in Web search engine user sessions have demonstrated that neural networks can identify new topics successfully. However, most of this work applied its new topic identification algorithms to data logs from a single search engine. In this study, we investigate whether the application of neural networks to automatic new topic identification is more successful on some search engines than on others. Sample data logs from the Norwegian search engine FAST (currently owned by Overture) and from Excite are used in this study. The findings suggest that query logs with more topic shifts tend to yield better results on shift-based performance measures, whereas logs with more topic continuations tend to yield better results on continuation-based performance measures.

Item type: Conference paper
Keywords: search engines ; user behavior ; topic identification ; topic differentiation ; subject identification ; subject differentiation
Subjects: I. Information treatment for information services > IB. Content analysis (A and I, class.)
I. Information treatment for information services > IC. Index languages, processes and schemes.
H. Information sources, supports, channels. > HQ. Web pages.
L. Information technology and library technology > LS. Search engines.
Depositing user: Norm Medeiros
Date deposited: 16 Dec 2006
Last modified: 02 Oct 2014 12:05


