Cross Validation of Neural Network Applications for Automatic New Topic Identification

Özmutlu, H. Cenk; Çavdur, Fatih; Spink, Amanda; and Özmutlu, Seda (2005). Cross Validation of Neural Network Applications for Automatic New Topic Identification. In 68th Annual Meeting of the American Society for Information Science and Technology (ASIST), Charlotte (US), 28 October - 2 November 2005. [Conference paper]

English abstract

Recent studies in the literature have addressed automatic topic-shift identification in Web search engine user sessions; however, most of this work has applied its topic-shift identification algorithms to data logs from a single search engine. The purpose of this study is to cross-validate an artificial neural network application that automatically identifies topic changes in Web search engine user sessions, by training and testing the neural network on data logs from different search engines. Sample data logs from the Norwegian search engine FAST (currently owned by Overture) and from Excite are used in this study. The findings suggest that it may be possible to identify topic shifts and continuations successfully in the user sessions of one search engine using neural networks trained on the data log of a different search engine.
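The abstract does not describe the network's inputs or architecture, so the following is only a minimal sketch of the cross-log protocol under loudly stated assumptions: each consecutive query pair is reduced to two hypothetical features (a time gap capped at 600 seconds and the Jaccard overlap of query terms), and a single sigmoid neuron trained by stochastic gradient descent stands in for whatever network the paper actually used. The names `features`, `train`, `evaluate` and `cross_log_validation`, the 600-second cap, and the training parameters are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

# A labeled query pair: (previous query, current query, seconds between them, label).
# Label 1 marks a topic shift, 0 a topic continuation.
QueryPair = Tuple[str, str, float, int]


def features(prev_q: str, cur_q: str, gap_s: float) -> List[float]:
    """Hypothetical inputs: time gap capped at 600 s and scaled to [0, 1],
    plus the Jaccard overlap between the two queries' term sets."""
    a, b = set(prev_q.lower().split()), set(cur_q.lower().split())
    overlap = len(a & b) / len(a | b) if a | b else 0.0
    return [min(gap_s, 600.0) / 600.0, overlap]


def sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs: List[QueryPair], epochs: int = 500, lr: float = 0.5) -> List[float]:
    """Fit one sigmoid neuron (two weights plus a bias) with SGD on log-loss."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for prev_q, cur_q, gap_s, y in pairs:
            x = features(prev_q, cur_q, gap_s) + [1.0]  # constant 1.0 feeds the bias
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # The log-loss gradient for each weight is (p - y) * x_i.
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w


def predict(w: List[float], prev_q: str, cur_q: str, gap_s: float) -> int:
    x = features(prev_q, cur_q, gap_s) + [1.0]
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5)


def evaluate(w: List[float], pairs: List[QueryPair]) -> Dict[str, float]:
    """Accuracy, plus precision and recall for the topic-shift class."""
    tp = fp = fn = correct = 0
    for prev_q, cur_q, gap_s, y in pairs:
        yhat = predict(w, prev_q, cur_q, gap_s)
        correct += yhat == y
        tp += yhat == 1 and y == 1
        fp += yhat == 1 and y == 0
        fn += yhat == 0 and y == 1
    return {
        "accuracy": correct / len(pairs),
        "shift_precision": tp / (tp + fp) if tp + fp else 0.0,
        "shift_recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def cross_log_validation(log_a: List[QueryPair], log_b: List[QueryPair]) -> Dict[str, Dict[str, float]]:
    """Train on each search engine's log and test on the other, in both directions."""
    return {
        "train_a_test_b": evaluate(train(log_a), log_b),
        "train_b_test_a": evaluate(train(log_b), log_a),
    }
```

Running the evaluation in both directions reflects the question the abstract poses: whether a model fitted on one engine's labeled log (for example, FAST) transfers to another's (for example, Excite). A real comparison would feed in the labeled sample logs and the paper's own inputs rather than these hypothetical features.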

Item type: Conference paper
Keywords: search engine indexes, topical searching, automated indexing
Subjects: L. Information technology and library technology > LL. Automated language processing.
I. Information treatment for information services > IC. Index languages, processes and schemes.
L. Information technology and library technology > LS. Search engines.
Depositing user: Norm Medeiros
Date deposited: 12 Mar 2006
Last modified: 02 Oct 2014 12:02
URI: http://hdl.handle.net/10760/7000

