Web searching, search engines and Information Retrieval

Lewandowski, Dirk. Web searching, search engines and Information Retrieval. Information Services & Use, 2005, vol. 18, n. 3. (In Press) [Journal article (Unpaginated)]

English abstract

This article discusses Web search engines, focusing on the challenges of indexing the World Wide Web, user behaviour, and the ranking factors these engines use. Ranking factors are divided into query-dependent and query-independent factors; the latter have become increasingly important in recent years. The potential of these factors is limited, particularly for those based on the widely used link popularity measures. The article concludes with an overview of factors that should be considered when determining the quality of Web search engines.

Item type: Journal article (Unpaginated)
Keywords: search engines; Information Retrieval
Subjects: L. Information technology and library technology > LS. Search engines.
Depositing user: Dirk Lewandowski
Date deposited: 07 Sep 2005
Last modified: 02 Oct 2014 12:01
URI: http://hdl.handle.net/10760/6702
