The Freshness of Web search engines’ databases

Lewandowski, Dirk and Wahlig, Henry and Meyer-Bautor, Gunnar. The Freshness of Web search engines’ databases. Journal of Information Science, 2005, vol. 31. (In Press) [Journal article (Unpaginated)]

English abstract

This study measures how frequently search engines update their indices. To this end, 38 websites that are updated daily were analysed over a period of six weeks. The search engines analysed were Google, Yahoo and MSN. We find that Google performs best overall, with the most pages updated on a daily basis, but only MSN is able to update all pages within less than 20 days. The other two engines have outliers that are considerably older. In terms of indexing patterns, the engines take different approaches: while MSN shows clear update patterns, Google shows some outliers, and the update process of the Yahoo index seems to be quite chaotic. The implications are that the quality of search engine indices varies and that searchers should not rely on a single engine when looking for current content.
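
The kind of measurement summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes each observation records the day an engine was checked and the date of the page copy held in its index, and it treats a copy as fresh if it is at most one day old. The `Observation` type, the `summarise` function, the engine names and all sample values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    engine: str
    page: str
    checked_on: date  # day the engine was queried (assumed field)
    indexed_on: date  # date of the copy held in the index (assumed field)


def age_in_days(obs: Observation) -> int:
    """Age of the indexed copy on the day it was checked."""
    return (obs.checked_on - obs.indexed_on).days


def summarise(observations, fresh_within_days=1):
    """Per engine: share of fresh copies and age of the oldest copy."""
    ages = defaultdict(list)
    for obs in observations:
        ages[obs.engine].append(age_in_days(obs))
    return {
        engine: {
            "fresh_share": sum(a <= fresh_within_days for a in values) / len(values),
            "max_age_days": max(values),
        }
        for engine, values in ages.items()
    }


# Made-up observations for two hypothetical engines, not data from the study.
sample = [
    Observation("engine_a", "site1", date(2005, 3, 10), date(2005, 3, 10)),
    Observation("engine_a", "site2", date(2005, 3, 10), date(2005, 2, 1)),
    Observation("engine_b", "site1", date(2005, 3, 10), date(2005, 3, 5)),
    Observation("engine_b", "site2", date(2005, 3, 10), date(2005, 3, 1)),
]

summary = summarise(sample)
for engine, stats in sorted(summary.items()):
    print(engine, stats)
```

In this made-up sample, one engine has the higher share of fresh copies but also the oldest outlier, while the other has no fresh copies and a smaller maximum age. This is why the sketch reports both figures rather than a single score.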

Item type: Journal article (Unpaginated)
Keywords: search engines; Online Information Retrieval; World Wide Web; index quality; index freshness
Subjects: L. Information technology and library technology > LS. Search engines.
Depositing user: Dirk Lewandowski
Date deposited: 07 Sep 2005
Last modified: 02 Oct 2014 12:01
URI: http://hdl.handle.net/10760/6701


