A three-year study on the freshness of Web search engine databases

Lewandowski, Dirk. A three-year study on the freshness of Web search engine databases, 2008 [Preprint]

English abstract

This paper examines one aspect of search engine index quality: index freshness. The purpose is to analyse the update strategies of the major Web search engines Google, Yahoo, and MSN/Live.com. We tested how quickly the engines picked up changes to 40 daily updated pages and 30 irregularly updated pages. We used data from a six-week period in each of the years 2005, 2006, and 2007. We found that the best search engine in terms of up-to-dateness changes over the years and that none of the engines has an ideal solution for index freshness. Frequency distributions for the pages’ ages are skewed, which means that search engines do differentiate between often- and seldom-updated pages. This is confirmed by the difference between the average ages of daily updated pages and our control group of pages. Indexing patterns are often irregular, and there seems to be no clear policy regarding when to revisit Web pages. A major problem identified in our research is the delay in making crawled pages available for searching, which differs from one engine to another.

Item type:	Preprint
Keywords:	search engines; online information retrieval; World Wide Web; index freshness
Subjects:	H. Information sources, supports, channels. > HQ. Web pages.
	L. Information technology and library technology > LS. Search engines.
Depositing user:	Dirk Lewandowski
Date deposited:	19 Jan 2008
Last modified:	02 Oct 2014 12:10
URI:	http://hdl.handle.net/10760/11024
