(Science, April 3, 1998)
Searching the World Wide Web (abstract and data from Table 1)

Steve Lawrence, C. Lee Giles *

The coverage and recency of the major World Wide Web search engines were
analyzed, yielding some surprising results. The coverage of any one engine
is significantly limited: No single engine indexes more than about one-third
of the "indexable Web," the coverage of the six engines investigated varies
by an order of magnitude, and combining the results of the six engines
yields about 3.5 times as many documents on average as compared with the
results from only one engine. Analysis of the overlap between pairs of
engines gives an estimated lower bound on the size of the indexable Web of
320 million pages.
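
The abstract does not spell out how the overlap between pairs of engines is
turned into a size estimate. One common approach, shown here only as a hedged
sketch, is a capture-recapture style calculation: the fraction of engine B's
documents that engine A also indexes is taken as A's coverage of the indexable
Web, and A's index size divided by that coverage estimates the total. The
function name and every number below are hypothetical, chosen for illustration,
and are not taken from the paper.

```python
def estimate_total_size(size_a: float, docs_from_b: int, docs_from_b_also_in_a: int) -> float:
    """Estimate the total number of indexable pages from the overlap of two engines.

    size_a                -- number of pages engine A indexes (hypothetical input)
    docs_from_b           -- number of documents sampled from engine B's results
    docs_from_b_also_in_a -- how many of those documents engine A also indexes
    """
    if docs_from_b <= 0:
        raise ValueError("need at least one sampled document from engine B")
    if docs_from_b_also_in_a <= 0:
        raise ValueError("no overlap observed; the estimate is undefined")

    # Fraction of B's documents found in A, used as a proxy for A's coverage
    # of the whole indexable Web (assumes the two engines index independently).
    coverage_a = docs_from_b_also_in_a / docs_from_b

    # If A holds `coverage_a` of everything, the total is size_a / coverage_a.
    return size_a / coverage_a


# Illustrative numbers only: A indexes 100 million pages and contains
# 250 of 1000 documents sampled from B, so A's coverage is taken as 25%.
example_estimate = estimate_total_size(100e6, 1000, 250)
print(f"Estimated size: {example_estimate:,.0f} pages")
```

If engines tend to index the same popular pages, the measured overlap
overstates coverage, which is one reason an estimate of this kind would behave
as a lower bound rather than a point estimate.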

Computer Science, NEC Research Institute, 4 Independence Way, Princeton, NJ
08540, USA. E-mail: lawrence@research.nj.nec.com (S.L.) or
giles@research.nj.nec.com (C.L.G.)
* Also with the Institute for Advanced Computer Studies, University of
Maryland, College Park, MD 20742, USA.
 Table 1. Estimated coverage of each engine with respect to the combined
 coverage of all six (averaged over 575 queries performed during 15 to
 17 December 1997), along with the 95% confidence interval (C.I.). HotBot
 is the most comprehensive in this comparison. Note that these results are
 specific to the particular queries performed (typical queries made by
 scientists) and the state of the engine databases at the time they were
 performed. Note also that the results may be partly due to different
 indexing rather than different database sizes: Different engines may not
 index identical words for the same document (for example, the engines
 typically impose a maximum file size and effectively truncate oversized
 documents). However, changes in the results due to different indexing are
 reflective of the coverage of the engines.

 Search engine        Coverage (%)    95% C.I. (%)
 HotBot                   57.5            1.3
 AltaVista                46.5            1.3
 Northern Light           32.9            1.1
 Excite                   23.1            0.86
 Infoseek                 16.5            1.0
 Lycos                     4.41           0.42
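
As a quick arithmetic check on Table 1, the following sketch turns each
estimate into an interval and computes the spread between the most and least
comprehensive engines. It assumes the C.I. column is a plus-or-minus half-width
around the coverage estimate; the variable and function names are illustrative
only.

```python
# Table 1 values: engine -> (coverage %, 95% C.I. half-width %).
# Treating the C.I. column as a +/- half-width is an assumption.
coverage = {
    "HotBot": (57.5, 1.3),
    "AltaVista": (46.5, 1.3),
    "Northern Light": (32.9, 1.1),
    "Excite": (23.1, 0.86),
    "Infoseek": (16.5, 1.0),
    "Lycos": (4.41, 0.42),
}


def interval(engine: str) -> tuple[float, float]:
    """Return the (lower, upper) bounds of the 95% interval for one engine."""
    estimate, half_width = coverage[engine]
    return (estimate - half_width, estimate + half_width)


# Ratio of the highest to the lowest coverage estimate in the table.
estimates = [estimate for estimate, _ in coverage.values()]
spread_ratio = max(estimates) / min(estimates)

for engine in coverage:
    low, high = interval(engine)
    print(f"{engine:15s} {low:6.2f} to {high:6.2f} %")
print(f"Highest / lowest coverage: {spread_ratio:.1f}")
```

The ratio comes out at about 13, in line with the abstract's statement that
coverage varies by an order of magnitude across the six engines.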