HP3000-L Archives

December 2000, Week 4

HP3000-L@RAVEN.UTC.EDU

From: Wirt Atmar
Date: Fri, 22 Dec 2000 17:41:21 EST

Tony asks:

> How is that you can find stuff on the web so fast.  Do you use some super
>  secret AICS search engine or something?
>
>  ....I guess I had better go search for my list of search engines that search
>  search engines. ;-)

Google is the one-word answer.

Google is simply extraordinarily fast and accurate -- and up-to-date. The
algorithm it uses is essentially the one the Science Citation Index uses to
gauge the importance of scientific papers: how often a web page is referenced
by its peers.
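
To make the citation idea concrete, here is a minimal sketch that ranks pages by how many other pages link to them. It is only an illustration of "count the references from your peers", not Google's actual algorithm; the page names and the `links` table are made up for the example.

```python
from collections import Counter

# Hypothetical toy web: each page maps to the pages it links to.
links = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "a.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}

def citation_counts(links):
    """Count how many distinct other pages link to each page."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):   # a page only gets one vote per target
            if target != source:      # ignore self-links
                counts[target] += 1
    return counts

def rank_by_citations(links):
    """Most-referenced pages first; ties broken alphabetically."""
    counts = citation_counts(links)
    return sorted(counts, key=lambda page: (-counts[page], page))

print(rank_by_citations(links))
```

Note that nothing in the ranking depends on the words a page uses to describe itself, only on what other pages say about it by linking to it.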

The other search engines simply look through the text of a page and weight
the relevance of a page by the number of times the words (or their close
relatives) that you use in your search phrase appear on the page, divided
generally by the number of words on the page and then by some measure of how
close they appear to one another on the page. Because of this search engine
behavior, a lot of web pages load up their text with all sorts of keywords
but display that text at the bottom of the page in the same color as the
background so that you can't readily see it.
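
A rough sketch of that kind of text-only scoring, and of why keyword stuffing works against it, might look like the following. The function names and the two sample pages are invented for illustration, and the proximity term mentioned above is left out to keep it short; no real search engine's formula is being reproduced here.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def density_score(query, page_text):
    """Occurrences of the query words divided by the page's word count."""
    words = tokenize(page_text)
    if not words:
        return 0.0
    terms = set(tokenize(query))
    hits = sum(1 for word in words if word in terms)
    return hits / len(words)

# Hypothetical pages: an ordinary one, and one stuffed with keywords.
honest = "We wish you a merry Christmas and a happy new year"
stuffed = "merry christmas merry christmas merry christmas sale"

print(density_score("Merry Christmas", honest))
print(density_score("Merry Christmas", stuffed))
```

The stuffed page scores far higher even though it says nothing, which is exactly the incentive behind the hidden-text trick.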

More importantly, most of the search engines work on an implicit OR basis,
such that if you type in the phrase

     Merry Christmas,

you get all of the pages that have either the words

     Merry
or
     Christmas

anywhere on the page. In AltaVista, for example, if you want to get the same
behavior as Google, you have to type either

     "Merry Christmas"

to get the two words adjoining one another, as a phrase, or

     +Merry +Christmas

in order to duplicate Google's search process more or less exactly.

In contrast, in Google, the search pattern is always implicitly AND, and AND
is the only operator that is used. Further, no "stemming" (root word
variants) is introduced into the search. That actually makes the search much
cleaner and more obvious.
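
The difference between the three behaviors is easy to see in a small sketch. The four pages below are invented, and the functions are a toy model of implicit-OR, implicit-AND and quoted-phrase matching, not the code of any actual search engine. Words are matched exactly, with no stemming.

```python
import re

# Hypothetical page texts, keyed by page id.
pages = {
    1: "merry christmas to all",
    2: "a merry band of outlaws",
    3: "christmas tree farms",
    4: "have a very merry and bright christmas",
}

def tokens(text):
    """Lowercase word tokens; no stemming is applied."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search_or(query, pages):
    """Implicit OR: a page matches if it contains any query word."""
    terms = set(tokens(query))
    return {pid for pid, text in pages.items() if terms & set(tokens(text))}

def search_and(query, pages):
    """Implicit AND: a page matches only if it contains every query word."""
    terms = set(tokens(query))
    return {pid for pid, text in pages.items() if terms <= set(tokens(text))}

def search_phrase(query, pages):
    """Quoted phrase: the query words must appear adjacent and in order."""
    needle = " " + " ".join(tokens(query)) + " "
    return {pid for pid, text in pages.items()
            if needle in " " + " ".join(tokens(text)) + " "}

print(search_or("Merry Christmas", pages))
print(search_and("Merry Christmas", pages))
print(search_phrase("Merry Christmas", pages))
```

Adding a word shows the opposite tendencies: under OR, going from "merry" to "merry christmas" widens the result from three pages to all four, while under AND, going from "merry christmas" to "merry christmas bright" narrows it from two pages to one.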

There are 6000 PCs running a stripped-down version of Linux that are wired
together to form the Google search engine, and it is amazingly fast. 1.5
billion web pages can generally be searched in less than half a second.

In any form of search, you want to use the rare words as your search keys.
Common words return far too many retrievals to be of any value, but in
today's case, I was lucky that Gaither was named Gaither Bynum. We lucky few
who have odd names can no longer hide from the power of the web, but Bob
Jones can :-), only because he's one in a million (literally).

But the same is true of any search. Think of the rarest words that will
likely appear on the web pages that you're looking for and you should find
what you want almost immediately. Just remember that each additional word
that you type in when searching with Google narrows the number of retrieved
entries, while on virtually all of the other search engines each additional
word (often dramatically) increases the number of pages that are retrieved.
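
As a small illustration of the "use the rare words" advice, the sketch below counts how many pages each word appears on and picks the rarest candidates as search keys. The sample pages, the candidate list and the function names are all made up; in practice you do this by intuition, since you can't see the index.

```python
from collections import Counter

# Hypothetical page texts, keyed by page id.
pages = {
    1: "the printer shows an error",
    2: "the spooler error on the hp3000",
    3: "the printer is out of paper",
    4: "the spooler restarts the printer",
}

def document_frequency(pages):
    """For each word, count the number of pages it appears on."""
    df = Counter()
    for text in pages.values():
        df.update(set(text.lower().split()))
    return df

def rarest_terms(candidates, pages, k=2):
    """Pick the k candidate words that appear on the fewest pages."""
    df = document_frequency(pages)
    return sorted(candidates, key=lambda word: (df[word.lower()], word))[:k]

print(rarest_terms(["the", "printer", "error", "hp3000"], pages))
```

With an AND-style engine, the rarest word does most of the narrowing by itself; the common words add almost nothing.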

Wirt Atmar
