The Anatomy of a Search Engine
Bookmark History
Saved by 46 people (-16 private), first by an anonymous user on 2006-10-29
- Maxugaz on 2009-11-02 - Tags Google
- Galaen on 2009-11-01 - Tags PageRank , Indexing , algorithms , google , search , architecture
- Decoeur on 2009-10-22 - Tags week7
- Tmarch on 2009-10-09 - Tags google , algorithms , history
- Daniel_teacher on 2009-10-01 - Tags google , search , engine , seo , algorithms , pagerank , architecture
Public Sticky notes
Highlighted by decoeur
The web creates new challenges for information retrieval. The amount of information on the web is growing rapidly, as well as the number of new users inexperienced in the art of web research. People are likely to surf the web using its link graph, often starting with high quality human maintained indices such as Yahoo! or with search engines. Human maintained lists cover popular topics effectively but are subjective, expensive to build and maintain, slow to improve, and cannot cover all esoteric topics. Automated search engines that rely on keyword matching usually return too many low quality matches. To make matters worse, some advertisers attempt to gain people's attention by taking measures meant to mislead automated search engines. We have built a large-scale search engine which addresses many of the problems of existing systems. It makes especially heavy use of the additional structure present in hypertext to provide much higher quality search results. We chose our system name, Google, because it is a common spelling of googol, or 10^100, and fits well with our goal of building very large-scale search engines.
Highlighted by zeenko
In Google, the web crawling (downloading of web pages) is done by several distributed crawlers. There is a URLserver that sends lists of URLs to be fetched to the crawlers. The web pages that are fetched are then sent to the storeserver. The storeserver then compresses and stores the web pages into a repository. Every web page has an associated ID number called a docID which is assigned whenever a new URL is parsed out of a web page. The indexing function is performed by the indexer and the sorter. The indexer performs a number of functions. It reads the repository, uncompresses the documents, and parses them. Each document is converted into a set of word occurrences called hits. The hits record the word, position in document, an approximation of font size, and capitalization. The indexer distributes these hits into a set of "barrels", creating a partially sorted forward index. The indexer performs another important function. It parses out all the links in every web page and stores important information about them in an anchors file. This file contains enough information to determine where each link points from and to, and the text of the link.
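The indexing step described above can be sketched roughly as follows. This is an illustration only, not the original implementation: the `Hit` record, `parse_hits`, `barrel_for`, and `build_forward_barrels` are hypothetical names, the tokenizer is a plain whitespace split, the font size is a constant placeholder, and words are assigned to barrels by a CRC32 hash purely for demonstration.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One word occurrence: word, position, font size, capitalization."""
    word: str
    position: int
    font_size: int      # placeholder; a real parser would derive this from markup
    capitalized: bool


def parse_hits(text, font_size=1):
    """Turn a document's text into a list of hits (naive whitespace tokenizer)."""
    hits = []
    for position, token in enumerate(text.split()):
        word = token.strip(".,;:!?\"'()")
        if not word:
            continue
        hits.append(Hit(word.lower(), position, font_size, word[0].isupper()))
    return hits


def barrel_for(word, num_barrels):
    """Assumed barrel assignment: a stable hash of the word (illustrative only)."""
    return zlib.crc32(word.encode("utf-8")) % num_barrels


def build_forward_barrels(documents, num_barrels=2):
    """Distribute hits into barrels; each barrel maps docID -> list of hits."""
    barrels = [defaultdict(list) for _ in range(num_barrels)]
    for doc_id in sorted(documents):          # visit documents in docID order
        for hit in parse_hits(documents[doc_id]):
            barrels[barrel_for(hit.word, num_barrels)][doc_id].append(hit)
    return barrels


documents = {1: "Google ranks pages", 2: "pages link to pages"}
barrels = build_forward_barrels(documents)
```

Because documents are visited in docID order, each barrel ends up ordered by docID, which is the "partially sorted forward index" the paragraph refers to.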
The URLresolver reads the anchors file and converts relative URLs into absolute URLs and in turn into docIDs. It puts the anchor text into the forward index, associated with the docID that the anchor points to. It also generates a database of links which are pairs of docIDs. The links database is used to compute PageRanks for all the documents.
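A minimal sketch of the URL resolution and link-database steps, followed by a rank computation. The anchors format (a list of `(source_url, href, anchor_text)` tuples), the function names, and the example URLs are assumptions made for illustration. The `pagerank` function uses a common textbook power-iteration formulation with a damping factor of 0.85; the excerpt above does not specify the formula, so treat this as one plausible variant.

```python
from urllib.parse import urljoin


def resolve_links(anchors, doc_ids):
    """Convert relative hrefs to absolute URLs, then to docID pairs.

    anchors: iterable of (source_url, href, anchor_text) tuples (assumed format).
    doc_ids: dict mapping URL -> docID; new URLs get the next free ID.
    """
    links = []          # the links database: (source docID, target docID)
    anchor_text = {}    # target docID -> anchor texts pointing at it
    for source_url, href, text in anchors:
        target_url = urljoin(source_url, href)
        for url in (source_url, target_url):
            doc_ids.setdefault(url, len(doc_ids))
        src, dst = doc_ids[source_url], doc_ids[target_url]
        links.append((src, dst))
        anchor_text.setdefault(dst, []).append(text)
    return links, anchor_text


def pagerank(links, num_docs, damping=0.85, iterations=50):
    """Power iteration over the link pairs; ranks sum to 1."""
    out_degree = [0] * num_docs
    for src, _ in links:
        out_degree[src] += 1
    ranks = [1.0 / num_docs] * num_docs
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(r for r, d in zip(ranks, out_degree) if d == 0)
        base = (1 - damping) / num_docs + damping * dangling / num_docs
        new_ranks = [base] * num_docs
        for src, dst in links:
            new_ranks[dst] += damping * ranks[src] / out_degree[src]
        ranks = new_ranks
    return ranks


anchors = [
    ("http://a.example/index.html", "b.html", "to b"),
    ("http://a.example/b.html", "/index.html", "home"),
    ("http://a.example/c.html", "b.html", "also b"),
]
doc_ids = {}
links, anchor_text = resolve_links(anchors, doc_ids)
ranks = pagerank(links, len(doc_ids))
```

Note that the anchor text is stored under the docID of the page the link points to, not the page it appears on, matching the description above.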
The sorter takes the barrels, which are sorted by docID (this is a simplification, see Section 4.2.5), and resorts them by wordID to generate the inverted index. This is done in place so that little temporary space is needed for this operation. The sorter also produces a list of wordIDs and offsets into the inverted index. A program called DumpLexicon takes this list together with the lexicon produced by the indexer and generates a new lexicon to be used by the searcher. The searcher is run by a web server and uses the lexicon built by DumpLexicon together with the inverted index and the PageRanks to answer queries.
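The sorter's role can be illustrated with a small sketch. It assumes a barrel is a flat list of `(doc_id, word_id, position)` tuples, which is a simplification chosen for this example; `sort_barrel` and `lookup` are hypothetical names. Python's `list.sort` reorders the same list object, which loosely mirrors the in-place sort described above, although it may still use temporary memory internally.

```python
def sort_barrel(barrel):
    """Re-sort a docID-ordered barrel by wordID, in place.

    barrel: list of (doc_id, word_id, position) tuples (assumed layout).
    Returns a dict mapping word_id -> offset of its first posting.
    """
    barrel.sort(key=lambda entry: (entry[1], entry[0], entry[2]))
    offsets = {}
    for index, (_, word_id, _) in enumerate(barrel):
        offsets.setdefault(word_id, index)   # keep only the first offset
    return offsets


def lookup(barrel, offsets, word_id):
    """Return the docIDs containing word_id, using the offset table."""
    start = offsets.get(word_id)
    if start is None:
        return []
    docs = []
    for doc_id, wid, _ in barrel[start:]:
        if wid != word_id:
            break                            # reached the next word's postings
        if not docs or docs[-1] != doc_id:
            docs.append(doc_id)
    return docs


# Forward-ordered input: grouped by docID.
barrel = [(1, 7, 0), (1, 3, 1), (2, 3, 0), (2, 7, 4), (3, 3, 2)]
offsets = sort_barrel(barrel)
```

The `offsets` table plays the part of the "list of wordIDs and offsets into the inverted index": a searcher can jump straight to the first posting for a word instead of scanning the whole index.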

