Invention Grant
- Patent Title: System and method for crawl ordering by search impact
- Application No.: US12004881
- Application Date: 2007-12-20
- Publication No.: US07899807B2
- Publication Date: 2011-03-01
- Inventor: Christopher Olston, Sandeep Pandey
- Applicant: Christopher Olston, Sandeep Pandey
- Applicant Address: US CA Sunnyvale
- Assignee: Yahoo! Inc.
- Current Assignee: Yahoo! Inc.
- Current Assignee Address: US CA Sunnyvale
- Agency: Law Office of Robert O. Bolan
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
An improved system and method for crawl ordering of a web crawler by impact upon search results of a search engine is provided. Content-independent features of uncrawled web pages may be obtained, and the impact of uncrawled web pages may be estimated for queries of a workload using the content-independent features. The impact of uncrawled web pages may be estimated for queries by computing an expected impact score for uncrawled web pages that match needy queries. Query sketches may be created for a subset of the queries by computing an expected impact score for crawled web pages and uncrawled web pages matching the queries. Web pages may then be selected to fetch using a combined query-based estimate and query-independent estimate of the impact of fetching the web pages on search query results.
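
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names, the `need` score, the token-matching rule, the weight `alpha`, and the use of URL and anchor-text tokens as content-independent features are all assumptions made for illustration, and query sketches are left out entirely.

```python
from dataclasses import dataclass
import heapq
import re


@dataclass(frozen=True)
class Query:
    terms: frozenset   # terms of one workload query
    frequency: float   # share of the workload (assumed known)
    need: float        # hypothetical "neediness" in [0, 1]; 0 means results are already good


@dataclass(frozen=True)
class UncrawledPage:
    url: str
    anchor_text: str          # text of links pointing at the page
    independent_score: float  # hypothetical query-independent estimate


def feature_terms(page):
    """Tokens taken from the URL and anchor text only; page content is unknown."""
    return set(re.findall(r"[a-z0-9]+", (page.url + " " + page.anchor_text).lower()))


def query_based_impact(page, workload):
    """Expected impact: frequency-weighted need of the needy queries the page matches."""
    terms = feature_terms(page)
    return sum(q.frequency * q.need
               for q in workload
               if q.need > 0 and q.terms <= terms)


def combined_impact(page, workload, alpha=0.8):
    """Blend the query-based and query-independent estimates (alpha is an assumption)."""
    return alpha * query_based_impact(page, workload) + (1 - alpha) * page.independent_score


def select_pages(pages, workload, k, alpha=0.8):
    """Pick the k uncrawled pages with the highest combined estimate."""
    return heapq.nlargest(k, pages, key=lambda p: combined_impact(p, workload, alpha))


workload = [
    Query(frozenset({"jazz", "festival"}), frequency=0.6, need=0.9),
    Query(frozenset({"weather"}), frequency=0.4, need=0.0),
]
pages = [
    UncrawledPage("http://example.com/jazz/festival-2008", "summer jazz", 0.1),
    UncrawledPage("http://example.com/weather", "weather forecast", 0.5),
    UncrawledPage("http://example.com/about", "about us", 0.2),
]
to_fetch = select_pages(pages, workload, k=2)
```

In this toy workload the jazz festival page ranks first because it matches a frequent, needy query, while the weather page ranks second on its query-independent score alone, since its matching query is not needy.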
Public/Granted literature
- US20090164425A1 System and method for crawl ordering by search impact Public/Granted day: 2009-06-25