Invention Grant
US08676783B1 Method and apparatus for managing a backlog of pending URL crawls
Active
- Patent Title: Method and apparatus for managing a backlog of pending URL crawls
- Application No.: US13170890
- Application Date: 2011-06-28
- Publication No.: US08676783B1
- Publication Date: 2014-03-18
- Inventor: Pawel Aleksander Fedorynski, Sumitro Samaddar
- Applicant: Pawel Aleksander Fedorynski, Sumitro Samaddar
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Main IPC: G06F17/30
- IPC: G06F17/30; G06F7/00

Abstract:
The technology described relates to reducing a backlog of pending URL crawls in view of a limited URL crawl capacity. This technology is useful for crawling URLs with low latency. Because of the limited crawl capacity, uncrawled URLs from crawl requests are entered into a backlog data structure of pending crawl requests. Various criteria are applied to the URLs that are requested to be crawled, so that less important URL crawls are rejected early from the backlog data structure. This early rejection tends to limit the backlog data structure to the more important pending URL crawls, and tends to keep the average latency low by quickly failing the less important requested URL crawls.
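
The early-rejection idea in the abstract can be illustrated with a small sketch. This is not the patented method: the abstract does not state which criteria are applied, so the single numeric `importance` score, the fixed `capacity`, and the names `CrawlBacklog`, `submit`, and `next_to_crawl` are all assumptions made for illustration. The sketch shows only the general pattern of a bounded backlog that fails the least important request immediately instead of letting it wait.

```python
import bisect


class CrawlBacklog:
    """Hypothetical bounded backlog of pending URL crawls.

    Entries are kept sorted by importance, so the least important
    entry is at index 0 and the most important one is at the end.
    """

    def __init__(self, capacity):
        self.capacity = capacity  # assumed stand-in for limited crawl capacity
        self._entries = []        # sorted list of (importance, -seq, url)
        self._seq = 0             # arrival counter used to break ties

    def submit(self, url, importance):
        """Request a crawl. Returns the URL that was rejected, or None."""
        if len(self._entries) >= self.capacity:
            lowest_importance = self._entries[0][0]
            if importance <= lowest_importance:
                # Fail fast: the new request is no more important than
                # anything already pending, so reject it right away.
                return url
            # Otherwise evict the least important pending crawl.
            rejected = self._entries.pop(0)[2]
        else:
            rejected = None
        # Negating seq makes earlier arrivals sort later among ties,
        # so they are crawled first and evicted last.
        bisect.insort(self._entries, (importance, -self._seq, url))
        self._seq += 1
        return rejected

    def next_to_crawl(self):
        """Remove and return the most important pending URL, or None."""
        if not self._entries:
            return None
        return self._entries.pop()[2]
```

With a capacity of 2, submitting a request less important than both pending entries returns that URL immediately as rejected, while a more important request displaces the least important pending one. Either way the caller learns of the failure at once, which is the latency benefit the abstract describes.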