Adapting content repositories for crawling and serving

Invention Grant

US08972375B2 Adapting content repositories for crawling and serving (in force)

Patent Title: Adapting content repositories for crawling and serving
Application No.: US13721863

Application Date: 2012-12-20
Publication No.: US08972375B2

Publication Date: 2015-03-03
Inventors: Pawel Opalinski, Brandon Player Iles, Eric Jon Anderson, John Felton
Applicant: Google Inc.
Applicant Address: US CA Mountain View
Assignee: Google Inc.
Current Assignee: Google Inc.
Current Assignee Address: US CA Mountain View
Agency: Brake Hughes Bellermann LLP
Main IPC: G06F17/30
IPC: G06F17/30

Abstract:

A system for searching files stored in a closed file source that is not accessible via a web crawler obtains file identifiers for files stored in the file source and creates a unique URL for each of the identifiers. Each URL may be based on a file identifier and a domain portion of a URL associated with the system. The system may provide the unique URLs to a search engine. The system may respond to a crawl request from the search engine for a particular URL by converting the URL back into a file identifier, obtaining the contents of the file, creating an HTTP response from the contents of the file, and returning the response to the search engine. The system may respond to a request for a seed URL with a plurality of URLs as links in a single HTTP response.
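The round trip described in the abstract (file identifier to URL, URL back to file identifier, plus a seed page of links) can be illustrated with a minimal Python sketch. It is not the patented implementation. The domain `adapter.example.com`, the `/doc/` and `/seed` paths, the percent-encoding scheme, and the in-memory `repository` dict standing in for the closed file source are all assumptions made for illustration.

```python
from html import escape
from urllib.parse import quote, unquote, urlsplit

# Hypothetical values: the abstract names neither a domain nor a path layout.
ADAPTER_DOMAIN = "adapter.example.com"
DOC_PREFIX = "/doc/"
SEED_PATH = "/seed"


def file_id_to_url(file_id: str) -> str:
    """Build a unique URL from a file identifier and the adapter's domain."""
    # Percent-encode every reserved character so the mapping is reversible.
    return f"http://{ADAPTER_DOMAIN}{DOC_PREFIX}{quote(file_id, safe='')}"


def url_to_file_id(url: str) -> str:
    """Convert a crawl URL back into the file identifier it encodes."""
    parts = urlsplit(url)
    if parts.netloc != ADAPTER_DOMAIN or not parts.path.startswith(DOC_PREFIX):
        raise ValueError(f"not an adapter document URL: {url}")
    return unquote(parts.path[len(DOC_PREFIX):])


def seed_page(file_ids) -> bytes:
    """Render all document URLs as links in a single HTML body."""
    links = "\n".join(
        f'<a href="{escape(file_id_to_url(fid))}">{escape(fid)}</a>'
        for fid in file_ids
    )
    return f"<html><body>\n{links}\n</body></html>".encode("utf-8")


def handle_crawl(url: str, repository: dict) -> tuple:
    """Return (status, headers, body) for a crawl request.

    `repository` maps file identifiers to bytes; it is a stand-in for
    whatever API the closed file source actually exposes.
    """
    parts = urlsplit(url)
    if parts.netloc == ADAPTER_DOMAIN and parts.path == SEED_PATH:
        headers = {"Content-Type": "text/html; charset=utf-8"}
        return 200, headers, seed_page(sorted(repository))
    try:
        body = repository[url_to_file_id(url)]
    except (ValueError, KeyError):
        return 404, {"Content-Type": "text/plain"}, b"not found"
    return 200, {"Content-Type": "application/octet-stream"}, body
```

In this sketch a crawler would first fetch the seed URL, follow each link it finds there, and receive the file contents wrapped in an HTTP-style response. A real adapter would also need to set an accurate content type and handle authentication against the file source, neither of which the abstract specifies.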

Public/Granted literature

US20130332443A1 Adapting content repositories for crawling and serving. Publication date: 2013-12-12
