METHODS AND SYSTEMS FOR PROVIDING CUSTOM CRAWL-TIME METADATA
    1.
    Patent application
    METHODS AND SYSTEMS FOR PROVIDING CUSTOM CRAWL-TIME METADATA (in force)

    Publication number: US20130332445A1

    Publication date: 2013-12-12

    Application number: US13721857

    Filing date: 2012-12-20

    Applicant: GOOGLE INC.

    Abstract: A method for providing metadata to a search engine for a document that is not in a mark-up language receives a request for contents of the document and locates metadata associated with the document. The method further creates name-value pairs for the metadata and provides to the search engine server a response comprising the name-value pairs in an HTTP (or HTTPS) header and the contents of the document. In other implementations, a method includes sending a request for contents of the document and receiving a response to the request comprising an HTTP header with metadata about the document in a name-value pair and the document's content. The method also includes extracting the name-value pair from the HTTP header, creating a mark-up language tag for the name-value pair, and providing the mark-up language tag and the contents of the document in a mark-up language format to a search index creation component.
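
    The serving side of this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the header name `X-Doc-Metadata`, the percent-encoded comma-separated `name=value` encoding, and the function `build_response` are all assumptions made for illustration, since the abstract only says that name-value pairs travel in an HTTP (or HTTPS) header alongside the document contents.

```python
from urllib.parse import quote

# Hypothetical header name; the abstract does not name the header.
METADATA_HEADER = "X-Doc-Metadata"


def build_response(content: bytes, metadata: dict[str, str],
                   content_type: str = "application/pdf"):
    """Return (headers, body) for a document that is not in a mark-up language.

    Each metadata entry becomes a percent-encoded name=value pair; the pairs
    are joined with commas into a single header value (an assumed encoding).
    """
    pairs = [
        f"{quote(name, safe='')}={quote(value, safe='')}"
        for name, value in metadata.items()
    ]
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(content)),
        METADATA_HEADER: ",".join(pairs),
    }
    return headers, content


headers, body = build_response(b"%PDF-1.4", {"author": "Jane Doe", "dept": "R&D"})
print(headers[METADATA_HEADER])
```

    Percent-encoding is used here so that commas, equals signs, and non-ASCII characters inside a name or value cannot be confused with the separators.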

    Methods and systems for providing custom crawl-time metadata

    Publication number: US10430490B1

    Publication date: 2019-10-01

    Application number: US15391043

    Filing date: 2016-12-27

    Applicant: Google Inc.

    Abstract: A method for providing metadata to a search engine for a document that is not in a mark-up language receives a request for contents of the document and locates metadata associated with the document. The method further creates name-value pairs for the metadata and provides to the search engine server a response comprising the name-value pairs in an HTTP (or HTTPS) header and the contents of the document. In other implementations, a method includes sending a request for contents of the document and receiving a response to the request comprising an HTTP header with metadata about the document in a name-value pair and the document's content. The method also includes extracting the name-value pair from the HTTP header, creating a mark-up language tag for the name-value pair, and providing the mark-up language tag and the contents of the document in a mark-up language format to a search index creation component.

    Methods and systems for providing custom crawl-time metadata
    3.
    Granted patent
    Methods and systems for providing custom crawl-time metadata (in force)

    Publication number: US09582588B2

    Publication date: 2017-02-28

    Application number: US13721857

    Filing date: 2012-12-20

    Applicant: Google Inc.

    Abstract: A method for providing metadata to a search engine for a document that is not in a mark-up language includes sending a request for data about the document and receiving a response to the request that has a Hyper-Text Transfer Protocol (HTTP or HTTPS) header that includes metadata associated with the document in a name-value pair and the document's content. The method also includes extracting the name-value pair from the HTTP header, creating a mark-up language tag for the name-value pair, and providing the mark-up language tag and the contents of the document in a mark-up language format to a search index creation component.
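
    The crawler side described in this abstract might look roughly like the sketch below. The header name `X-Doc-Metadata`, its comma-separated percent-encoded format, and the helper names `extract_pairs` and `to_markup` are hypothetical; the abstract specifies only that name-value pairs are extracted from an HTTP header and turned into mark-up language tags. Text extraction from the original file format is assumed to have happened already.

```python
from html import escape
from urllib.parse import unquote

# Hypothetical header name and encoding; neither is given in the abstract.
METADATA_HEADER = "X-Doc-Metadata"


def extract_pairs(headers: dict[str, str]) -> list[tuple[str, str]]:
    """Decode name=value pairs from the metadata header, if present.

    Note: a plain dict lookup is case-sensitive, whereas real HTTP header
    names are not; a real client library would normalize this.
    """
    pairs = []
    for item in headers.get(METADATA_HEADER, "").split(","):
        if "=" not in item:
            continue  # skip empty or malformed items
        name, value = item.split("=", 1)
        pairs.append((unquote(name), unquote(value)))
    return pairs


def to_markup(headers: dict[str, str], text: str) -> str:
    """Wrap already-extracted document text in HTML with one meta tag per pair."""
    metas = "\n".join(
        f'<meta name="{escape(name)}" content="{escape(value)}">'
        for name, value in extract_pairs(headers)
    )
    return (
        f"<html><head>\n{metas}\n</head>"
        f"<body><pre>{escape(text)}</pre></body></html>"
    )


page = to_markup({METADATA_HEADER: "author=Jane%20Doe,dept=R%26D"}, "Quarterly report")
print(page)
```

    The resulting HTML string is what would be handed to the search index creation component, so the indexer sees the metadata in the same form it would see for a native HTML page.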

    Adapting content repositories for crawling and serving
    4.
    Granted patent
    Adapting content repositories for crawling and serving (in force)

    Publication number: US08972375B2

    Publication date: 2015-03-03

    Application number: US13721863

    Filing date: 2012-12-20

    Applicant: Google Inc.

    CPC classification number: G06F17/30864 G06F17/301 G06F17/30887 G06F17/30893

    Abstract: A system for searching files stored in a closed file source that is not accessible via a web crawler obtains file identifiers for files stored in the file source and creates a unique URL for each of the identifiers. Each URL may be based on a file identifier and a domain portion of a URL associated with the system. The system may provide the unique URLs to a search engine. The system may respond to a crawl request from the search engine for a particular URL by converting the URL back into a file identifier, obtaining the contents of the file, creating an HTTP response from the contents of the file, and returning the response to the search engine. The system may respond to a request for a seed URL with a plurality of URLs as links in a single HTTP response.
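
    The round trip between file identifiers and URLs can be sketched as follows. This is an illustration under stated assumptions, not the claimed system: the base URL `http://adaptor.example.com/doc/`, the use of percent-encoding to embed the identifier in the path, the in-memory dictionary standing in for the closed file source, and all function names are hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical domain and path prefix for the system that fronts the file source.
BASE_URL = "http://adaptor.example.com/doc/"


def id_to_url(file_id: str) -> str:
    """Build a unique URL from a file identifier and the system's domain portion."""
    return BASE_URL + quote(file_id, safe="")


def url_to_id(url: str) -> str:
    """Convert a crawled URL back into the file identifier it was built from."""
    path = urlsplit(url).path
    prefix = urlsplit(BASE_URL).path
    if not path.startswith(prefix):
        raise ValueError(f"not a URL issued by this system: {url}")
    return unquote(path[len(prefix):])


def handle_crawl(url: str, source: dict[str, bytes]):
    """Answer a crawl request with (status, headers, body).

    `source` is a stand-in for the closed file source, keyed by file identifier.
    """
    file_id = url_to_id(url)
    if file_id not in source:
        return 404, {}, b""
    body = source[file_id]
    headers = {
        "Content-Type": "application/octet-stream",
        "Content-Length": str(len(body)),
    }
    return 200, headers, body


files = {"reports/2012/q4.pdf": b"%PDF-1.4"}
url = id_to_url("reports/2012/q4.pdf")
print(url)
print(handle_crawl(url, files)[0])
```

    Encoding the whole identifier with `safe=""` keeps slashes inside the identifier from being read as URL path separators, so the mapping stays reversible.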

    ADAPTING CONTENT REPOSITORIES FOR CRAWLING AND SERVING
    5.
    Patent application
    ADAPTING CONTENT REPOSITORIES FOR CRAWLING AND SERVING (in force)

    Publication number: US20130332443A1

    Publication date: 2013-12-12

    Application number: US13721863

    Filing date: 2012-12-20

    Applicant: GOOGLE INC.

    CPC classification number: G06F17/30864 G06F17/301 G06F17/30887 G06F17/30893

    Abstract: A system for searching files stored in a closed file source that is not accessible via a web crawler obtains file identifiers for files stored in the file source and creates a unique URL for each of the identifiers. Each URL may be based on a file identifier and a domain portion of a URL associated with the system. The system may provide the unique URLs to a search engine. The system may respond to a crawl request from the search engine for a particular URL by converting the URL back into a file identifier, obtaining the contents of the file, creating an HTTP response from the contents of the file, and returning the response to the search engine. The system may respond to a request for a seed URL with a plurality of URLs as links in a single HTTP response.
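
    The last sentence of this abstract, answering a seed URL request with many URLs as links in one HTTP response, could be sketched like this. The base URL, the HTML layout of the link page, and the function name `seed_response` are assumptions for illustration only; the abstract does not say how the links are formatted.

```python
from html import escape
from urllib.parse import quote


def seed_response(file_ids: list[str], base_url: str):
    """Return (status, headers, body) listing one link per file identifier.

    A crawler that fetches this page can discover every document URL by
    following the links, without direct access to the closed file source.
    """
    lines = []
    for file_id in file_ids:
        url = base_url + quote(file_id, safe="")
        lines.append(f'<a href="{escape(url)}">{escape(file_id)}</a><br>')
    body = ("<html><body>\n" + "\n".join(lines) + "\n</body></html>").encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return 200, headers, body


# Hypothetical base URL for the system that fronts the file source.
status, headers, body = seed_response(
    ["a.txt", "dir/b c.pdf"], "http://adaptor.example.com/doc/"
)
print(body.decode("utf-8"))
```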
