Invention Grant
- Patent Title: Using context to extract entities from a document collection
- Patent Title (中): 使用上下文从文档集合中提取实体
- Application No.: US12794779
- Application Date: 2010-06-07
- Publication No.: US09251248B2
- Publication Date: 2016-02-02
- Inventor: Sanjay Agrawal
- Applicant: Sanjay Agrawal
- Applicant Address: US WA Redmond
- Assignee: Microsoft Technology Licensing, LLC
- Current Assignee: Microsoft Technology Licensing, LLC
- Current Assignee Address: US WA Redmond
- Agent: Alin Corie; Kate Drakos; Micky Minhas
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
Described is using context information obtained from entity mentions in likely relevant documents to extract entity mentions from documents that are ambiguous with respect to their relevance to a domain. A list of entities is input into an entity extraction mechanism, which processes a large collection of documents to determine data (counts) corresponding to frequency of entity mentions. Infrequently mentioned entities are specific entities, while frequently mentioned entities are non-specific (generic or ambiguous) entities. The context surrounding mentions of the specific entities is processed to obtain interesting context terms (words, phrases or both) for the domain. The interesting context terms are then compared against the contexts of non-specific entity mentions to determine whether each non-specific entity mention is relevant to the domain. A result set containing only relevant documents or relevant mentions collection is output.
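The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the stopword list, the fixed token window, and the thresholds (`max_specific_count`, `min_term_count`) are all assumptions chosen for illustration, and "interesting" terms are approximated here as non-stopword terms that recur across the contexts of specific-entity mentions.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "i", "with", "at", "has"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mention_contexts(tokens, entity_tokens, window):
    """Yield the tokens within `window` positions of each entity mention."""
    n = len(entity_tokens)
    if n == 0:
        return
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == entity_tokens:
            yield tokens[max(0, i - window):i] + tokens[i + n:i + n + window]


def extract_relevant(entities, documents, max_specific_count=1,
                     window=5, min_term_count=2):
    """Return (interesting_terms, relevant_mentions) for the given entities.

    All thresholds are illustrative assumptions, not values from the patent.
    """
    docs = [tokenize(d) for d in documents]

    # Step 1: collect every mention as (document index, context tokens).
    mentions = {}
    for entity in entities:
        entity_tokens = tokenize(entity)
        mentions[entity] = [
            (doc_id, ctx)
            for doc_id, tokens in enumerate(docs)
            for ctx in mention_contexts(tokens, entity_tokens, window)
        ]

    # Step 2: infrequently mentioned entities are treated as specific.
    specific = {e for e, m in mentions.items()
                if 0 < len(m) <= max_specific_count}

    # Step 3: terms that recur around specific mentions become the
    # interesting context terms for the domain.
    term_counts = Counter(
        term
        for e in specific
        for _, ctx in mentions[e]
        for term in ctx
        if term not in STOPWORDS
    )
    interesting = {t for t, c in term_counts.items() if c >= min_term_count}

    # Step 4: keep all specific mentions, and keep a non-specific mention
    # only if its context shares at least one interesting term.
    relevant = []
    for entity, entity_mentions in mentions.items():
        for doc_id, ctx in entity_mentions:
            if entity in specific or interesting.intersection(ctx):
                relevant.append((entity, doc_id))
    return interesting, relevant


if __name__ == "__main__":
    documents = [
        "The PowerShot SD1100 camera has a sharp lens.",
        "I bought the Coolpix S550 camera with a zoom lens.",
        "The Rebel camera ships with a kit lens.",
        "The rebel army crossed the river at dawn.",
        "A rebel leader spoke to the press.",
    ]
    entities = ["PowerShot SD1100", "Coolpix S550", "Rebel"]
    terms, hits = extract_relevant(entities, documents)
    print(sorted(terms))
    print(hits)
```

In this toy collection, "Rebel" is mentioned in three documents and is therefore treated as non-specific, so only the mention whose context shares a term with the camera-related contexts is kept. A production system would likely weight context terms statistically rather than use a raw count threshold.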
Public/Granted literature
- US20110302179A1 Using Context to Extract Entities from a Document Collection, Public/Granted day: 2011-12-08