Invention Grant
- Patent Title: Electronic document source ingestion for natural language processing systems
- Application No.: US13711788
- Application Date: 2012-12-12
- Publication No.: US09053086B2
- Publication Date: 2015-06-09
- Inventor: Joel C. Dubbels
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Patterson & Sheridan, LLP
- Main IPC: G06F17/27
- IPC: G06F17/27 ; G06F17/22

Abstract:
The data store for a natural-language computing system may include information that originates from a plurality of different data sources, e.g., journals, websites, magazines, reference books, and the like. In one embodiment, the information or text from the data sources is converted into a single, shared format and stored as objects in a data store. In order to ingest the different documents with their respective formats, a natural language processing system may perform preprocessing to change the different formats into a normalized format. When a new text document is received, the text may be correlated to a particular properties file that includes instructions specifying how the preprocessor should interpret the received text. Based on these instructions, a preprocessor identifies relevant portions of the text document and assigns these portions to formatting elements in the normalized format. The text may then be stored in the objects based on this assignment.
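
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `PROPERTIES` table stands in for per-source properties files, and the regex rules, the element names (`title`, `body`), the `NormalizedDocument` class, and the `ingest` function are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for per-source properties files: each entry maps a
# normalized formatting element to a regex whose first group captures the
# relevant portion of the incoming text.
PROPERTIES = {
    "journal": {
        "title": r"^TITLE:\s*(.+)$",
        "body": r"^ABSTRACT:\s*(.+)$",
    },
    "website": {
        "title": r"<h1>(.+?)</h1>",
        "body": r"<p>(.+?)</p>",
    },
}


@dataclass
class NormalizedDocument:
    """Object in the shared format (field names are assumptions)."""
    source: str
    elements: dict = field(default_factory=dict)


def preprocess(source: str, text: str) -> NormalizedDocument:
    """Correlate the text with its properties entry, then assign the
    matched portions to formatting elements of the normalized format."""
    rules = PROPERTIES[source]
    doc = NormalizedDocument(source=source)
    for element, pattern in rules.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:  # elements with no matching portion are simply left out
            doc.elements[element] = match.group(1).strip()
    return doc


def ingest(store: list, source: str, text: str) -> NormalizedDocument:
    """Preprocess one document and append the resulting object to the store."""
    doc = preprocess(source, text)
    store.append(doc)
    return doc
```

Under these assumptions, calling `ingest(store, "journal", "TITLE: Ingestion Basics\nABSTRACT: Sources differ in format.")` and `ingest(store, "website", "<h1>Ingestion Basics</h1><p>Sources differ in format.</p>")` yields two objects with the same `title` and `body` elements, even though the two inputs arrive in different formats.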
Public/Granted literature
- US20140164408A1 ELECTRONIC DOCUMENT SOURCE INGESTION FOR NATURAL LANGUAGE PROCESSING SYSTEMS Public/Granted day: 2014-06-12