Invention Grant
- Patent Title: System and method for adaptive sentence boundary disambiguation
- Patent Title (中): 自适应句边界消歧的系统和方法
- Application No.: US11965934
- Application Date: 2007-12-28
- Publication No.: US08131546B1
- Publication Date: 2012-03-06
- Inventor: Keith Zoellner
- Applicant: Keith Zoellner
- Applicant Address: US TX Austin
- Assignee: Stored IQ, Inc.
- Current Assignee: Stored IQ, Inc.
- Current Assignee Address: US TX Austin
- Agency: Sprinkle IP Law Group
- Main IPC: G10L17/00
- IPC: G10L17/00

Abstract:
Embodiments disclosed herein provide a system and method useful for pre-processing non-sentence text extracted from business documents (e.g., malformed bulleted lists, runaway sentence identification, spatially separated data, etc.). One embodiment includes two heuristic algorithms: one searches for sentences in a document and the other looks for non-sentences (e.g., lists, tables, tabs, names of people, addresses, etc.) in the same document. In one embodiment, when malformed text is encountered, a particular character (e.g., “?”) is inserted to signify to a natural language processing layer that this set of “words” represents a logical construct and should be evaluated independently of other sentences. Embodiments disclosed herein allow non-sentence text, which is linguistically dry but contextually rich, to be included in the natural language processing. Embodiments disclosed herein also help reduce false positive concept extraction assertions by the natural language processing layer.
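The two-heuristic idea in the abstract can be illustrated with a minimal sketch. This is not the patented method: the function names, the regular expressions, the four-word threshold, and the choice to append the marker at the end of a line are all assumptions made for illustration. Only the use of a marker character such as "?" comes from the abstract.

```python
import re
from typing import List

# Example marker character from the abstract; appending it at the end
# of the line is an assumption of this sketch.
MARKER = "?"

# Hypothetical heuristics, chosen only for illustration.
SENTENCE_RE = re.compile(r"^[A-Z].*[.!?]$")
BULLET_RE = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")


def looks_like_sentence(line: str) -> bool:
    """Heuristic 1: capitalized, terminally punctuated, at least four words."""
    stripped = line.strip()
    return len(stripped.split()) >= 4 and bool(SENTENCE_RE.match(stripped))


def looks_like_non_sentence(line: str) -> bool:
    """Heuristic 2: bullet markers or tab-separated fields suggest a list or table."""
    return "\t" in line or bool(BULLET_RE.match(line))


def mark_boundaries(text: str) -> List[str]:
    """Return one unit per non-empty line, marking non-sentence units with MARKER."""
    units = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if looks_like_non_sentence(line) or not looks_like_sentence(line):
            # Drop the bullet, flatten tabs, and close the unit with the marker
            # so a downstream NLP layer evaluates it on its own.
            cleaned = BULLET_RE.sub("", line).strip()
            cleaned = re.sub(r"\s*\t\s*", " ", cleaned)
            units.append(cleaned.rstrip(".!?;:, ") + MARKER)
        else:
            units.append(stripped)
    return units


sample = (
    "The contract was signed on Friday.\n"
    "- John Smith\n"
    "- 12 Main St\tAustin\n"
    "Payment is due within thirty days."
)
for unit in mark_boundaries(sample):
    print(unit)
```

In this sketch the two bulleted lines become separate marked units ("John Smith?" and "12 Main St Austin?"), while the two ordinary sentences pass through unchanged, so a downstream sentence splitter would not merge the list items into a neighboring sentence.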