Invention Grant
US09589183B2 System and method for identification and extraction of data (status: in force)

Abstract:
A system and method for describing target data as a sequence of pattern elements and pattern element groups that together comprise an overall target pattern is described. Pattern elements may use regular expression syntax along with other metadata that describes the behavior of the element. A pattern element group may be a collection of fully defined pattern elements, at least one of which must match for the overall pattern to match. A pattern may contain both pattern elements and pattern element groups. The general process first performs optical character recognition (OCR) on the document, which produces a sequence of text tokens representing the lines of text on each page of the document. The search algorithm may then apply each defined pattern to the entire document, capturing and/or extracting the data that match each pattern's required elements and element groups.
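The matching idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `PatternElement` and `PatternElementGroup`, the `required` and `capture` metadata fields, the `search` function, and the sample tokens are all hypothetical. It also assumes the OCR output has already been split into whitespace-separated tokens, whereas the abstract only says that tokens represent lines of text.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class PatternElement:
    """One pattern element: a regex plus illustrative metadata (hypothetical fields)."""
    name: str
    regex: str
    required: bool = True   # if False, the element may be skipped
    capture: bool = False   # if True, the matched token is extracted


@dataclass
class PatternElementGroup:
    """A group matches when at least one of its elements matches the token."""
    name: str
    elements: List[PatternElement]


PatternItem = Union[PatternElement, PatternElementGroup]


def match_at(pattern: List[PatternItem], tokens: List[str],
             start: int) -> Optional[Dict[str, str]]:
    """Try to match the whole pattern beginning at tokens[start]."""
    pos = start
    captured: Dict[str, str] = {}
    for item in pattern:
        token = tokens[pos] if pos < len(tokens) else None
        if isinstance(item, PatternElementGroup):
            # At least one element of the group must match the current token.
            hit = next((e for e in item.elements
                        if token is not None and re.fullmatch(e.regex, token)),
                       None)
            if hit is None:
                return None
            if hit.capture:
                captured[hit.name] = token
            pos += 1
        elif token is not None and re.fullmatch(item.regex, token):
            if item.capture:
                captured[item.name] = token
            pos += 1
        elif item.required:
            return None  # a required element failed, so the pattern fails here
        # An optional element that does not match is skipped without consuming a token.
    return captured


def search(pattern: List[PatternItem], tokens: List[str]) -> List[Dict[str, str]]:
    """Apply the pattern at every token position of the document."""
    results = []
    for start in range(len(tokens)):
        found = match_at(pattern, tokens, start)
        if found is not None:
            results.append(found)
    return results


# Hypothetical pattern: a label (either spelling), an optional "No.", then a number.
INVOICE_PATTERN: List[PatternItem] = [
    PatternElementGroup("label", [
        PatternElement("invoice_word", r"Invoice"),
        PatternElement("invoice_abbrev", r"Inv\.?"),
    ]),
    PatternElement("number_label", r"No\.?|#", required=False),
    PatternElement("invoice_number", r"\d{4,}", capture=True),
]

# Stand-in for OCR output; a real system would obtain these tokens from an OCR engine.
TOKENS = "Please pay Invoice No. 12345 by Friday . Inv 67890 is overdue".split()

print(search(INVOICE_PATTERN, TOKENS))
```

In this sketch the group lets either spelling of the label anchor the match, the optional element is skipped when it is absent, and only the element flagged with `capture` contributes to the extracted data.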