System and method for enrichment of OCR-extracted data
Abstract:
A computer-implemented method and system for enrichment of OCR-extracted data is disclosed. A data extraction engine accepts a set of extraction criteria and a set of configuration parameters, captures data satisfying the extraction criteria using the configuration parameters, and adapts the captured data using a set of domain-specific rules and a set of OCR error patterns. A learning engine generates learning data models from the adapted data and the configuration parameters, and the system dynamically updates the extraction criteria using the generated learning data models. The extraction criteria comprise one or more extraction templates, wherein an extraction template includes one of a regular expression, geometric markers, anchor text markers, and a combination thereof.
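The capture and adaptation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names ExtractionTemplate, capture, adapt, and extract, the character-confusion table, and the sample text are all assumptions introduced for illustration. The sketch covers only templates that combine an anchor text marker with a regular expression; geometric markers and the learning engine are left out.

```python
import re
from dataclasses import dataclass

# Hypothetical OCR error patterns: letters commonly misread in place of digits.
OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)


@dataclass
class ExtractionTemplate:
    """Illustrative template: an anchor text marker plus a value regex."""
    name: str              # output field name
    anchor: str            # literal text expected just before the value
    pattern: str           # regex for the value, tolerant of OCR confusions
    numeric: bool = False  # if True, apply the digit-confusion fixes


def capture(text, template):
    """Return the first raw value following the anchor text, or None."""
    regex = re.compile(
        re.escape(template.anchor) + r"\s*[:#]?\s*(" + template.pattern + r")"
    )
    match = regex.search(text)
    return match.group(1) if match else None


def adapt(value, template):
    """Correct known OCR confusions in numeric fields; trim other fields."""
    if template.numeric:
        return value.translate(OCR_DIGIT_FIXES)
    return value.strip()


def extract(text, templates):
    """Apply every template and collect the adapted values by field name."""
    result = {}
    for template in templates:
        raw = capture(text, template)
        if raw is not None:
            result[template.name] = adapt(raw, template)
    return result


if __name__ == "__main__":
    # Sample OCR output containing letter-for-digit misreads (invented data).
    ocr_text = "Invoice No: 1O0S4\nTotal: 1,2l5.O0"
    templates = [
        ExtractionTemplate("invoice_no", "Invoice No", r"[0-9OolISB]+", numeric=True),
        ExtractionTemplate("total", "Total", r"[0-9OolISB.,]+", numeric=True),
    ]
    print(extract(ocr_text, templates))
    # prints {'invoice_no': '10054', 'total': '1,215.00'}
```

In this sketch the value regex deliberately accepts the confusable letters so that a misread value is still captured, and the correction is applied afterwards as a separate adaptation step. A learning component along the lines the abstract describes could, under the same assumptions, update the patterns or the confusion table from the adapted output.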