Invention Grant
- Patent Title: System and method for enrichment of OCR-extracted data
- Application No.: US16446372
- Application Date: 2019-06-19
- Publication No.: US11080563B2
- Publication Date: 2021-08-03
- Inventor: Shreyas Bettadapura Guruprasad, Radha Krishna Pisipati
- Applicant: Infosys Limited
- Applicant Address: IN Bangalore
- Assignee: Infosys Limited
- Current Assignee: Infosys Limited
- Current Assignee Address: IN Bangalore
- Agency: Troutman Pepper Hamilton Sanders LLP (Rochester)
- Priority: IN201841024115 20180628
- Main IPC: G06K9/62
- IPC: G06K9/62 ; G06N20/00

Abstract:
A computer-implemented method and system for enrichment of OCR-extracted data is disclosed, comprising accepting, by a data extraction engine, a set of extraction criteria and a set of configuration parameters. The data extraction engine captures data satisfying the extraction criteria using the configuration parameters and adapts the captured data using a set of domain-specific rules and a set of OCR error patterns. A learning engine generates learning data models using the adapted data and the configuration parameters, and the system dynamically updates the extraction criteria using the generated learning data models. The extraction criteria comprise one or more extraction templates, wherein an extraction template includes one of a regular expression, geometric markers, anchor text markers, or a combination thereof.
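
As an illustration only, the capture and adaptation steps described in the abstract could be sketched as follows. This is not the patented implementation: the `ExtractionTemplate` class, the `adapt_numeric` helper, the character-confusion table, and the sample text are all hypothetical, and the learning engine and dynamic template update are left out. The template combines an anchor text marker with a regular expression, and the adaptation step applies a small set of assumed OCR error patterns to a numeric field.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical OCR error patterns: letters often misread in place of digits.
OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)


@dataclass
class ExtractionTemplate:
    """Hypothetical template: an anchor text marker plus a value regex."""

    field: str
    anchor_text: str    # text expected just before the value
    value_pattern: str  # regular expression describing the raw value

    def extract(self, text: str) -> Optional[str]:
        # Look for the anchor, an optional ':' or '#', then the value.
        pattern = (
            re.escape(self.anchor_text)
            + r"\s*[:#]?\s*("
            + self.value_pattern
            + ")"
        )
        match = re.search(pattern, text)
        return match.group(1) if match else None


def adapt_numeric(raw: str) -> str:
    """Apply the assumed OCR error patterns to a field known to be numeric."""
    return raw.translate(OCR_DIGIT_FIXES)


# The value pattern tolerates commonly confused letters so that a noisy
# value is still captured and can be corrected afterwards.
template = ExtractionTemplate(
    field="invoice_number",
    anchor_text="Invoice No",
    value_pattern=r"[0-9OolISB]{4,10}",
)

ocr_text = "ACME Corp\nInvoice No: 1O4S7\nTotal: 250.00"
raw = template.extract(ocr_text)  # captured as-is from the OCR output
clean = adapt_numeric(raw)        # corrected using the error patterns
print(raw, "->", clean)
```

Capture is kept tolerant and correction is applied afterwards, under the domain-specific rule that the field is numeric. That way the same substitution table is never applied to free text, where "O" and "S" are legitimate letters.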
Public/Granted literature
- US20200005089A1 SYSTEM AND METHOD FOR ENRICHMENT OF OCR-EXTRACTED DATA (Public/Granted day: 2020-01-02)