Automating creation of accurate OCR training data using specialized UI application
Abstract:
Systems of the present disclosure generate accurate training data for optical character recognition (OCR). Systems disclosed herein generates images of a text passage as displayed piecemeal in a user interface (UI) element rendered in a selected font type and size, determine accurate dimensions and locations of bounding boxes for each character pictured in the images, stitch together a training image by concatenating the images, and associate the training image, the bounding box dimensions and locations, and the text passage together in a collection of training data. The collection of training data also includes a computer-readable master copy of the text passage with newline characters inserted therein.
Information query
Patent Agency Ranking
0/0