Invention Grant
US08170289B1 Hierarchical alignment of character sequences representing text of same source
Legal status: In force
- Patent Title: Hierarchical alignment of character sequences representing text of same source
- Application No.: US11232476
- Application Date: 2005-09-21
- Publication No.: US08170289B1
- Publication Date: 2012-05-01
- Inventor: Shaolei Feng, Raghavan Manmatha
- Applicant: Shaolei Feng, Raghavan Manmatha
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Fish & Richardson P.C.
- Main IPC: G06K9/00
- IPC: G06K9/00

Abstract:
Systems and methods are disclosed for character-by-character alignment of two character sequences (such as OCR output from a scanned document and an electronic version of the same document) using a Hidden Markov Model (HMM) in a hierarchical fashion. The method may include aligning the two character sequences over multiple hierarchical levels. At each hierarchical level above the final one, the aligning may include parsing character subsequences from the two character sequences, aligning those subsequences, and designating the aligned subsequences as anchors. At every level below the first, the parsing and alignment are performed only between the anchors generated at the immediately previous level. At the final hierarchical level, the aligning includes a character-by-character alignment of the characters between the anchors generated at the immediately previous level. At each level, an HMM may be constructed and the Viterbi algorithm may be used to solve for the alignment.
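The hierarchical scheme in the abstract can be illustrated with a small two-level sketch: words first, then characters between word anchors. This is not the patented method. Several assumptions apply. The HMM is replaced by a simplified log-space Viterbi-style dynamic program over match, substitution, and gap moves. Its probabilities (`LOG_MATCH`, `LOG_SUBST`, `LOG_GAP`) are made-up values. Words are split on whitespace, and only aligned words that match exactly become anchors. The function names `viterbi_align` and `align_hierarchical` are hypothetical.

```python
import math
import re
from typing import List, Optional, Sequence, Tuple

# Hypothetical log-probabilities for the simplified model (not from the patent).
LOG_MATCH = math.log(0.9)
LOG_SUBST = math.log(0.05)
LOG_GAP = math.log(0.025)

Pair = Tuple[Optional[int], Optional[int]]  # (index in a, index in b); None = gap


def viterbi_align(a: Sequence[str], b: Sequence[str]) -> List[Pair]:
    """Highest-scoring monotonic alignment of a and b (max-product DP in log space)."""
    n, m = len(a), len(b)
    score = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # match or substitution
                step = LOG_MATCH if a[i - 1] == b[j - 1] else LOG_SUBST
                cand = score[i - 1][j - 1] + step
                if cand > score[i][j]:
                    score[i][j], back[i][j] = cand, (i - 1, j - 1)
            if i > 0:  # a[i-1] has no counterpart in b
                cand = score[i - 1][j] + LOG_GAP
                if cand > score[i][j]:
                    score[i][j], back[i][j] = cand, (i - 1, j)
            if j > 0:  # b[j-1] has no counterpart in a
                cand = score[i][j - 1] + LOG_GAP
                if cand > score[i][j]:
                    score[i][j], back[i][j] = cand, (i, j - 1)
    # Trace the best path back from (n, m) to (0, 0).
    pairs: List[Pair] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((pi if pi < i else None, pj if pj < j else None))
        i, j = pi, pj
    pairs.reverse()
    return pairs


def align_hierarchical(x: str, y: str) -> List[Pair]:
    """Level 1 aligns words and keeps exact matches as anchors;
    the final level aligns characters only between consecutive anchors."""
    spans_x = [(mt.start(), mt.end()) for mt in re.finditer(r"\S+", x)]
    spans_y = [(mt.start(), mt.end()) for mt in re.finditer(r"\S+", y)]
    words_x = [x[s:e] for s, e in spans_x]
    words_y = [y[s:e] for s, e in spans_y]
    anchors = [
        (spans_x[i], spans_y[j])
        for i, j in viterbi_align(words_x, words_y)
        if i is not None and j is not None and words_x[i] == words_y[j]
    ]
    # Zero-length sentinel anchor so the text after the last anchor is aligned too.
    anchors.append(((len(x), len(x)), (len(y), len(y))))
    result: List[Pair] = []
    px, py = 0, 0  # end offsets of the previous anchor in x and y
    for (xs, xe), (ys, ye) in anchors:
        # Final level: character-by-character alignment of the gap between anchors.
        for i, j in viterbi_align(x[px:xs], y[py:ys]):
            result.append((None if i is None else px + i, None if j is None else py + j))
        # Characters of an exactly matching anchor word align one-to-one.
        result.extend(zip(range(xs, xe), range(ys, ye)))
        px, py = xe, ye
    return result
```

For example, `align_hierarchical("the quick brovvn fox", "the quick brown fox")` anchors `the`, `quick`, and `fox`. It then runs the character-level program only on the short stretch `" brovvn "` versus `" brown "`. Restricting the quadratic dynamic program to the gaps between anchors is what keeps a hierarchical approach cheaper than one character-level pass over two long documents.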