Invention Grant
US08645819B2 Detection and extraction of elements constituting images in unstructured document files
Active
- Patent Title: Detection and extraction of elements constituting images in unstructured document files
- Application No.: US13162858
- Application Date: 2011-06-17
- Publication No.: US08645819B2
- Publication Date: 2014-02-04
- Inventor: Hervé Déjean
- Applicant: Hervé Déjean
- Applicant Address: US CT Norwalk
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: US CT Norwalk
- Agency: Fay Sharpe LLP
- Main IPC: G06F17/00
- IPC: G06F17/00

Abstract:
A method and a system for detecting and extracting images in an electronic document are disclosed. The method includes receiving an electronic document and identifying elements of a page. The identified elements include a set of graphical elements and a set of text elements. The method may include identifying and excluding elements which serve as graphical page constructs and/or text formatting elements. The page can then be segmented, based on (remaining) graphical elements and identified white spaces, to generate a set of image blocks. Text elements that are associated with a respective image block are identified as captions. Overlapping candidate images are then grouped to form a new image. The new image can thus include candidate images which would, without the identification of their caption(s), each be treated as a respective image.
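The last two steps of the abstract (attaching captions to image blocks, then grouping overlapping candidate images) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Box` class, the `candidate_image` and `group_overlapping` helpers, and the use of axis-aligned bounding boxes with a union-find pass are all assumptions made for illustration, and the earlier steps (element identification, exclusion of page constructs, white-space segmentation) are taken as already done.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box (page coordinates)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def union(self, other: "Box") -> "Box":
        # Smallest box that contains both boxes.
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))

    def overlaps(self, other: "Box") -> bool:
        # True when the two boxes share a region of non-zero area.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


def candidate_image(block: Box, caption: Optional[Box]) -> Box:
    """Assumed rule: a candidate image is the block extended by its caption."""
    return block.union(caption) if caption is not None else block


def group_overlapping(candidates: List[Box]) -> List[Box]:
    """Merge candidates that overlap, directly or through a chain of overlaps.

    Single pass over the original candidates; a merged box is not re-tested
    against the other groups.
    """
    parent = list(range(len(candidates)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            if candidates[i].overlaps(candidates[j]):
                parent[find(j)] = find(i)

    merged = {}
    for i, box in enumerate(candidates):
        root = find(i)
        merged[root] = merged[root].union(box) if root in merged else box
    return list(merged.values())


# Two side-by-side image blocks that share one caption, plus an unrelated block.
left, right = Box(0, 0, 40, 40), Box(60, 0, 100, 40)
shared_caption = Box(10, 45, 90, 55)
other = Box(0, 200, 40, 240)

candidates = [
    candidate_image(left, shared_caption),
    candidate_image(right, shared_caption),
    candidate_image(other, None),
]
images = group_overlapping(candidates)
```

In this example the two blocks do not overlap on their own, but each candidate extended by the shared caption does, so `images` holds one merged box for the captioned pair and one box for the unrelated block. This mirrors the abstract's point that caption identification lets separate candidates be treated as a single image.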
Public/Granted literature
- US20120324341A1 DETECTION AND EXTRACTION OF ELEMENTS CONSTITUTING IMAGES IN UNSTRUCTURED DOCUMENT FILES (Public/Granted day: 2012-12-20)