Invention Grant
US08645819B2 Detection and extraction of elements constituting images in unstructured document files (Active)

  • Patent Title: Detection and extraction of elements constituting images in unstructured document files
  • Application No.: US13162858
    Application Date: 2011-06-17
  • Publication No.: US08645819B2
    Publication Date: 2014-02-04
  • Inventor: Hervé Déjean
  • Applicant: Hervé Déjean
  • Applicant Address: US CT Norwalk
  • Assignee: Xerox Corporation
  • Current Assignee: Xerox Corporation
  • Current Assignee Address: US CT Norwalk
  • Agency: Fay Sharpe LLP
  • Main IPC: G06F17/00
  • IPC: G06F17/00
Abstract:
A method and a system for detecting and extracting images in an electronic document are disclosed. The method includes receiving an electronic document and identifying elements of a page. The identified elements include a set of graphical elements and a set of text elements. The method may include identifying and excluding elements which serve as graphical page constructs and/or text formatting elements. The page can then be segmented, based on (remaining) graphical elements and identified white spaces, to generate a set of image blocks. Text elements that are associated with a respective image block are identified as captions. Overlapping candidate images are then grouped to form a new image. The new image can thus include candidate images which would, without the identification of their caption(s), each be treated as a respective image.
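The last two steps of the abstract (associating captions with image blocks, then grouping overlapping candidate images) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Box` type, the `max_gap` threshold, and the rule that a caption belongs to every block it horizontally overlaps and sits within `max_gap` units of are all assumptions made for illustration, and the earlier steps (excluding page constructs, segmenting on white space) are assumed to have already produced the blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical type, page coordinates)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Box") -> bool:
        # True when the two rectangles share some interior area.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)

    def union(self, other: "Box") -> "Box":
        # Smallest box enclosing both rectangles.
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))


def is_caption_of(caption: Box, block: Box, max_gap: float) -> bool:
    """Assumed rule: the caption shares horizontal extent with the block
    and lies within max_gap vertically (above or below)."""
    x_overlap = caption.x0 < block.x1 and block.x0 < caption.x1
    gap = max(0.0, max(caption.y0, block.y0) - min(caption.y1, block.y1))
    return x_overlap and gap <= max_gap


def candidate_images(blocks, captions, max_gap=10.0):
    """Extend each image block by the captions associated with it."""
    candidates = []
    for block in blocks:
        cand = block
        for cap in captions:
            if is_caption_of(cap, block, max_gap):
                cand = cand.union(cap)
        candidates.append(cand)
    return candidates


def group_overlapping(boxes):
    """Merge overlapping boxes repeatedly until no two boxes overlap."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes[i].overlaps(boxes[j]):
                    boxes[i] = boxes[i].union(boxes[j])
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes


# Two side-by-side image blocks with one caption running beneath both.
blocks = [Box(0, 0, 40, 40), Box(60, 0, 100, 40)]
captions = [Box(0, 45, 100, 55)]

without_caption = group_overlapping(blocks)
with_caption = group_overlapping(candidate_images(blocks, captions))
```

In this toy layout the two blocks do not overlap on their own, so they stay separate in `without_caption`. Once each block is extended by the shared caption, the candidates overlap and `with_caption` holds a single merged image, which mirrors the abstract's point that the new image can include candidates that would otherwise each be treated as a separate image.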