Invention Grant
- Patent Title: Method of establishing a plain text document from a HTML document
- Patent Title (Chinese): 从HTML文档建立纯文本文档的方法 (Method of establishing a plain text document from a HTML document)
- Application No.: US12628513
- Application Date: 2009-12-01
- Publication No.: US08392820B2
- Publication Date: 2013-03-05
- Inventor: Hong-Yang Tsai, Chi-Hau Hung
- Applicant: Hong-Yang Tsai, Chi-Hau Hung
- Applicant Address: KY George Town, Grand
- Assignee: Esobi Inc.
- Current Assignee: Esobi Inc.
- Current Assignee Address: KY George Town, Grand
- Agency: Rosenberg, Klein & Lee
- Priority: TW97146687A 20081201
- Main IPC: G06F17/00
- IPC: G06F17/00

Abstract:
The present invention provides a method of establishing a plain text document from a HTML document. The method includes the steps of: (A) acquiring a HTML document defined by HTML elements, each composed of tags and the content between the tags; (B) pre-processing the HTML document by omitting some of the tags (including the content between those tags), whereby the rest of the HTML document comprises at least one target tag (including the content between the target tags); (C) using a data structure to store the remaining tags of the pre-processed HTML document; (D) grouping the remaining tags (including the content between the remaining tags) stored in the data structure into at least one target group according to the target tag(s); and (E) identifying the target group(s) most related to a title of the HTML document by comparing correlation(s) between the target group(s) and the title, and establishing a plain text document having the content of the identified target group.
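Steps (A) through (E) can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say which tags are omitted, which tags count as target tags, what data structure is used, or how correlation is computed. The names SKIPPED_TAGS, TARGET_TAGS, GroupingParser, correlation and html_to_plain_text are hypothetical, the data structure is assumed to be a flat list of groups, and correlation is assumed to be simple word overlap with the title.

```python
import re
from html.parser import HTMLParser

# Assumptions for illustration only; the abstract names no specific tags.
SKIPPED_TAGS = {"script", "style"}   # step (B): tags omitted with their content
TARGET_TAGS = {"div", "td"}          # step (D): tags that open a target group


class GroupingParser(HTMLParser):
    """Collects the page title and one list of text fragments per target tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.groups = []        # step (C): assumed flat list, one entry per group
        self._open = []         # indices of target groups that are still open
        self._skip_depth = 0    # > 0 while inside an omitted tag
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif self._skip_depth == 0:
            if tag == "title":
                self._in_title = True
            elif tag in TARGET_TAGS:
                self.groups.append([])
                self._open.append(len(self.groups) - 1)

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "title":
            self._in_title = False
        elif tag in TARGET_TAGS and self._open and self._skip_depth == 0:
            self._open.pop()

    def handle_data(self, data):
        if self._skip_depth or not data.strip():
            return
        if self._in_title:
            self.title += data
        elif self._open:
            # Text belongs to the innermost open target group.
            self.groups[self._open[-1]].append(data.strip())


def correlation(title, text):
    """Assumed measure: fraction of title words that also occur in the text."""
    title_words = set(re.findall(r"\w+", title.lower()))
    if not title_words:
        return 0.0
    text_words = set(re.findall(r"\w+", text.lower()))
    return len(title_words & text_words) / len(title_words)


def html_to_plain_text(html):
    """Step (E): return the text of the group most related to the title."""
    parser = GroupingParser()
    parser.feed(html)            # step (A): the HTML document is the input string
    parser.close()
    candidates = [" ".join(group) for group in parser.groups if group]
    if not candidates:
        return ""
    return max(candidates, key=lambda text: correlation(parser.title, text))
```

Calling html_to_plain_text on a page whose title shares words with one block of body text returns that block and drops navigation or footer blocks that share none. A real system would likely need a tree structure and a more robust similarity measure than word overlap.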
Public/Granted literature
- US20100146381A1 METHOD OF ESTABLISHING A PLAIN TEXT DOCUMENT FROM A HTML DOCUMENT (Public/Granted day: 2010-06-10)