Invention Grant
- Patent Title: Document clustering based on cohesive terms
- Patent Title (Chinese): 基于内聚词的文档聚类
- Application No.: US12058295
- Application Date: 2008-03-28
- Publication No.: US07930282B2
- Publication Date: 2011-04-19
- Inventor: William S. Spangler
- Applicant: William S. Spangler
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Cantor Colburn LLP
- Main IPC: G06F7/00
- IPC: G06F7/00

Abstract:
A method for document categorization, and a storage medium that includes instructions for causing a computer to implement the method, are presented. The method includes identifying terms occurring in a collection of documents and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents that (i) contain a selected one of the terms and (ii) have not already been assigned to a category. The method still further includes moving each of the documents to the category of a nearest centroid, thereby refining the categories.
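The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not state the scoring formula, so the sketch assumes the cohesion score is the mean cosine similarity between each document containing a term and the centroid of those documents. The whitespace tokenizer, the `min_docs` and `max_categories` parameters, and all function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (whitespace tokenizer is an assumption)."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Mean term-count vector of a non-empty list of document vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cohesion(term, vectors):
    """Assumed score: mean similarity of a term's documents to their centroid."""
    members = [v for v in vectors if term in v]
    c = centroid(members)
    return sum(cosine(v, c) for v in members) / len(members)


def cluster(texts, max_categories=3, min_docs=2):
    vectors = [vectorize(t) for t in texts]

    # Step 1: identify candidate terms (those occurring in >= min_docs documents).
    doc_freq = Counter(term for v in vectors for term in v)
    terms = [t for t, n in doc_freq.items() if n >= min_docs]

    # Steps 2-3: score each term, then sort by descending cohesion.
    terms.sort(key=lambda t: (-cohesion(t, vectors), t))

    # Step 4: seed one category per term, using only unassigned documents.
    assignment, labels = {}, []
    for term in terms:
        if len(labels) == max_categories:
            break
        fresh = [i for i, v in enumerate(vectors)
                 if term in v and i not in assignment]
        if not fresh:
            continue
        for i in fresh:
            assignment[i] = len(labels)
        labels.append(term)
    if not labels:
        return [], [None] * len(vectors)

    # Step 5: refine by moving every document to its nearest centroid.
    # Documents that were never seeded are placed here as well.
    centroids = [
        centroid([vectors[i] for i, k in assignment.items() if k == j])
        for j in range(len(labels))
    ]
    refined = [
        max(range(len(labels)), key=lambda j: cosine(v, centroids[j]))
        for v in vectors
    ]
    return labels, refined


docs = [
    "apple pie recipe",
    "apple pie crust",
    "engine oil change",
    "engine oil filter",
]
labels, groups = cluster(docs, max_categories=2)
print(labels, groups)
```

The sketch performs a single refinement pass. Repeating step 5 until the assignments stop changing would be a natural extension, but the abstract does not say whether the method iterates.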
Public/Granted literature
- US20080177736A1 DOCUMENT CLUSTERING BASED ON COHESIVE TERMS (Public/Granted day: 2008-07-24)