Invention Grant
- Patent Title: Interactive cleaning for automatic document clustering and categorization
- Application No.: US11784321
- Application Date: 2007-04-06
- Publication No.: US07711747B2
- Publication Date: 2010-05-04
- Inventor: Jean-Michel Renders, Caroline Privault, Ludovic Menuge
- Applicant: Jean-Michel Renders, Caroline Privault, Ludovic Menuge
- Applicant Address: US CT Norwalk
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: US CT Norwalk
- Agency: Fay Sharpe LLP
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
Documents are clustered or categorized to generate a model associating documents with classes. Outlier measures are computed for the documents, indicative of how well each document fits into the model. Outlier documents are identified to a user based on the outlier measures and a user-selected outlier criterion. Ambiguity measures are computed for the documents, indicative of the number of classes with which each document has similarity under the model. If a document is annotated with a label class, a possible corrective label class is identified if the annotated document has higher similarity with the possible corrective label class under the model than with the annotated label class. The clustering or categorizing is repeated, adjusted based on received user input, to generate an updated model associating documents with classes. Outlier and ambiguity measures are also calculated at runtime for new documents classified using the model.
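
The three per-document checks named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the model is taken to expose a similarity score between 0 and 1 for each class, the outlier measure is taken as one minus the best similarity, and the ambiguity measure is taken as the exponential of the entropy of the normalized similarities (an "effective number" of matching classes). The function names are hypothetical and do not come from the patent.

```python
import math

def outlier_measure(sims):
    """Assumed outlier score: 1 minus the best class similarity (higher = worse fit)."""
    return 1.0 - max(sims.values())

def ambiguity_measure(sims):
    """Assumed ambiguity score: exp(entropy) of the normalized similarities,
    which behaves like an effective number of classes the document resembles."""
    total = sum(sims.values())
    if total <= 0:
        return 0.0  # document resembles no class at all
    probs = [s / total for s in sims.values() if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)

def corrective_label(sims, annotated):
    """Return a possible corrective class if some class is strictly more
    similar to the document than its annotated class, else None."""
    best = max(sims, key=sims.get)
    return best if sims[best] > sims[annotated] else None

def flag_outliers(docs, threshold):
    """List document ids whose outlier score exceeds a user-selected threshold."""
    return [doc_id for doc_id, sims in docs.items()
            if outlier_measure(sims) > threshold]
```

In an interactive loop of the kind the abstract describes, the flagged documents and suggested labels would be shown to the user, and the clustering or categorization would then be rerun with the user's corrections.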
Public/Granted literature
- US20080249999A1: Interactive cleaning for automatic document clustering and categorization (Public/Granted day: 2008-10-09)