Invention Grant
- Patent Title: Corpus quality analysis
- Application No.: US14444690
- Application Date: 2014-07-28
- Publication No.: US09754207B2
- Publication Date: 2017-09-05
- Inventor: Corville O. Allen, Andrew R. Freed, Richard A. Salmon, Beata J. Strack
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Stephen R. Tkacs; Stephen J. Walder, Jr.; Diana R. Gerhardt
- Main IPC: G06F17/30
- IPC: G06F17/30; G06F7/00; G06N5/02; G06N99/00; G06F17/27

Abstract:
A mechanism is provided in a data processing system for corpus quality analysis. The mechanism applies at least one filter to a candidate corpus to determine a degree to which the candidate corpus supplements existing corpora for performing a natural language processing (NLP) operation. Responsive to a determination to add the candidate corpus to the existing corpora based on a result of applying the at least one filter, the mechanism adds the candidate corpus to the existing corpora to form modified corpora. The mechanism performs the NLP operation using the modified corpora.
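The flow described in the abstract (apply a filter, decide, add, then use the modified corpora) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the names `novelty_filter` and `maybe_add_corpus`, the vocabulary-overlap scoring, and the 0.5 threshold are all hypothetical, and the patent's actual filters may work quite differently.

```python
from typing import Callable, List, Set

Corpus = List[str]  # assumption: a corpus is modeled as a list of document strings
Filter = Callable[[Corpus, List[Corpus]], float]


def vocabulary(corpus: Corpus) -> Set[str]:
    """Collect lowercased whitespace tokens across all documents in a corpus."""
    return {tok for doc in corpus for tok in doc.lower().split()}


def novelty_filter(candidate: Corpus, existing: List[Corpus]) -> float:
    """Hypothetical filter: share of candidate terms absent from the existing corpora."""
    cand_vocab = vocabulary(candidate)
    if not cand_vocab:
        return 0.0
    known: Set[str] = set()
    for corpus in existing:
        known |= vocabulary(corpus)
    return len(cand_vocab - known) / len(cand_vocab)


def maybe_add_corpus(
    candidate: Corpus,
    existing: List[Corpus],
    filters: List[Filter],
    threshold: float = 0.5,  # illustrative cutoff, not from the patent
) -> List[Corpus]:
    """Return modified corpora if every filter scores at or above the threshold.

    Otherwise the existing corpora are returned unchanged.
    """
    scores = [f(candidate, existing) for f in filters]
    if scores and min(scores) >= threshold:
        return existing + [candidate]
    return existing


existing = [["the cat sat on the mat"]]
candidate = ["aspirin reduces fever", "the dose is low"]
modified = maybe_add_corpus(candidate, existing, [novelty_filter])
print(len(modified))  # number of corpora after the decision
```

In this sketch the NLP operation itself is left out; the returned list stands in for the "modified corpora" that a downstream operation would consume.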
Public/Granted literature
- US20160026634A1 Corpus Quality Analysis (Public/Granted day: 2016-01-28)