Invention Grant
US08352469B2 Automatic generation of stop word lists for information retrieval and analysis
Active
- Patent Title: Automatic generation of stop word lists for information retrieval and analysis
- Application No.: US12555962
- Application Date: 2009-09-09
- Publication No.: US08352469B2
- Publication Date: 2013-01-08
- Inventor: Stuart J Rose
- Applicant: Stuart J Rose
- Applicant Address: US WA Richland
- Assignee: Battelle Memorial Institute
- Current Assignee: Battelle Memorial Institute
- Current Assignee Address: US WA Richland
- Agent: Allan C. Tuan
- Main IPC: G06F7/00
- IPC: G06F7/00 ; G06F17/30

Abstract:
Methods and systems for automatically generating lists of stop words for information retrieval and analysis. Generation of the stop words can include providing a corpus of documents and a plurality of keywords. From the corpus of documents, a term list of all terms is constructed and both a keyword adjacency frequency and a keyword frequency are determined. If a ratio of the keyword adjacency frequency to the keyword frequency for a particular term on the term list is less than a predetermined value, then that term is excluded from the term list. The resulting term list is truncated based on predetermined criteria to form a stop word list.
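The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the abstract does not define the two frequencies, so the sketch assumes that "keyword adjacency frequency" counts occurrences of a term immediately before or after a keyword and that "keyword frequency" counts occurrences of a term inside a keyword. The function names, the regex tokenizer, the single corpus-wide keyword list, the default threshold `min_ratio`, and the truncation rule (keep the `max_size` most frequent surviving terms) are all hypothetical choices made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; the abstract names none)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def generate_stop_words(documents, keywords, min_ratio=1.0, max_size=10):
    """Sketch of the abstract's steps; parameter names are hypothetical."""
    keyword_phrases = [tokenize(k) for k in keywords]
    term_freq = Counter()       # occurrences of each term in the corpus
    adjacency_freq = Counter()  # occurrences directly next to a keyword
    keyword_freq = Counter()    # occurrences inside a keyword

    for doc in documents:
        tokens = tokenize(doc)
        term_freq.update(tokens)

        # Locate every keyword occurrence as a [start, end) token span.
        spans = []
        inside = [False] * len(tokens)
        for phrase in keyword_phrases:
            n = len(phrase)
            if n == 0:
                continue
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == phrase:
                    spans.append((i, i + n))
                    for j in range(i, i + n):
                        inside[j] = True

        for start, end in spans:
            for j in range(start, end):
                keyword_freq[tokens[j]] += 1
            # Neighbors that are not themselves part of a keyword.
            for j in (start - 1, end):
                if 0 <= j < len(tokens) and not inside[j]:
                    adjacency_freq[tokens[j]] += 1

    # Exclude terms whose adjacency/keyword ratio is below the threshold.
    retained = []
    for term in term_freq:
        adj = adjacency_freq[term]
        if adj == 0:
            continue  # ratio is 0, below any positive threshold
        kf = keyword_freq[term]
        ratio = adj / kf if kf else float("inf")
        if ratio >= min_ratio:
            retained.append(term)

    # Truncate: keep the most frequent surviving terms (assumed criterion).
    retained.sort(key=lambda t: (-term_freq[t], t))
    return retained[:max_size]


docs = ["grid systems use grid power"]
print(generate_stop_words(docs, ["grid systems", "power"], min_ratio=2.0))
```

In this toy corpus "grid" occurs once inside a keyword and once next to one, so its ratio is 1.0 and it is dropped when `min_ratio` is 2.0, while "use" only ever appears next to a keyword and is retained.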
Public/Granted literature
- US20110004610A1 Automatic Generation of Stop Word Lists for Information Retrieval and Analysis (Public/Granted day: 2011-01-06)