Invention Grant
- Patent Title: Information retrieval systems with duplicate document detection and presentation functions
- Application No.: US11122577
- Application Date: 2005-05-05
- Publication No.: US07809695B2
- Publication Date: 2010-10-05
- Inventors: Jack G. Conrad, Joanne R. S. Claussen, Jie Lin
- Applicants: Jack G. Conrad, Joanne R. S. Claussen, Jie Lin
- Applicant Address: CH Baar
- Assignee: Thomson Reuters Global Resources
- Current Assignee: Thomson Reuters Global Resources
- Current Assignee Address: CH Baar
- Agency: Valenti, Hanley & Robinson, PLLC
- Agent: Kevin T. Duncan
- Main IPC: G06F7/00
- IPC: G06F7/00

Abstract:
Many companies provide online search facilities that enable users to conduct computerized searches for documents. Unfortunately, these searches frequently provide results that include duplicate documents—that is, documents that are completely or substantially identical to each other. This problem is particularly vexing when searching news stories, for example. Moreover, the duplicate documents are intermixed in the search results, leaving users to manually manage the complexities of identifying and/or filtering them. Accordingly, the present inventors devised systems, methods, and software that facilitate the identification and/or grouping of duplicate documents in search results. One exemplary system includes a signature generation module which generates document signatures based on length, temporal, and/or content components; a real-time duplicate detection module which uses the document signatures to identify “exact” or “fuzzy” duplicate documents; and a user-interface or presentation module which controls how duplicate documents are presented or suppressed in search results.
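The abstract names three modules (signature generation, real-time duplicate detection, and presentation) but does not specify their internals. The following is a hypothetical sketch, not the patented method, showing how the pieces could fit together. Every detail is an assumption made for illustration. The length component is a word-count bucket. The temporal component is a publication date. The content component is a SHA-1 hash of the normalized text (for "exact" matches) plus a set of word 3-shingles (for "fuzzy" matches). Two documents count as fuzzy duplicates when they fall within one length bucket and two days of each other and have a shingle Jaccard similarity of at least 0.8. The names `make_signature`, `is_duplicate`, and `group_results`, along with all thresholds, are invented.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    published: date


@dataclass(frozen=True)
class Signature:
    length_bucket: int                    # assumed length component: coarse word-count bucket
    published: date                       # assumed temporal component
    exact_hash: str                       # assumed content component for "exact" duplicates
    shingles: FrozenSet[Tuple[str, ...]]  # assumed content component for "fuzzy" duplicates


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs, so punctuation/case differences vanish.
    return re.findall(r"[a-z0-9]+", text.lower())


def make_signature(doc: Document, bucket_size: int = 50, k: int = 3) -> Signature:
    """Hypothetical signature generation from length, temporal, and content parts."""
    tokens = tokenize(doc.text)
    normalized = " ".join(tokens)
    # Word k-shingles; very short texts yield a single (possibly shorter) shingle.
    shingles = frozenset(
        tuple(tokens[i:i + k]) for i in range(max(len(tokens) - k + 1, 1))
    )
    return Signature(
        length_bucket=len(tokens) // bucket_size,
        published=doc.published,
        exact_hash=hashlib.sha1(normalized.encode("utf-8")).hexdigest(),
        shingles=shingles,
    )


def jaccard(a: FrozenSet, b: FrozenSet) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_duplicate(s1: Signature, s2: Signature,
                 max_days: int = 2, threshold: float = 0.8) -> bool:
    """Assumed rule: exact hash match, or close length/date plus high shingle overlap."""
    if s1.exact_hash == s2.exact_hash:
        return True  # "exact" duplicate (date is ignored for identical text)
    if abs(s1.length_bucket - s2.length_bucket) > 1:
        return False  # lengths too different to be fuzzy duplicates
    if abs((s1.published - s2.published).days) > max_days:
        return False  # published too far apart in time
    return jaccard(s1.shingles, s2.shingles) >= threshold  # "fuzzy" duplicate


def group_results(ranked_docs: List[Document]) -> List[List[Document]]:
    """Presentation step: collapse duplicates behind their highest-ranked member."""
    groups: List[Tuple[Signature, List[Document]]] = []
    for doc in ranked_docs:
        sig = make_signature(doc)
        for rep_sig, members in groups:
            if is_duplicate(rep_sig, sig):
                members.append(doc)
                break
        else:
            groups.append((sig, [doc]))  # no match: start a new group
    return [members for _, members in groups]


if __name__ == "__main__":
    results = [
        Document("a", "Markets rallied on Tuesday as tech stocks gained.", date(2005, 5, 3)),
        Document("b", "Markets rallied on Tuesday as tech stocks gained!", date(2005, 5, 3)),
        Document("c", "Heavy rain is expected across the region this weekend.", date(2005, 5, 4)),
    ]
    for group in group_results(results):
        print(group[0].doc_id, "+", len(group) - 1, "duplicate(s)")
```

Comparing each new result only against each group's representative keeps the sketch simple and preserves the original ranking order. A production system would more likely index signatures, for example by hash or by length bucket, rather than scanning every group.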
Public/Granted literature
- US20060041597A1 Information retrieval systems with duplicate document detection and presentation functions (Publication Date: 2006-02-23)