Invention Grant
- Patent Title: Semantic similarity based document retrieval
- Patent Title (中): 基于语义相似度的文档检索
- Application No.: US10573152
- Application Date: 2004-09-22
- Publication No.: US07644047B2
- Publication Date: 2010-01-05
- Inventor: Behrad Assadian, Behnam Azvine, Trevor P Martin
- Applicant: Behrad Assadian, Behnam Azvine, Trevor P Martin
- Applicant Address: GB London
- Assignee: British Telecommunications public limited company
- Current Assignee: British Telecommunications public limited company
- Current Assignee Address: GB London
- Agency: Nixon & Vanderhye P.C.
- Priority: GB0322899.6 20030930; GB0328890.9 20031212
- International Application: PCT/GB2004/004028 WO 20040922
- International Announcement: WO2005/041063 WO 20050506
- Main IPC: G06F15/18
- IPC: G06F15/18; G06F7/00

Abstract:
A method and apparatus are provided for generating, from an input set of documents, a word replaceability matrix defining semantic similarity between words occurring in the input document set. For each word, distinct word sequences of predetermined length are identified from the documents of the set, each word sequence being indicative of the context in which the word was used and, according to the relative frequency of occurrence of the identified word sequences for the word, fuzzy sets are generated for each word comprising membership values for corresponding groups of word sequences. For each pair of words occurring in the document set, their respective fuzzy sets are used to calculate the probability that the first word of a pair is semantically suitable as a replacement for the second word of the pair, these probabilities being collated to form a word similarity matrix for use in an improved method of determining document similarity and in information retrieval.
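The pipeline summarised in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions made for illustration only: a word's context is taken to be its immediate predecessor and successor (a three-word sequence), context frequencies are turned into fuzzy memberships with the transformation mu(c) = sum over c' of min(p(c), p(c')), and the replaceability of w2 by w1 is scored as the expected membership of w2's contexts in w1's fuzzy set. The function names (`context_counts`, `to_fuzzy_set`, `replaceability_matrix`) are hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict


def context_counts(documents):
    """Count (previous word, next word) contexts for every interior word.

    Assumption: a 'word sequence of predetermined length' is a triple
    (predecessor, word, successor); tokenisation is a plain whitespace split.
    """
    counts = defaultdict(Counter)
    for doc in documents:
        tokens = doc.lower().split()
        for i in range(1, len(tokens) - 1):
            counts[tokens[i]][(tokens[i - 1], tokens[i + 1])] += 1
    return counts


def to_distribution(counter):
    """Relative frequency of each context for one word."""
    total = sum(counter.values())
    return {ctx: n / total for ctx, n in counter.items()}


def to_fuzzy_set(dist):
    """Map a frequency distribution to fuzzy memberships.

    Assumption: mu(c) = sum over c' of min(p(c), p(c')), so the most
    frequent context always receives membership 1.
    """
    return {c: sum(min(p, q) for q in dist.values()) for c, p in dist.items()}


def replaceability_matrix(documents):
    """Return {(w1, w2): score} where the score estimates how suitable
    w1 is as a replacement for w2 (illustrative, asymmetric measure)."""
    counts = context_counts(documents)
    dists = {w: to_distribution(c) for w, c in counts.items()}
    fuzzy = {w: to_fuzzy_set(d) for w, d in dists.items()}
    return {
        (w1, w2): sum(fuzzy[w1].get(c, 0.0) * p for c, p in dists[w2].items())
        for w1 in dists
        for w2 in dists
    }


if __name__ == "__main__":
    docs = ["the quick fox jumps", "the fast fox jumps"]
    matrix = replaceability_matrix(docs)
    # "quick" and "fast" occur in the same context, "quick" and "fox" do not.
    print(matrix[("quick", "fast")], matrix[("quick", "fox")])
```

In this toy corpus "quick" and "fast" share their only context, so each scores as a full replacement for the other, while words with no contexts in common score zero. A document similarity measure could then credit near-synonym matches by weighting term overlaps with these scores, although how the patent combines them is not stated in the abstract.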
Public/Granted literature
- US20070016571A1 Information retrieval Public/Granted day: 2007-01-18