Invention Grant
- Patent Title: Text mining for automatically determining semantic relatedness
- Application No.: US15418744
- Application Date: 2017-01-29
- Publication No.: US10169331B2
- Publication Date: 2019-01-01
- Inventor: Kamila Baron-Palucka, Lukasz G. Cmielowski, Marek J. Oszajec, Pawel Slowikowski
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Agency: Lee Law, PLLC
- Agent: Christopher B. Lee
- Main IPC: G06F17/27
- IPC: G06F17/27; H04L12/58; H04L29/06

Abstract:
Described herein is an approach for automatically determining the semantic relatedness of documents to semantic concepts. A first text mining analysis extracts a set of reference concepts from reference documents. A second text mining analysis extracts a set of test concepts from test documents that include a mixture of new concepts and reference concepts. An extended co-occurrence matrix is computed that indicates a frequency of co-occurrence (RCCF) of each new and each reference concept in the test documents with all other new and reference concepts. The extended co-occurrence matrix is used for computing a new concept relatedness score (NCRS) for the new concepts. A document similarity score (DSS) is computed for each of the test documents by aggregating, inter alia, the NCRS of each new concept with the RCCF of each reference concept. The DSS represents the semantic relatedness of the test document to the totality of the reference concepts.
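The abstract names three quantities (the extended co-occurrence matrix, the NCRS, and the DSS) but does not give their formulas. The following is a minimal Python sketch of how such a pipeline might fit together, not the patented method. It assumes that concept extraction has already been done and that each document is represented as a set of concept strings. The function names, the sample concepts, the document-level co-occurrence counting, and both scoring formulas (NCRS as the share of a new concept's co-occurrences that involve reference concepts, DSS as a frequency-weighted average in which reference concepts carry weight 1.0) are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations


def cooccurrence(docs):
    """Count, for each ordered concept pair, the documents containing both."""
    counts = Counter()
    for concepts in docs:
        for a, b in permutations(concepts, 2):
            counts[(a, b)] += 1
    return counts


def row_total(concept, others, counts):
    """Sum the co-occurrence counts of `concept` with every concept in `others`."""
    return sum(counts[(concept, o)] for o in others if o != concept)


def ncrs(concept, reference, vocabulary, counts):
    """Assumed NCRS: fraction of co-occurrences that involve reference concepts."""
    total = row_total(concept, vocabulary, counts)
    return row_total(concept, reference, counts) / total if total else 0.0


def dss(doc, reference, vocabulary, counts):
    """Assumed DSS: frequency-weighted mean relatedness of the document's concepts."""
    numerator = denominator = 0.0
    for concept in doc:
        freq = row_total(concept, vocabulary, counts)
        # Assumption: reference concepts count as fully related (weight 1.0).
        weight = 1.0 if concept in reference else ncrs(
            concept, reference, vocabulary, counts)
        numerator += weight * freq
        denominator += freq
    return numerator / denominator if denominator else 0.0


# Hypothetical data: reference concepts and already-extracted test documents.
reference = {"encryption", "firewall"}
test_docs = [
    {"encryption", "firewall", "ransomware"},
    {"encryption", "ransomware"},
    {"ransomware", "recipe"},
    {"recipe", "oven"},
]
vocabulary = set().union(*test_docs)
counts = cooccurrence(test_docs)

for doc in test_docs:
    print(sorted(doc), round(dss(doc, reference, vocabulary, counts), 3))
```

Under these assumptions, a new concept such as "ransomware" that mostly appears alongside reference concepts receives a high NCRS, so documents containing it score close to documents made up purely of reference concepts, while a document with only unrelated new concepts scores zero.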
Public/Granted literature
- US20180217980A1 TEXT MINING FOR AUTOMATICALLY DETERMINING SEMANTIC RELATEDNESS Public/Granted day: 2018-08-02