Invention Grant
- Patent Title: Searching multilingual documents based on document structure extraction
- Application No.: US16866646
- Application Date: 2020-05-05
- Publication No.: US11222053B2
- Publication Date: 2022-01-11
- Inventor: Xin Tang, Kun Yan Yin, He Li, Xueliang Zhao, Xin Xu
- Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Agency: Schmeiser, Olsen & Watts
- Agent: Nicholas L. Cadmus
- Main IPC: G06F16/33
- IPC: G06F16/33 ; G06F16/93 ; G06F16/338 ; G06F16/35 ; G06F16/9535 ; G06F40/40 ; G06F40/58

Abstract:
An approach is provided for searching multilingual documents. A first classification is determined that includes a first document and other document(s) by minimizing a first distance between a first numerical fixed length vector for the first document and other numerical fixed length vector(s) for other document(s). Based on a query and a natural language detected in the query, a second document is selected. A second stream modeling the second document is encoded as a second numerical fixed length vector. Based on a distance between the first and second numerical fixed length vectors being less than a threshold, the first classification is identified as including the second document. Documents in the first classification are ranked and presented as having content matching the second document's content. At least one of the ranked documents is expressed in a natural language different from the natural language of the second document.
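The matching step described in the abstract can be illustrated with a minimal sketch. It assumes that each document has already been encoded as a fixed-length numerical vector, that Euclidean distance is the distance measure, and that ranking is by distance to the selected document; the abstract specifies none of these. All names, vectors, and the threshold value are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_classification(query_vec, classifications, threshold):
    """Return the classification whose first document lies closest to
    query_vec, provided that distance is below the threshold; else None."""
    best_name, best_dist = None, threshold
    for name, members in classifications.items():
        first_doc_vec = members[0][1]  # vector of the classification's first document
        d = distance(query_vec, first_doc_vec)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name

def rank_members(query_vec, members):
    """Rank documents in a classification by distance to the query document
    (assumed ranking criterion)."""
    return sorted(members, key=lambda m: distance(query_vec, m[1]))

# Hypothetical data: (document id, fixed-length vector, natural language).
classifications = {
    "contracts": [("doc_en", [1.0, 0.0], "en"), ("doc_zh", [1.0, 0.5], "zh")],
    "manuals": [("doc_de", [5.0, 5.0], "de")],
}
query_vec = [1.0, 0.2]  # vector for the "second document" selected from the query

label = find_classification(query_vec, classifications, threshold=1.0)
ranked = rank_members(query_vec, classifications[label]) if label else []
for doc_id, _, lang in ranked:
    print(doc_id, lang)
```

In this sketch the ranked results include documents in more than one natural language, because membership in a classification depends only on vector distance and not on the language of the text.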
Public/Granted literature
- US20200265074A1, SEARCHING MULTILINGUAL DOCUMENTS BASED ON DOCUMENT STRUCTURE EXTRACTION, Public/Granted day: 2020-08-20