Searching multilingual documents based on document structure extraction
Abstract:
An approach is provided for searching multilingual documents. A first classification that includes a first document and other document(s) is determined by minimizing a first distance between a first numerical fixed-length vector for the first document and other numerical fixed-length vector(s) for the other document(s). Based on a query and a natural language detected in the query, a second document is selected. A second stream modeling the second document is encoded as a second numerical fixed-length vector. Based on a distance between the first and second numerical fixed-length vectors being less than a threshold, the first classification is identified as including the second document. Documents in the first classification are ranked and presented as having content matching the second document's content. At least one of the ranked documents is expressed in a natural language different from the natural language of the second document.
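The threshold test and ranking described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the `Doc` type, the `search_classification` function, the sample vectors, and the threshold value are all hypothetical, the fixed-length vectors are assumed to be already encoded, Euclidean distance is assumed as the distance measure, and ranking by distance to the second document is an assumption, since the abstract does not state the ranking criterion.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    doc_id: str
    language: str
    vector: List[float]  # hypothetical, already-encoded fixed-length vector


def search_classification(first_doc: Doc, classification: List[Doc],
                          second_doc: Doc, threshold: float) -> List[Doc]:
    """Return the classification's documents ranked for second_doc.

    The classification is treated as including second_doc only when the
    distance between the first and second vectors is below the threshold;
    otherwise an empty list is returned.
    """
    # Assumption: Euclidean distance stands in for the unspecified measure.
    if math.dist(first_doc.vector, second_doc.vector) >= threshold:
        return []
    # Assumption: rank by closeness to the second document's vector.
    return sorted(classification,
                  key=lambda d: math.dist(d.vector, second_doc.vector))


# Hypothetical sample data: one classification spanning three languages.
first = Doc("en-1", "en", [0.0, 0.0, 1.0])
classification = [
    first,
    Doc("de-1", "de", [0.1, 0.0, 0.9]),
    Doc("fr-1", "fr", [0.0, 0.3, 1.0]),
]

# The second document is the one selected from the query (here Spanish).
second = Doc("es-1", "es", [0.2, 0.0, 1.0])
ranked = search_classification(first, classification, second, threshold=0.5)

# A document far from the first document falls outside the classification.
far = Doc("es-2", "es", [5.0, 5.0, 5.0])
no_match = search_classification(first, classification, far, threshold=0.5)
```

In this sketch every ranked document is in a language other than that of the second document, which corresponds to the cross-language result the abstract describes.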