-
1.
Publication No.: US11587347B2
Publication date: 2023-02-21
Application No.: US17155077
Application date: 2021-01-21
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Scott Carrier, Ritwik Ray, Jonathan Chapin Rand, Jothilakshmi Sirangimoorthy, Hui Wang, Robert Fredenburg
IPC: G06F17/00, G06V30/412, G06F3/0482, G06F40/237, G06F40/40, G06V30/416
Abstract: Provided are a computer program product, system, and method for pre-processing a table in a document for natural language processing. A table in a document is parsed to extract column headers, row headers, and data cells, which are processed to determine an initial set of a main element comprising an entity whose value is to be extracted, a conditional element that refines the entity, and a value element comprising a value for the entity. A user selection is received of at least one of the column headers, row headers, and data cells for at least one of the main element, conditional element, and the value element in the initial set to produce a modified set of the main element, conditional element, and value element. The modified set is provided to a natural language processing engine to perform natural language processing of the document including the table, using the modified set.
-
2.
Publication No.: US11423042B2
Publication date: 2022-08-23
Application No.: US16785329
Application date: 2020-02-07
Applicant: International Business Machines Corporation
Inventor: Jothilakshmi Sirangimoorthy, Ritwik Ray, Hui Wang, Jonathan Rand, Scott Carrier
IPC: G06F16/25, G06F16/242, G06F16/248, G06F16/34, G06F16/332, G06F16/21
Abstract: Aspects of the present disclosure describe techniques for generating a machine learning model for extracting information from textual content. The method generally includes receiving a training data set including a plurality of documents having related textual strings. A relevancy model is generated from the training data set. The relevancy model is generally configured to generate relevance scores for a plurality of words extracted from the plurality of documents. A knowledge graph model illustrating relationships between the plurality of words extracted from the plurality of documents is generated from the training data set. The relevancy model and the knowledge graph model are aggregated into a complimentary model including a plurality of nodes from the knowledge graph model and weights associated with edges between connected nodes, wherein the weights comprise relevance scores generated from the relevancy model, and the complimentary model is deployed for use in analyzing documents.
-
3.
Publication No.: US11163837B2
Publication date: 2021-11-02
Application No.: US16444129
Application date: 2019-06-18
Applicant: International Business Machines Corporation
Inventor: Ritwik Ray, Marie Angelopoulos, Frederick Roberts, Christopher Gagen, Maria Gabrani
IPC: G06F40/20, G06F16/93, G06N5/00, G06F16/245, G06F16/248, G06F40/169, G06N20/00
Abstract: Methods and systems are provided to extract information within complex documents, and the extracted information may be compared to identify differences between complex documents or the extracted information may be analyzed with respect to the individual document. Information is extracted from complex documents comprising unstructured data to create a structured data repository, or analytics knowledge base. This database may be utilized to compare concepts that are common to one or more documents, allowing ease of comparison of documents, and identification of information that is different or identification of (same or similar) information that is presented differently in a set of complex documents.
-
4.
Publication No.: US11163836B2
Publication date: 2021-11-02
Application No.: US15894109
Application date: 2018-02-12
Applicant: International Business Machines Corporation
Inventor: Ritwik Ray, Marie Angelopoulos, Frederick Roberts, Christopher Gagen, Maria Gabrani
IPC: G06F40/20, G06F16/93, G06N5/00, G06F16/245, G06F16/248, G06F40/169, G06N20/00
Abstract: Methods and systems are provided to extract information within complex documents, and the extracted information may be compared to identify differences between complex documents or the extracted information may be analyzed with respect to the individual document. Information is extracted from complex documents comprising unstructured data to create a structured data repository, or analytics knowledge base. This database may be utilized to compare concepts that are common to one or more documents, allowing ease of comparison of documents, and identification of information that is different or identification of (same or similar) information that is presented differently in a set of complex documents.
-
5.
Publication No.: US12112562B2
Publication date: 2024-10-08
Application No.: US18518279
Application date: 2023-11-22
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Scott Carrier, Ritwik Ray, Jonathan Chapin Rand, Jothilakshmi Sirangimoorthy, Hui Wang, Robert Fredenburg
IPC: G06F17/00, G06F3/0482, G06F40/237, G06F40/40, G06V30/412, G06V30/416
CPC classification number: G06V30/412, G06F3/0482, G06F40/237, G06F40/40, G06V30/416
Abstract: Provided are a computer program product, system, and method for pre-processing a table in a document for natural language processing (NLP). A graphical user interface (GUI) provides a representation of table items in a table in a document including a set of a main element comprising an entity whose value is to be extracted, a conditional element that refines the entity, and a value element comprising a value for the entity. Graphical controls are rendered in the GUI to enable a user to select an element from the table to be the main element, conditional element, and value element. The set of the main element, conditional element, and value element are updated with the user selected element to form a modified set. The modified set of the main element, conditional element, and the value element are provided to an NLP engine to perform natural language processing.
-
6.
Publication No.: US11869264B2
Publication date: 2024-01-09
Application No.: US18154665
Application date: 2023-01-13
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Scott Carrier, Ritwik Ray, Jonathan Chapin Rand, Jothilakshmi Sirangimoorthy, Hui Wang, Robert Fredenburg
IPC: G06F17/00, G06V30/412, G06F3/0482, G06F40/237, G06F40/40, G06V30/416
CPC classification number: G06V30/412, G06F3/0482, G06F40/237, G06F40/40, G06V30/416
Abstract: Provided are a computer program product, system, and method for pre-processing a table in a document for natural language processing. A table in a document is parsed to extract column headers, row headers, and data cells, which are processed to determine an initial set of a main element comprising an entity whose value is to be extracted, a conditional element that refines the entity, and a value element comprising a value for the entity. A user selection is received of at least one of the column headers, row headers, and data cells for at least one of the main element, conditional element, and the value element in the initial set to produce a modified set of the main element, conditional element, and value element. The modified set is provided to a natural language processing engine to perform natural language processing of the document including the table, using the modified set.
-
7.
Publication No.: US20190303412A1
Publication date: 2019-10-03
Application No.: US16444129
Application date: 2019-06-18
Applicant: International Business Machines Corporation
Inventor: Ritwik Ray, Marie Angelopoulos, Frederick Roberts, Christopher Gagen, Maria Gabrani
IPC: G06F16/93, G06F16/248, G06N5/00, G06F16/245
Abstract: Methods and systems are provided to extract information within complex documents, and the extracted information may be compared to identify differences between complex documents or the extracted information may be analyzed with respect to the individual document. Information is extracted from complex documents comprising unstructured data to create a structured data repository, or analytics knowledge base. This database may be utilized to compare concepts that are common to one or more documents, allowing ease of comparison of documents, and identification of information that is different or identification of (same or similar) information that is presented differently in a set of complex documents.
-
8.
Publication No.: US20190251182A1
Publication date: 2019-08-15
Application No.: US15894109
Application date: 2018-02-12
Applicant: International Business Machines Corporation
Inventor: Ritwik Ray, Marie Angelopoulos, Frederick Roberts, Christopher Gagen, Maria Gabrani
CPC classification number: G06F16/93, G06F16/245, G06F16/248, G06N5/003, G06N20/00
Abstract: Methods and systems are provided to extract information within complex documents, and the extracted information may be compared to identify differences between complex documents or the extracted information may be analyzed with respect to the individual document. Information is extracted from complex documents comprising unstructured data to create a structured data repository, or analytics knowledge base. This database may be utilized to compare concepts that are common to one or more documents, allowing ease of comparison of documents, and identification of information that is different or identification of (same or similar) information that is presented differently in a set of complex documents.