-
Publication No.: US12046067B2
Publication date: 2024-07-23
Application No.: US17348433
Filing date: 2021-06-15
Applicant: Dathena Science Pte. Ltd.
Inventor: Christopher Muffat , Tetiana Kodliuk
IPC: G06V30/416 , G06F18/22 , G06N3/045 , G06N3/08 , G06V10/46 , G06V30/10 , G06V30/164 , G06V30/28
CPC classification number: G06V30/416 , G06F18/22 , G06N3/045 , G06N3/08 , G06V10/462 , G06V30/10 , G06V30/164 , G06V30/293
Abstract: Methods and systems for extracting personal data from a sensitive document are provided. The system includes a document prediction module, a cropping module, a denoising module, and an optical character recognition (OCR) module. The document prediction module predicts the document type of the sensitive document using a keypoint matching-based approach, and the cropping module extracts the document shape and one or more fields comprising text or pictures from the sensitive document. The denoising module prepares the one or more fields for optical character recognition, and the OCR module performs optical character recognition on the denoised one or more fields to detect characters in the one or more fields.
-
Publication No.: US20200279105A1
Publication date: 2020-09-03
Application No.: US16731259
Filing date: 2019-12-31
Applicant: Dathena Science Pte Ltd
Inventor: Christopher MUFFAT , Tetiana KODLIUK
Abstract: Methods, systems and deep learning engines for content and context aware data classification by business category and confidentiality level are provided. The deep learning engine includes a feature extraction module and a classification and labelling module. The feature extraction module extracts both context features and document features from documents and the classification and labelling module is configured for content and context aware data classification of the documents by business category and confidentiality level using neural networks.
-
Publication No.: US20200250139A1
Publication date: 2020-08-06
Application No.: US16731351
Filing date: 2019-12-31
Applicant: Dathena Science Pte Ltd
Inventor: Christopher MUFFAT , Tetiana KODLIUK
Abstract: Systems and methods for personal data classification, linkage and purpose of processing prediction are provided. The system for personal data classification includes an entity extraction module for extracting personal data from one or more data repositories in a computer network or cloud infrastructure, a linkage module coupled to the entity extraction module, and a processing prediction module. The entity extraction module performs entity recognition from the structured, semi-structured and unstructured records in the one or more data repositories. The linkage module uses graph-based methodology to link the personal data to one or more individuals. The purpose prediction module includes a feature extraction module and a purpose of processing prediction module, wherein the feature extraction module extracts both context features and record features from records in the one or more data repositories, and the purpose of processing prediction module predicts a single purpose or multiple purposes of processing of the personal data.
-
Publication No.: US20210374533A1
Publication date: 2021-12-02
Application No.: US17331938
Filing date: 2021-05-27
Applicant: Dathena Science Pte. Ltd.
Inventor: Christopher MUFFAT , Tetiana KODLIUK , Adel RAHIMI
Abstract: Methods, systems and computer readable medium for explainable artificial intelligence are provided. The method for explainable artificial intelligence includes receiving a document and pre-processing the document to prepare information in the document for processing. The method further includes processing the information by an artificial neural network for one or more tasks. In addition, the method includes providing explanations and visualization of the processing by the artificial neural network to a user during processing of the information by the artificial neural network.
-
Publication No.: US20200250241A1
Publication date: 2020-08-06
Application No.: US16730111
Filing date: 2019-12-30
Applicant: Dathena Science Pte Ltd
Inventor: Christopher Muffat , Tetiana Kodliuk
Abstract: Methods and systems for data management of documents in one or more data repositories in a computer network or cloud infrastructure are provided. The method includes sampling the documents in the one or more data repositories and formulating representative subsets of the sampled documents. The method further includes generating sampled data sets of the sampled documents and balancing the sampled data sets for further processing of the sampled documents. The formulation of the representative subsets is performed for identification of some of the representative subsets for initial processing.
-
Publication No.: US11461371B2
Publication date: 2022-10-04
Application No.: US16731356
Filing date: 2019-12-31
Applicant: Dathena Science Pte Ltd
Inventor: Christopher Muffat , Tetiana Kodliuk
Abstract: Methods and systems for data loss prevention and autolabelling of business categories and confidentiality based on text summarization are provided. The method for data loss prevention includes entering a combination of keywords and/or keyphrases and offline unsupervised mapping of a path of transfer of specific groups of documents. The offline unsupervised mapping includes keyword/keyphrase extraction from the specific groups of documents and normalization of candidates. The method further includes vectorization of the extracted keywords/keyphrases from the specific groups of documents and quantitative performance measurement of the keyword/keyphrase extraction to derive keywords and/or keyphrases suitable for data loss prevention.
-
Publication No.: US20200226154A1
Publication date: 2020-07-16
Application No.: US16731356
Filing date: 2019-12-31
Applicant: Dathena Science Pte Ltd
Inventor: Christopher Muffat , Tetiana Kodliuk
IPC: G06F16/28 , G06F16/93 , G06N20/00 , G06F16/242 , G06N5/04
Abstract: Methods and systems for data loss prevention and autolabelling of business categories and confidentiality based on text summarization are provided. The method for data loss prevention includes entering a combination of keywords and/or keyphrases and offline unsupervised mapping of a path of transfer of specific groups of documents. The offline unsupervised mapping includes keyword/keyphrase extraction from the specific groups of documents and normalization of candidates. The method further includes vectorization of the extracted keywords/keyphrases from the specific groups of documents and quantitative performance measurement of the keyword/keyphrase extraction to derive keywords and/or keyphrases suitable for data loss prevention.
-
Publication No.: US12039074B2
Publication date: 2024-07-16
Application No.: US16731351
Filing date: 2019-12-31
Applicant: Dathena Science Pte Ltd
Inventor: Christopher Muffat , Tetiana Kodliuk
IPC: G06F16/182 , G06F16/14 , G06F16/16 , G06F18/21 , G06F21/62 , G06N20/00 , G06V10/82 , G06V30/196 , G06V30/262 , G06V30/412
CPC classification number: G06F21/6245 , G06F16/148 , G06F16/156 , G06F16/164 , G06F16/182 , G06F18/2185 , G06N20/00 , G06V10/82 , G06V30/1988 , G06V30/274 , G06V30/412
Abstract: Systems and methods for personal data classification, linkage and purpose of processing prediction are provided. The system for personal data classification includes an entity extraction module for extracting personal data from one or more data repositories in a computer network or cloud infrastructure, a linkage module coupled to the entity extraction module, and a processing prediction module. The entity extraction module performs entity recognition from the structured, semi-structured and unstructured records in the one or more data repositories. The linkage module uses graph-based methodology to link the personal data to one or more individuals. The purpose prediction module includes a feature extraction module and a purpose of processing prediction module, wherein the feature extraction module extracts both context features and record features from records in the one or more data repositories, and the purpose of processing prediction module predicts a single purpose or multiple purposes of processing of the personal data.
-
Publication No.: US12033040B2
Publication date: 2024-07-09
Application No.: US17268381
Filing date: 2018-08-14
Applicant: Dathena Science Pte. Ltd.
Inventor: Christopher Muffat
IPC: G06N20/00 , G06F18/23213 , G06F40/284 , G06F40/30
CPC classification number: G06N20/00 , G06F18/23213 , G06F40/284 , G06F40/30
Abstract: Systems, methods and computer readable medium are provided for performing a method for content and context aware data classification or a method for content and context aware data security anomaly detection. The method for content and context aware data confidentiality classification includes scanning one or more documents in one or more network data repositories of a computer network and extracting content features and context features of the one or more documents into one or more term frequency-inverse document frequency (TF-IDF) vectors and one or more latent semantic indexing (LSI) vectors. The method further includes classifying the one or more documents into a number of category classifications by machine learning the extracted content features and context features of the one or more documents at a file management platform of the computer network, each of the category classifications being associated with one or more confidentiality classifications.
-
Publication No.: US11675926B2
Publication date: 2023-06-13
Application No.: US16730111
Filing date: 2019-12-30
Applicant: Dathena Science Pte Ltd
Inventor: Christopher Muffat , Tetiana Kodliuk
IPC: G06F16/93 , G06F21/62 , G06F16/9035 , G06N20/00 , G06F16/906 , G06F18/23213
CPC classification number: G06F21/6245 , G06F16/906 , G06F16/9035 , G06F16/93 , G06F18/23213 , G06N20/00
Abstract: Methods and systems for data management of documents in one or more data repositories in a computer network or cloud infrastructure are provided. The method includes sampling the documents in the one or more data repositories and formulating representative subsets of the sampled documents. The method further includes generating sampled data sets of the sampled documents and balancing the sampled data sets for further processing of the sampled documents. The formulation of the representative subsets is performed for identification of some of the representative subsets for initial processing.