Machine learning modeling to identify sensitive data
Abstract:
Methods and systems identify and redact personally identifiable information (PII). A PII sensitivity detection framework includes multiple layers, each corresponding to a model. The framework analyzes data stored in different data tables and predicts whether each data column includes PII. The first layer corresponds to an AI model that analyzes each column's metadata and predicts a first score indicative of a first likelihood of PII existence. The second layer corresponds to a rule-based model that applies various rules to determine, for each column, a second score indicative of a second likelihood of PII existence. The third layer corresponds to a column content model that analyzes the content of each column using various natural language processing techniques to generate a third score indicative of a third likelihood of PII existence. The framework masks data presented to a user based on the scores generated via execution of one or more of the layers.
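
The three-layer scoring and masking flow described in the abstract might be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the keyword list, the regular expressions, the capitalized-name heuristic standing in for the NLP layer, the max-score combination, the 0.5 threshold, and every function name are assumptions introduced here for illustration.

```python
import re
from typing import Dict, List

# Assumption: a keyword lookup stands in for the AI metadata model (layer 1).
PII_NAME_HINTS = ("ssn", "email", "phone", "name", "address", "dob")

# Assumption: two simple patterns stand in for the rule set (layer 2).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")

# Assumption: a "First Last" pattern stands in for NLP content analysis (layer 3).
PERSON_NAME_RE = re.compile(r"^[A-Z][a-z]+ [A-Z][a-z]+$")


def metadata_score(column_name: str) -> float:
    """Layer 1 stand-in: score a column from its name alone."""
    name = column_name.lower()
    return 1.0 if any(hint in name for hint in PII_NAME_HINTS) else 0.0


def rule_score(values: List[str]) -> float:
    """Layer 2 stand-in: fraction of values matching a rule."""
    if not values:
        return 0.0
    hits = sum(1 for v in values if EMAIL_RE.match(v) or SSN_RE.match(v))
    return hits / len(values)


def content_score(values: List[str]) -> float:
    """Layer 3 stand-in: fraction of values that look like a person's name."""
    if not values:
        return 0.0
    hits = sum(1 for v in values if PERSON_NAME_RE.match(v))
    return hits / len(values)


def mask_table(table: Dict[str, List[str]], threshold: float = 0.5) -> Dict[str, List[str]]:
    """Mask every column whose highest layer score reaches the threshold.

    Taking the maximum of the three scores is an assumption; the abstract
    only says masking is based on scores from one or more layers.
    """
    masked: Dict[str, List[str]] = {}
    for column, values in table.items():
        score = max(metadata_score(column), rule_score(values), content_score(values))
        masked[column] = ["****"] * len(values) if score >= threshold else list(values)
    return masked


table = {
    "email": ["a@b.com", "c@d.org"],             # flagged by the metadata layer
    "col3": ["123-45-6789", "987-65-4321"],      # flagged by the rule layer
    "contact": ["Jane Doe", "John Smith"],       # flagged by the content layer
    "notes": ["hello", "world"],                 # not flagged by any layer
}
masked = mask_table(table)
```

In this sketch each column is flagged by a different layer, which shows why the layers are complementary: a column with an uninformative name such as `col3` is still caught by its values.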