Processing system using natural language processing for performing dataset filtering and sanitization
Abstract:
Aspects of the disclosure relate to processing systems using natural language processing with improved dataset filtering and sanitization techniques. A computing platform may receive a dataset file and commands directing the computing platform to sanitize the dataset file. In response to the commands, the computing platform may identify confidential information contained in the dataset file using named entity recognition and one or more dynamic entity profiles, extract the confidential information, and replace the confidential information with non-confidential information to produce a sanitized dataset file. Based on identifying the confidential information contained in the dataset file, the computing platform may update the dynamic entity profiles. The computing platform may send the sanitized dataset file to a target environment host server, causing the target environment host server to use the sanitized dataset file in a testing environment that is prohibited from containing confidential information.
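The identify, replace, and profile-update steps described in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: regular expressions stand in for a trained named entity recognition model, and the names PATTERNS, EntityProfile, and sanitize, as well as the placeholder format, are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumption: two regular expressions stand in for a trained NER model.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


@dataclass
class EntityProfile:
    """Hypothetical dynamic entity profile: confidential values seen so far."""
    known: dict = field(default_factory=dict)  # value -> entity label

    def update(self, value: str, label: str) -> None:
        self.known[value] = label


def sanitize(text: str, profile: EntityProfile) -> str:
    """Replace confidential values with placeholders and update the profile."""
    found = {}

    # Step 1: identify entities with the (stand-in) recognizer.
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found[match.group()] = label

    # Step 2: also flag values the profile learned from earlier files.
    for value, label in profile.known.items():
        if value in text:
            found[value] = label

    # Step 3: replace longest values first so overlapping values are
    # substituted whole, then record each value in the profile.
    for value in sorted(found, key=len, reverse=True):
        label = found[value]
        text = text.replace(value, f"<{label}>")
        profile.update(value, label)

    return text


if __name__ == "__main__":
    profile = EntityProfile()
    print(sanitize("Contact jane@example.com, SSN 123-45-6789.", profile))
    print(profile.known)
```

Because the profile persists between calls, a value recognized in one dataset file is also replaced in later files, even where the recognizer alone would miss it. A production system would substitute realistic non-confidential values rather than bracketed labels.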