Client Story: Protecting Personally Identifiable Information (PII) using AI/ML
The financial services industry handles vast amounts of confidential client and customer data in its daily business transactions. Because of this data's perceived value, financial services firms are among the primary targets of data breaches. In fact, one of the key determinants of a financial services firm's success or failure is how well it balances flexible data sharing with data privacy.

Our client, a renowned wealth management institution, sought a fast, automated data redaction solution to meet its business requirements. In addition to handling customers' personal information competently, the client wanted to strengthen its data privacy and data lineage culture by embedding an AI/ML solution into its processes.


Having studied our client's business landscape and the challenges involved, we built a groundbreaking AI/ML redaction solution that helps wealth management institutions extract vital information from identity documents with high accuracy.

We employed computer vision, machine learning, and rule-based capabilities to extract and redact entities according to the client's specifications. No more error-prone workarounds such as drawing boxes over text or changing its font color, which only hide sensitive data visually and leave the underlying text exposed to copying and search engine indexing. Instead, our algorithms remove the sensitive content itself, ensuring precise and consistent redaction.

Our data redaction strategy involved custom-built components that addressed the client's various use cases. However, the basic process is outlined below:

Data Collection: The first step is to collect the relevant data that needs to be redacted. This may include documents, emails, databases, or any other sources containing sensitive information.
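A minimal sketch of the collection step, assuming the documents to be redacted have already landed in a local directory tree (the client's actual sources, such as mail servers or databases, are not described). The function name `collect_documents` and the extension set are hypothetical choices for illustration.

```python
from pathlib import Path

# Hypothetical set of source formats to gather; adjust to the real inventory.
DEFAULT_EXTENSIONS = {".txt", ".eml", ".csv", ".pdf", ".png", ".jpg"}


def collect_documents(root, extensions=DEFAULT_EXTENSIONS):
    """Recursively gather files under `root` whose suffix is in `extensions`."""
    root = Path(root)
    wanted = {ext.lower() for ext in extensions}
    # rglob("*") walks every subdirectory; keep regular files with a matching
    # suffix (compared case-insensitively, so ".EML" counts as ".eml").
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

Returning a sorted list keeps batch runs deterministic, which helps when tracing which source file produced which redacted output.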

Text Extraction: Once the data is collected, the next step is to extract the text from the different sources. This can involve converting documents into machine-readable formats or extracting text from structured databases.
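As an illustration of text extraction, the sketch below handles two of the source types mentioned above, plain-text files and email messages (`.eml`), using only Python's standard `email` module. `extract_text` is a hypothetical helper; PDFs and scanned images would need a PDF parser or an OCR engine, which this sketch deliberately does not cover.

```python
import email
from email import policy
from pathlib import Path


def extract_text(path):
    """Return plain text from a .txt or .eml file (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".eml":
        # policy.default yields an EmailMessage, which offers get_body()/get_content().
        msg = email.message_from_bytes(path.read_bytes(), policy=policy.default)
        body = msg.get_body(preferencelist=("plain",))
        text = body.get_content() if body is not None else ""
        # Keep the subject line too: it often carries names or account references.
        return f"Subject: {msg['subject']}\n{text}"
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")
    # PDFs, scans and images need a PDF library or OCR engine (not shown).
    raise ValueError(f"unsupported format: {path.suffix}")
```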

Processing: The extracted text is then processed using various techniques to identify and redact sensitive information. Custom regular expression (regex) patterns can match specific formats of sensitive data, such as Social Security numbers or credit card numbers. Named Entity Recognition (NER) algorithms can identify entities such as names, addresses, or medical terms. Rule-based logic and deny lists can be applied to identify and redact specific keywords or phrases.
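The regex and deny-list parts of this step can be sketched in a few lines of Python. The patterns are illustrative assumptions (US-style SSNs, 13 to 16 digit card numbers), not the client's actual rules. The Luhn checksum is a well-known filter that rejects digit runs that cannot be valid card numbers. NER is omitted here because it requires a trained model.

```python
import re

# Illustrative patterns only; production rules would be tuned to the real data.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13-16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_ok(candidate):
    """Luhn checksum, used to cut false positives on card-like digit runs."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text, deny_list=()):
    """Replace regex matches and deny-listed terms with placeholder tags."""
    text = SSN_RE.sub("[SSN]", text)
    # Only redact card-like runs that pass the Luhn check.
    text = CARD_RE.sub(lambda m: "[CARD]" if luhn_ok(m.group()) else m.group(), text)
    for term in deny_list:
        # re.escape treats the deny-list entry as a literal phrase.
        text = re.sub(re.escape(term), "[REDACTED]", text, flags=re.IGNORECASE)
    return text
```

For example, `redact("SSN 123-45-6789, ref project falcon.", deny_list=["Project Falcon"])` replaces both the number and the deny-listed phrase, while a 16-digit run that fails the Luhn check is left untouched.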

Algorithm (BERT and fine-tuned BERT): Advanced models such as Bidirectional Encoder Representations from Transformers (BERT) can be used for more complex redaction tasks. BERT-based models can be fine-tuned on specific datasets to improve redaction accuracy, especially in cases where context and semantics play a crucial role.
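Fine-tuning BERT itself requires a deep learning framework and labeled data that are not described here. One step that can be shown without a framework is post-processing: a BERT token-classification model typically emits one BIO tag (`B-`, `I-`, `O`) per WordPiece token, and these tags must be merged into entity spans before anything can be redacted. The sketch assumes that input format; `merge_bio` is a hypothetical name and the model call is not shown.

```python
def merge_bio(tokens, labels):
    """Group BIO-tagged tokens into (entity_type, text) spans.

    Assumes WordPiece-style sub-tokens prefixed with '##', as BERT tokenizers
    emit; a sub-token inherits the entity of the word it continues.
    """
    spans = []
    current_type, current_text = None, ""
    for token, label in zip(tokens, labels):
        if token.startswith("##") and current_type is not None:
            # Continuation of the previous word: glue it on without a space.
            current_text += token[2:]
            continue
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != current_type
        )
        if starts_new:
            if current_type is not None:
                spans.append((current_type, current_text))
            current_type, current_text = label[2:], token
        elif label.startswith("I-"):
            current_text += " " + token  # next word of the same entity
        else:  # "O": outside any entity, so close the open span
            if current_type is not None:
                spans.append((current_type, current_text))
            current_type, current_text = None, ""
    if current_type is not None:
        spans.append((current_type, current_text))
    return spans
```

Treating a stray `I-` tag as the start of a new entity is a lenient choice: for redaction, a slightly malformed tag sequence should still produce a span rather than silently drop sensitive text.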

Output: The final step is to generate the redacted output. This can be in the form of updated documents, databases, or reports where the sensitive information has been replaced or masked. The output should ensure that the redacted data is effectively removed or obfuscated, while maintaining the integrity and structure of the original information.
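One way to keep the output's structure intact is to replace detected spans by character offset, either with a same-length mask that preserves punctuation and layout or with a type label. A minimal sketch, assuming the detection steps above yield non-overlapping `(start, end, label)` spans; `apply_redactions` is a hypothetical name.

```python
def apply_redactions(text, spans, mask_char="X", keep_format=True):
    """Replace each (start, end, label) span in `text`.

    keep_format=True masks letters and digits but keeps length and punctuation,
    so fixed-width layouts survive; keep_format=False inserts "[LABEL]" instead.
    """
    out = text
    # Work right-to-left so earlier offsets stay valid when lengths change.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        original = out[start:end]
        if keep_format:
            replacement = "".join(mask_char if ch.isalnum() else ch for ch in original)
        else:
            replacement = f"[{label}]"
        out = out[:start] + replacement + out[end:]
    return out
```

The format-preserving mode suits structured outputs such as fixed-width reports, while the labeled mode makes it clear to a reviewer what kind of data was removed.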

When it comes to redacting images, we use advanced AI/ML technology to detect text regions within the image. Our solution then creates bounding boxes directly on the pixel data around the sensitive text, runs them through our algorithm, and produces a masked image as the result.
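A simplified sketch of the masking step, representing a grayscale image as a list of pixel rows and assuming a text detector has already produced `(x0, y0, x1, y1)` boxes. Because the pixel values themselves are overwritten, no hidden text layer remains underneath. Real pipelines would operate on image arrays or use a library such as Pillow (for example `ImageDraw.rectangle`); `mask_regions` is hypothetical.

```python
def mask_regions(pixels, boxes, fill=0):
    """Return a copy of `pixels` (a list of rows) with each box filled solid.

    Boxes are half-open rectangles [x0, x1) x [y0, y1) in pixel coordinates.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    masked = [list(row) for row in pixels]  # copy so the source stays untouched
    for x0, y0, x1, y1 in boxes:
        # Clamp to the image bounds so detector boxes that overshoot are safe.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                masked[y][x] = fill
    return masked
```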

The solution covers three capabilities: text redaction, image redaction, and entity extraction.

For entity extraction, we leveraged NER algorithms to identify and extract named entities such as people, organizations, and locations from the text data. The extracted entities can then support downstream uses such as fraud detection and trend analysis.
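As a hedged illustration of the trend-analysis use, the sketch below counts how many documents each extracted entity appears in and surfaces those that recur often, for example as input to a fraud review queue. The threshold, entity types, and function names are assumptions rather than the client's logic, and real entity resolution would need more than lowercasing.

```python
from collections import Counter


def entity_document_counts(docs_entities):
    """Count in how many documents each (type, normalized text) entity appears."""
    counts = Counter()
    for entities in docs_entities:
        # A set, so repeated mentions within one document count only once.
        counts.update({(etype, text.strip().lower()) for etype, text in entities})
    return counts


def flag_recurring(docs_entities, min_docs=3, types=("PER", "ORG")):
    """Hypothetical heuristic: list entities seen in at least `min_docs` documents."""
    counts = entity_document_counts(docs_entities)
    return sorted(
        (key, n) for key, n in counts.items()
        if key[0] in types and n >= min_docs
    )
```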

