Background
Personally identifiable information (PII) is sensitive information that can be used to identify or locate an individual. Protecting PII in the data science world is important for maintaining the privacy and security of individuals, and essential for compliance with data privacy laws and regulations. The growing number of regulations makes it difficult to begin building an application without first knowing all of the protocols to follow.
Several methods can be used to anonymize PII, including masking, hashing, and encryption and decryption. Each method has its own strengths and limitations. The appropriate method to use depends on the specific requirements and constraints of the dataset and the use case.
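As a rough illustration of the first two approaches, the sketch below implements masking (irreversible character substitution) and hashing (deterministic one-way transformation) using only Python's standard library. The function names are illustrative, not the reference kit's actual utility functions.

```python
import hashlib

def mask_value(value: str, keep_last: int = 2, mask_char: str = "*") -> str:
    """Masking: irreversibly hide all but the last few characters."""
    if len(value) <= keep_last:
        return mask_char * len(value)
    return mask_char * (len(value) - keep_last) + value[-keep_last:]

def hash_value(value: str, salt: str = "") -> str:
    """Hashing: a deterministic one-way transform. Equal inputs map to
    equal outputs, so joins across tables still work, but the original
    value cannot be recovered from the digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

print(mask_value("555-867-5309"))       # → **********09
print(hash_value("alice@example.com"))  # 64-character hex digest
```

Masking suits display or logging scenarios where the value is never needed again; hashing suits analytics where records must still be linkable.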
Data scientists can minimize privacy challenges in the design and development stage, well before production. Speeding up the data pipeline and the extract, transform, and load (ETL) process is critical to scaling AI solutions.
Solution
In collaboration with Accenture*, Intel developed this AI data protection reference kit. This kit may help customers develop PII anonymization utility functions, which include methods for masking, hashing, and encrypting and decrypting the PII in large datasets (such as names, IP addresses, and phone numbers).
To anonymize data fields that contain names, a random-name-generator recurrent neural network (RNN) model, stored in pickled format, generates realistic synthetic names. A pretrained BERT model is used for named-entity recognition (NER) in free-flowing text, and the identified entities are then masked using the available obfuscation methods. This reference implementation considers the following entity tags when masking PII datasets:
- PER: Person name
- LOC: Location name
- ORG: Organization name
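The masking step can be sketched as follows. To stay self-contained, the example hard-codes entity spans in the `{entity_group, start, end}` shape that a Hugging Face token-classification pipeline emits; in the actual kit, those spans would come from the pretrained BERT NER model.

```python
def mask_entities(text, entities, tags=("PER", "LOC", "ORG"), mask_char="*"):
    """Replace each recognized entity span with mask characters.
    Spans are applied right-to-left so earlier offsets stay valid."""
    out = text
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if ent["entity_group"] in tags:
            out = (out[:ent["start"]]
                   + mask_char * (ent["end"] - ent["start"])
                   + out[ent["end"]:])
    return out

text = "Alice Smith works for Intel in Santa Clara."
# Hard-coded stand-in for real BERT NER output:
entities = [
    {"entity_group": "PER", "start": 0, "end": 11},
    {"entity_group": "ORG", "start": 22, "end": 27},
    {"entity_group": "LOC", "start": 31, "end": 42},
]
print(mask_entities(text, entities))
# → *********** works for ***** in ***********.
```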
End-to-End Flow Using Intel® AI Software Products
This reference kit includes:
- Training data
- An open source, trained model
- Libraries
- User guides
- Intel® AI software products
At a Glance
- Industry: Cross-industry
- Task: Mask PII using random generation functions for strings and numeric characters
- Dataset: Random dataset generator script to produce random PII
- Type of Learning: Deep learning
- Models: Pretrained BERT model, RNN
- Output: A .csv file with anonymized PII and, when encryption is used, a JSON file to aid the decryption process
- Intel AI Software Products:
- Intel® Extension for PyTorch* v1.13.0
- Intel® Distribution for Python* (specifically the optimizations for NumPy and SciPy)
- Intel® Distribution of Modin*
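The encryption path above (anonymized values in one file, key material in a separate JSON file that enables decryption) can be illustrated with a minimal stdlib-only sketch. The XOR one-time-pad scheme below is purely illustrative; a real deployment would use a vetted library such as `cryptography`, and the function names are not the kit's actual utilities.

```python
import base64
import json
import secrets

def encrypt_value(value: str) -> tuple[str, str]:
    """Illustrative reversible scheme: XOR the bytes with a random
    one-time key, returning (ciphertext, key) as base64 strings."""
    data = value.encode("utf-8")
    key = secrets.token_bytes(len(data))
    cipher = bytes(d ^ k for d, k in zip(data, key))
    return base64.b64encode(cipher).decode(), base64.b64encode(key).decode()

def decrypt_value(cipher_b64: str, key_b64: str) -> str:
    cipher, key = base64.b64decode(cipher_b64), base64.b64decode(key_b64)
    return bytes(c ^ k for c, k in zip(cipher, key)).decode("utf-8")

record = {"phone": "555-867-5309"}
encrypted, keys = {}, {}
for field, value in record.items():
    encrypted[field], keys[field] = encrypt_value(value)

# Encrypted values go to the anonymized output; the key material goes
# to a separate JSON file that enables later decryption.
key_json = json.dumps(keys)
restored = decrypt_value(encrypted["phone"], json.loads(key_json)["phone"])
print(restored)  # → 555-867-5309
```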
Technology
Optimized with Intel AI Software Products for Better Performance
The AI structured data generation models were optimized by Intel Extension for PyTorch, Intel Distribution of Modin, and Intel Distribution for Python (specifically the optimizations for NumPy and SciPy).
Intel Extension for PyTorch, Intel Distribution of Modin, and Intel Distribution for Python allow you to reuse your model development code, with minimal changes, for both training and inference.
Performance benchmark tests were run on Microsoft Azure* Standard_D8_v5 instances with 3rd generation Intel® Xeon® processors to evaluate the optimized solution.
Benefits
Protecting PII in the data science world is important for maintaining the privacy and security of individuals, and essential for compliance with data privacy laws and regulations.
This reference kit provides PII anonymization utility functions, which include methods for masking, hashing, and encrypting and decrypting the PII in large datasets (such as names, IP addresses, and phone numbers).
With Intel® oneAPI toolkits, little to no code change is required to attain the performance boost.
Related Reference Kits
Additional Resources