Model Quantization for Production with Intel® Deep Learning Boost


Inference with 8-bit integer (INT8) quantization can improve the performance of your deep learning model in production.

Model Quantization

Most deep learning models are built using 32-bit floating-point precision (FP32). Quantization is the process of representing a model with lower-precision data types, reducing its memory footprint with minimal accuracy loss. In this context, the main focus is INT8 representation.
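
To make the idea concrete, here is a minimal sketch of the common affine (scale and zero-point) quantization scheme in plain NumPy. The function names and the [-128, 127] target range are illustrative choices for this article, not a specific framework's API; production tools calibrate ranges more carefully.

    import numpy as np

    def quantize_int8(x: np.ndarray):
        """Affine quantization of an FP32 tensor to INT8, so that
        x is approximately scale * (q - zero_point)."""
        qmin, qmax = -128, 127
        x_min, x_max = float(x.min()), float(x.max())
        scale = (x_max - x_min) / (qmax - qmin)
        if scale == 0.0:  # constant tensor; any positive scale works
            scale = 1.0
        zero_point = int(round(qmin - x_min / scale))
        q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
        return q, scale, zero_point

    def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
        """Recover an FP32 approximation of the original tensor."""
        return scale * (q.astype(np.float32) - zero_point)

    # The round trip loses only a small amount of precision.
    x = np.random.randn(4, 4).astype(np.float32)
    q, s, zp = quantize_int8(x)
    print(np.max(np.abs(x - dequantize(q, s, zp))))  # on the order of scale / 2

Each INT8 value occupies a quarter of the memory of an FP32 value, and the worst-case rounding error is about half the scale, which is why accuracy loss stays small when the tensor's value range is well covered.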



What is Intel® Deep Learning Boost (Intel® DL Boost)?

The second generation of Intel® Xeon® Scalable processors introduced a collection of features for deep learning, packaged together as Intel® Deep Learning Boost. These features include the Vector Neural Network Instructions (VNNI), which increase inference throughput by supporting INT8 convolutions: multiple machine instructions from previous processor generations are fused into a single instruction.
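
As a hedged illustration of that fusion, the sketch below models the per-lane semantics of VPDPBUSD, the VNNI instruction usually cited in this context: it multiplies four unsigned-8-bit by signed-8-bit pairs and accumulates all four products into a signed 32-bit lane in one step, where earlier AVX-512 code needed a three-instruction sequence (VPMADDUBSW, VPMADDWD, VPADDD). The function name and scalar modeling here are illustrative, not hardware documentation.

    import numpy as np

    def vpdpbusd_lane(acc: np.int32, a_u8: np.ndarray, b_s8: np.ndarray) -> np.int32:
        """Model one 32-bit lane of VPDPBUSD: four u8*s8 products
        accumulated into a signed 32-bit value in a single instruction."""
        assert a_u8.shape == b_s8.shape == (4,)
        products = a_u8.astype(np.int32) * b_s8.astype(np.int32)
        return np.int32(acc + products.sum())

    # One INT8 dot-product step: 4 multiply-accumulates per lane per instruction.
    acc = np.int32(0)
    a = np.array([10, 20, 30, 40], dtype=np.uint8)  # e.g., unsigned activations
    b = np.array([-1, 2, -3, 4], dtype=np.int8)     # e.g., signed weights
    print(vpdpbusd_lane(acc, a, b))  # 10*-1 + 20*2 + 30*-3 + 40*4 = 100

Because a 512-bit register holds sixteen such 32-bit lanes, a single VPDPBUSD performs 64 INT8 multiply-accumulates, which is where the throughput gain for INT8 convolutions comes from.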

Learn More

First MLPerf Inference Results

Technical Description of VNNI

Frameworks and Tools

These frameworks and tools support Intel DL Boost on second-generation Intel® Xeon® Scalable processors.
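
As one example of what framework-level support looks like in practice, the sketch below uses PyTorch's post-training dynamic quantization API (torch.quantization.quantize_dynamic, part of stock PyTorch). The toy model is invented for illustration, and whether the resulting INT8 kernels dispatch to VNNI depends on the PyTorch build and the CPU, so treat this as a generic recipe rather than an Intel-specific one.

    import torch
    import torch.nn as nn

    # A small FP32 model standing in for a production network.
    model_fp32 = nn.Sequential(
        nn.Linear(128, 256),
        nn.ReLU(),
        nn.Linear(256, 10),
    )
    model_fp32.eval()

    # Post-training dynamic quantization: weights are converted to INT8
    # ahead of time; activations are quantized on the fly at inference.
    model_int8 = torch.quantization.quantize_dynamic(
        model_fp32, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 128)
    with torch.no_grad():
        print(model_int8(x).shape)  # torch.Size([1, 10])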


Customer Use Cases