bfloat16 - Hardware Numerics Definition

Submitted: November 14, 2018 · Last updated: November 14, 2018

Detailed Description

Intel® Deep Learning Boost (Intel® DL Boost) uses bfloat16 format (BF16). This document describes the bfloat16 floating-point format.
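As background for the discussion below: bfloat16 keeps FP32's 1 sign bit and 8 exponent bits but shortens the mantissa to 7 bits. A minimal sketch of decoding those fields from a Python float (the helper name `bf16_fields` is illustrative, not from the document):

```python
import struct

def bf16_fields(x: float):
    """Split a value into bfloat16 fields: 1 sign, 8 exponent, 7 mantissa bits.

    The value is first reinterpreted as FP32; bfloat16 is simply
    its upper 16 bits.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    top = bits >> 16                 # bfloat16 keeps the upper 16 bits of FP32
    sign = top >> 15                 # 1 bit
    exponent = (top >> 7) & 0xFF     # 8 bits, same bias (127) as FP32
    mantissa = top & 0x7F            # 7 bits
    return sign, exponent, mantissa

print(bf16_fields(1.0))   # → (0, 127, 0)
```

Because the exponent field is identical to FP32's, BF16 covers the same dynamic range as FP32, just at lower precision.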

BF16 has several advantages over FP16:

  • It can be viewed as a truncated FP32: the 16 least significant mantissa bits are dropped, leaving 1 sign bit, 8 exponent bits, and 7 mantissa bits.
  • There is no need to support denormals; FP32's dynamic range, which BF16 shares, is more than sufficient for deep learning training tasks.
  • Accumulating in FP32 after the multiply is essential to achieve adequate numerical behavior at the application level.
  • Hardware exception handling is not needed, since this format is a performance optimization; the industry is designing algorithms around explicit inf/NaN checks.
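The first and third points above can be sketched together: convert FP32 inputs to BF16 by dropping the low 16 bits (round-toward-zero; real hardware may instead round to nearest even), then accumulate products in full FP32-or-wider precision. The function names are illustrative, not from the document:

```python
import struct

def fp32_to_bf16_trunc(x: float) -> float:
    """FP32 -> BF16 by truncation: zero out the 16 low mantissa bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return y

def bf16_dot(a, b):
    """Dot product with BF16 inputs and a higher-precision accumulator,
    mimicking the multiply-then-FP32-accumulate pattern described above."""
    acc = 0.0  # accumulator kept at full precision, never rounded to BF16
    for x, y in zip(a, b):
        acc += fp32_to_bf16_trunc(x) * fp32_to_bf16_trunc(y)
    return acc

print(bf16_dot([1.0, 2.0], [3.0, 4.0]))   # → 11.0 (all values exact in BF16)
```

Keeping the accumulator wide matters because each BF16 operand carries only 8 significant bits; rounding the running sum back to BF16 after every add would compound that error across long reductions.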
