Intel® Optimization for TensorFlow* Installation Guide

By PREETHI VENKATESH, Jing Xu

Published: 08/09/2017   Last Updated: 04/03/2020

TensorFlow* is a widely used machine learning framework in the deep learning arena that demands efficient utilization of computational resources. To take full advantage of Intel® architecture and extract maximum performance, the TensorFlow framework has been optimized using oneAPI Deep Neural Network Library (oneDNN) primitives, a popular performance library for deep learning applications. For more information on the optimizations as well as performance data, see the blog post TensorFlow* Optimizations on Modern Intel® Architecture.

Anaconda* has made it convenient for the AI community to enable high-performance computing in TensorFlow. Starting with TensorFlow v1.9, Anaconda has built, and will continue to build, TensorFlow using oneDNN primitives to deliver maximum performance on your CPU.

This install guide features several methods to obtain Intel® Optimization for TensorFlow*, including off-the-shelf packages and building from source, conveniently categorized into Binaries, Docker Images, and Build from Source.

Intel-optimized TensorFlow is also available as part of the Intel® AI Analytics Toolkit. Download and install it to get separate conda environments optimized with Intel's latest AI accelerations. Code samples to help you get started are available at: https://github.com/intel/AiKit-code-samples

Quick Links

Anaconda

*Supports Python 3.6 and 3.7

PIP Wheels

Docker Containers

Build from source

1. Binaries

Install the latest Intel® Optimization for TensorFlow* from Anaconda* Cloud

Available for Linux*, Windows*, MacOS*

TensorFlow* version: 2.2.0

Installation instructions:

If you don't have the conda package manager, download and install Anaconda.

Linux and MacOS

Open the Anaconda prompt and run the following command:

conda install tensorflow

If the anaconda channel is not your highest-priority channel by default (or you are not sure), use the following command to make sure you get the right TensorFlow with Intel optimizations:

conda install tensorflow -c anaconda

Windows

Open the Anaconda prompt and run the following command:

conda install tensorflow-mkl

(or)

conda install tensorflow-mkl -c anaconda
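On either platform, you can confirm that the oneDNN-enabled build was installed by inspecting the package build string; a minimal check, assuming Anaconda's convention of tagging optimized builds with an "mkl" build string:

conda list tensorflow

If the Build column shows an mkl-prefixed build (e.g. mkl_py37...), the Intel-optimized variant is installed.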

Besides the install method described above, Intel Optimization for TensorFlow is distributed as wheels, Docker images, and a conda package on the Intel channel. Follow one of the installation procedures below to get Intel-optimized TensorFlow.

 

Note: All binaries distributed by Intel were built against the TensorFlow v2.2.0 tag in a CentOS container with GCC 4.8.5 and glibc 2.17, with the following compiler flags (shown below as passed to bazel*):

--cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0 --copt=-march=corei7-avx --copt=-mtune=core-avx-i --copt=-O3 --copt=-Wformat --copt=-Wformat-security --copt=-fstack-protector --copt=-fPIC --copt=-fpic --linkopt=-znoexecstack --linkopt=-zrelro --linkopt=-znow --linkopt=-fstack-protector

Install the latest Intel® Optimization for TensorFlow* from Intel Channel

Available for Linux*

TensorFlow* version: 2.2.0

Installation instructions:

Open the Anaconda prompt and run the following command (available for Python 3.6 and 3.7):

conda install tensorflow -c intel
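To keep the Intel channel build isolated from other Python packages, you may prefer to install into a dedicated conda environment; a minimal sketch (the environment name intel_tf is arbitrary):

conda create -n intel_tf -c intel tensorflow
conda activate intel_tf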

Get Intel® Optimization for TensorFlow* from Intel® Distribution for Python

Available for Linux*

TensorFlow* version: 2.2.0

Installation instructions:

Open the Anaconda prompt and run the following command (available for Python 3.6 and 3.7):

conda create -n IDP intelpython3_full -c intel

(or)

conda create -n IDP intelpython2_full -c intel
 
Available configurations here
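After the environment is created, activate it and verify that TensorFlow imports; a minimal sketch using the IDP environment name from the commands above:

conda activate IDP
python -c "import tensorflow as tf; print(tf.__version__)"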

 

Install the Intel® Optimization for TensorFlow* Wheel via PIP

Available for Linux*

TensorFlow version: 2.3.0

Installation instructions:

Note:

For TensorFlow versions 1.13, 1.14, and 1.15 with pip > 20.0, if you encounter an invalid wheel error, try downgrading pip to a version below 20.0. For example:

python -m pip install --force-reinstall pip==19.0

Run the command below to install the wheel into an existing Python* installation, preferably Intel® Distribution for Python*. Supported Python versions are 3.5, 3.6, 3.7, and 3.8.

pip install intel-tensorflow

If your machine supports the AVX-512 instruction set, use the package below for better performance.

pip install intel-tensorflow-avx512
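Either package can also be installed into a dedicated virtual environment to avoid conflicts with an existing TensorFlow installation; a minimal sketch (the environment name intel_tf_env is arbitrary):

python -m venv intel_tf_env
source intel_tf_env/bin/activate
pip install intel-tensorflow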

Pip packages are posted on Google Cloud Storage and AWS for easy customer access.

Wheels from Google Cloud Storage, by Python version and minimum required instruction set:

Python 3.5
  AVX:
    pip install https://storage.googleapis.com/intel-optimized-tensorflow/2.3.0/intel_tensorflow-2.3.0-cp35-cp35m-manylinux2010_x86_64.whl
  AVX-512:
    pip install https://storage.googleapis.com/intel-optimized-tensorflow/2.3.0/intel_tensorflow_avx512-2.3.0-cp35-cp35m-manylinux2010_x86_64.whl

Python 3.6
  AVX:
    pip install https://storage.googleapis.com/intel-optimized-tensorflow/2.3.0/intel_tensorflow-2.3.0-cp36-cp36m-manylinux2010_x86_64.whl
  AVX-512:
    pip install https://storage.googleapis.com/intel-optimized-tensorflow/2.3.0/intel_tensorflow_avx512-2.3.0-cp36-cp36m-manylinux2010_x86_64.whl

Python 3.7
  AVX:
    pip install https://storage.googleapis.com/intel-optimized-tensorflow/2.3.0/intel_tensorflow-2.3.0-cp37-cp37m-manylinux2010_x86_64.whl
  AVX-512:
    pip install https://storage.googleapis.com/intel-optimized-tensorflow/2.3.0/intel_tensorflow_avx512-2.3.0-cp37-cp37m-manylinux2010_x86_64.whl

Python 3.8
  AVX:
    pip install https://storage.googleapis.com/intel-optimized-tensorflow/2.3.0/intel_tensorflow-2.3.0-cp38-cp38-manylinux2010_x86_64.whl
  AVX-512:
    pip install https://storage.googleapis.com/intel-optimized-tensorflow/2.3.0/intel_tensorflow_avx512-2.3.0-cp38-cp38-manylinux2010_x86_64.whl

Wheels from AWS storage, by Python version and minimum required instruction set:

Python 3.5
  AVX:
    pip install https://intel-optimized-tensorflow.s3.cn-north-1.amazonaws.com.cn/2.3/intel_tensorflow-2.3.0-cp35-cp35m-manylinux2010_x86_64.whl
  AVX-512:
    pip install https://intel-optimized-tensorflow.s3.cn-north-1.amazonaws.com.cn/2.3/intel_tensorflow_avx512-2.3.0-cp35-cp35m-manylinux2010_x86_64.whl

Python 3.6
  AVX:
    pip install https://intel-optimized-tensorflow.s3.cn-north-1.amazonaws.com.cn/2.3/intel_tensorflow-2.3.0-cp36-cp36m-manylinux2010_x86_64.whl
  AVX-512:
    pip install https://intel-optimized-tensorflow.s3.cn-north-1.amazonaws.com.cn/2.3/intel_tensorflow_avx512-2.3.0-cp36-cp36m-manylinux2010_x86_64.whl

Python 3.7
  AVX:
    pip install https://intel-optimized-tensorflow.s3.cn-north-1.amazonaws.com.cn/2.3/intel_tensorflow-2.3.0-cp37-cp37m-manylinux2010_x86_64.whl
  AVX-512:
    pip install https://intel-optimized-tensorflow.s3.cn-north-1.amazonaws.com.cn/2.3/intel_tensorflow_avx512-2.3.0-cp37-cp37m-manylinux2010_x86_64.whl

Python 3.8
  AVX:
    pip install https://intel-optimized-tensorflow.s3.cn-north-1.amazonaws.com.cn/2.3/intel_tensorflow-2.3.0-cp38-cp38-manylinux2010_x86_64.whl
  AVX-512:
    pip install https://intel-optimized-tensorflow.s3.cn-north-1.amazonaws.com.cn/2.3/intel_tensorflow_avx512-2.3.0-cp38-cp38-manylinux2010_x86_64.whl

 

Note: If your machine supports the AVX-512 instruction set, download and install the wheel file with AVX-512 as the minimum required instruction set from the table above.

Note: If you see the following warning about ISAs above AVX2, download and install the wheel file with AVX-512 as the minimum required instruction set from the table above.

I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

 

Note: If you run a release with AVX-512 as the minimum required instruction set on a machine without AVX-512 support, you will encounter an "Illegal instruction (core dumped)" error.
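To check whether your CPU supports AVX-512 before choosing a wheel, you can inspect the CPU flags; a minimal sketch for Linux that looks for the avx512f flag in /proc/cpuinfo:

grep -q avx512f /proc/cpuinfo && echo "AVX-512 supported" || echo "AVX-512 not supported"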

Note that for the 1.14.0 release we have fixed a few vulnerabilities: new CVE issues from curl and GCP support were identified in the previous PyPI package release, so a new set of fixed packages was introduced on PyPI. The corrected versions can be installed using the commands linked below.

Available for Linux* here

2. Docker Images

Get Intel® Optimization for TensorFlow* Docker Images

Google DL Containers

Starting with version 1.14, Google released DL containers for TensorFlow on CPU optimized with oneDNN by default. TensorFlow v1.x CPU container names use the format "tf-cpu.<framework version>", TensorFlow v2.x CPU container names use the format "tf2-cpu.<framework version>", and both support Python 3. Below are sample commands to download the Docker image locally and launch the container for TensorFlow 1.14 or TensorFlow 2.3. Please use only one of the following commands at a time.

# TensorFlow 1.14
docker run -d -p 8080:8080 -v /home:/home gcr.io/deeplearning-platform-release/tf-cpu.1-14
# TensorFlow 2.3
docker run -d -p 8080:8080 -v /home:/home gcr.io/deeplearning-platform-release/tf2-cpu.2-3

This command starts the TensorFlow 1.14 or TensorFlow 2.3 container with oneDNN enabled in detached mode, binds the running Jupyter server to port 8080 on the local machine, and mounts the local /home directory to /home in the container. The running JupyterLab instance can be accessed at localhost:8080.
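If the JupyterLab instance asks for a login token, it is typically printed to the container log; you can retrieve it with docker logs (replace <container-id> with the ID reported by docker run or docker ps):

docker logs <container-id>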

To launch an interactive bash instance of the docker container, run one of the below commands.

# TensorFlow 1.14
docker run -v /home:/home -it gcr.io/deeplearning-platform-release/tf-cpu.1-14 bash
# TensorFlow 2.3
docker run -v /home:/home -it gcr.io/deeplearning-platform-release/tf2-cpu.2-3 bash
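From inside the container, you can run the sanity check described later in this guide to confirm that oneDNN optimizations are present; for the TensorFlow 1.14 container, for example:

python -c "import tensorflow; print(tensorflow.pywrap_tensorflow.IsMklEnabled())"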

 

Available Container Configurations

You can find all supported docker tags/configurations here.

 

Intel Containers at docker.com

These Docker images are all published at http://hub.docker.com in the intel/intel-optimized-tensorflow and intel/intel-optimized-tensorflow-avx512 namespaces and can be pulled with the following commands:

 

# intel-optimized-tensorflow
docker pull intel/intel-optimized-tensorflow
# intel-optimized-tensorflow-avx512
docker pull intel/intel-optimized-tensorflow-avx512:latest

 

For example, to run the data science container directly, simply run one of the following:

# intel-optimized-tensorflow
docker run -it -p 8888:8888 intel/intel-optimized-tensorflow
# intel-optimized-tensorflow-avx512
docker run -it -p 8888:8888 intel/intel-optimized-tensorflow-avx512:latest

Then open http://localhost:8888/ in your browser.


To browse the available images, follow the links below:

 
Available Container Configurations

You can find all supported docker tags/configurations for intel-optimized-tensorflow and intel-optimized-tensorflow-avx512.

To get the latest release notes on Intel-optimized TensorFlow, please refer to this article.

3. Build from Source

Build TensorFlow from Source with Intel oneAPI oneDNN library

Linux build

Building TensorFlow from source is not recommended. However, if the instructions provided above do not work because of an unsupported ISA, you can always build from source.

Building TensorFlow from source code requires a Bazel installation; refer to the instructions here: Installing Bazel.

Installation instructions:

  1. Ensure the numpy, keras-applications, keras-preprocessing, pip, six, wheel, and mock packages are installed in the Python environment where TensorFlow is being built and installed.
  2. Clone the TensorFlow source code and check out a branch of your preference:
    git clone https://github.com/tensorflow/tensorflow
    cd tensorflow
    git checkout r2.3
    
  3. Run "./configure" from the TensorFlow source directory
  4. Execute the following commands to create a pip package that can be used to install the optimized TensorFlow build.
    • PATH can be changed to point to a specific version of the GCC compiler:
      export PATH=<path-to-gcc>/bin:$PATH

    • LD_LIBRARY_PATH can also be set to point to the new GCC's libraries:
      export LD_LIBRARY_PATH=<path-to-gcc>/lib64:$LD_LIBRARY_PATH
      
    • Set the compiler flags supported by the GCC on your machine to build TensorFlow with oneDNN. Ensure the appropriate "march" and "mtune" flags are set; refer to the GCC online docs for the flags supported by your GCC version. For example:
      bazel build --config=mkl --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0 --copt=-march=sandybridge --copt=-mtune=ivybridge --copt=-O3 //tensorflow/tools/pip_package:build_pip_package
      

      (or)

    • Alternatively, set the instruction-set flags you want to compile the library with:

Flags set in the command below add AVX, AVX2, and AVX-512 instructions, which will result in "illegal instruction" errors on older CPUs. If you want to build for older CPUs, set the instruction flags accordingly (see the sketch after the command below).

bazel build --config=mkl -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mavx512f --copt=-mavx512pf --copt=-mavx512cd --copt=-mavx512er //tensorflow/tools/pip_package:build_pip_package
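For example, a build targeting only AVX for older CPUs might drop the AVX2 and AVX-512 options; a minimal sketch (adjust the flags to match your CPU's capabilities):

bazel build --config=mkl -c opt --copt=-mavx //tensorflow/tools/pip_package:build_pip_package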

5. Install the optimized TensorFlow wheel:

     bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/path_to_save_wheel
     pip install --upgrade --user ~/path_to_save_wheel/<wheel_name.whl>

 

Windows Build

* Prior to TensorFlow 2.3

Prerequisites

Install the following Visual C++ 2015 build tools from https://visualstudio.microsoft.com/vs/older-downloads/:

  • Microsoft Visual C++ 2015 Redistributable Update 3
  • Microsoft Build Tools 2015 Update 3

Installation

  1. Refer to Linux Section and follow Steps 1 through 3
  2. To build TensorFlow with oneDNN support, we need two additional steps.
  • Link.exe in Visual Studio 2015 causes a linker issue when the /WHOLEARCHIVE switch is used. To overcome this issue, install the hotfix for your Visual C++ compiler, available at https://support.microsoft.com/en-us/help/4020481/fix-link-exe-crashes-with-a-fatal-lnk1000-error-when-you-use-wholearch
  • Add the MKL runtime lib location, which will be created during the build process, to the PATH environment variable. The base download location can be specified in the bazel build command with the --output_base option; the oneDNN libraries will then be downloaded into a directory relative to that base:
set PATH=%PATH%;output_dir\external\mkl_windows\lib

  3. Run the bazel build with the "mkl" config and the "output_dir" to use the right MKL libs:

bazel --output_base=output_dir build --config=mkl --config=opt //tensorflow/tools/pip_package:build_pip_package

  4. Install the optimized TensorFlow wheel:

bazel-bin\tensorflow\tools\pip_package\build_pip_package C:\temp\path_to_save_wheel
pip install C:\temp\path_to_save_wheel\<wheel_name.whl>

 

* TensorFlow 2.3 and newer:

Prerequisites

  • Python 3.5 or 3.6 for Windows. Select pip as an optional feature and add it to your %PATH% environment variable.
  • TensorFlow dependent packages (see the dependencies listed in setup.py):
    • pip3 install six numpy wheel
      pip3 install keras_applications==1.0.6 --no-deps
      pip3 install keras_preprocessing==1.0.5 --no-deps

       

  • MSYS2
    • (Required for Bazel) MSYS2 is a software distribution and building platform for Windows. It contains Bash and common Unix tools (like grep, tar, git).
    • Bazel needs packages installed from the MSYS2 terminal (note that proxy variables need to be set, otherwise these installs won't work).
    • Open the MSYS2 terminal and run the command:
      pacman -S zip unzip patch diffutils git
  • Bazel
  • Install Visual C++* Build Tools 2019. It comes with Visual Studio* 2019 but can be installed separately. Go to the Visual Studio Downloads, download and install the following:

    • Microsoft Visual C++ 2019 Redistributable from Other Tools and Frameworks

    • Microsoft Build Tools 2019 from Tools for Visual Studio 2019

Installation

  1. Set the following environment variables:
    1. BAZEL_SH: C:\msys64\usr\bin\bash.exe

    2. BAZEL_VS: C:\Program Files (x86)\Microsoft Visual Studio

    3. BAZEL_VC: C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC

  2. Note: To reduce compile time, set:

        set TF_VC_VERSION=16.6
    

    More details can be found here.

  3. Add the following to the PATH environment variable:
    1. python path, e.g. C:\Program Files\Python36

    2. the oneDNN runtime lib location that will be created during the build process, e.g. D:\output_dir\external\mkl_windows\lib

    3. the Bazel path, e.g. C:\Program Files\Bazel-3.2.0

    4. MSYS2 path, e.g. C:\msys64;C:\msys64\usr\bin

    5. Git path, e.g. C:\Program Files\Git\cmd;C:\Program Files\Git\usr\bin

      set PATH=%PATH%;C:\Program Files\Python36;D:\output_dir\external\mkl_windows\lib;C:\Program Files\Bazel-3.2.0;C:\msys64;C:\msys64\usr\bin;C:\Program Files\Git\cmd;C:\Program Files\Git\usr\bin

       

  4. Download the TensorFlow source code, checkout the release branch, and configure the build:
    git clone https://github.com/Intel-tensorflow/tensorflow.git
    cd tensorflow
    git checkout r2.3-windows
    python ./configure.py

     

  5. Set the oneDNN output directory to a location outside the TensorFlow home directory to avoid an infinite symlink expansion error. Then add the oneDNN output directory to the system PATH:
    set OneDNN_DIR=<path-to-oneDNN-output-dir>\one_dnn_dir
    set PATH=%OneDNN_DIR%;%PATH%

     

  6. Build TensorFlow from source with oneDNN. Navigate to the TensorFlow root directory and run the following bazel command:
    bazel --output_base=%OneDNN_DIR% build --announce_rc --config=opt \
      --config=mkl \
      --action_env=PATH="<user is expected to expand the system path here>" \
      --define=no_tensorflow_py_deps=true \
      tensorflow/tools/pip_package:build_pip_package

     

Note: Based on bazel issue #7026, we set --action_env=PATH=<value>. Open cmd.exe, run echo %PATH%, and copy the output into the value of --action_env=PATH=<value>. If any folder names contain white space, wrap them in single quotes.

Sanity Check

Once Intel-optimized TensorFlow is installed, running the command below should print "True" if oneDNN optimizations are present.

 

TensorFlow 1.* versions

python -c "import tensorflow; print(tensorflow.pywrap_tensorflow.IsMklEnabled())"

 

Additional Capabilities and Known Issues

  1. Intel-optimized TensorFlow enables oneDNN calls by default. If at any point you wish to disable Intel MKL primitive calls, set the TF_DISABLE_MKL flag to 1 before running your TensorFlow script.
    export TF_DISABLE_MKL=1

    However, note that this flag will only disable oneDNN calls, but not MKL-ML calls.

    Although oneDNN is responsible for most optimizations, certain ops are optimized by the MKL-ML library, including matmul, transpose, etc. Disabling MKL-ML calls is not supported by the TF_DISABLE_MKL flag at present, and Intel is working with Google to add this functionality.

  2. CPU affinity settings in Anaconda's TensorFlow: If oneDNN-enabled TensorFlow is installed from the anaconda channel (not the Intel channel), the "import tensorflow" command sets the KMP_BLOCKTIME and OMP_PROC_BIND environment variables if they are not already set. However, these variables may affect other libraries, such as NumPy/SciPy, that use OpenMP or oneDNN. You can either set your preferred values before importing TensorFlow (see the sketch after the snippet below) or unset the variables after importing it, as the snippet does. More details are available in the TensorFlow GitHub issue.
    import tensorflow # this sets KMP_BLOCKTIME and OMP_PROC_BIND 
    import os 
    # delete the existing values 
    del os.environ['OMP_PROC_BIND'] 
    del os.environ['KMP_BLOCKTIME']
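Alternatively, to use preferred values instead, export the variables before launching Python; a minimal sketch (the values shown are illustrative, and your_script.py is a hypothetical script name):

# illustrative values; tune for your workload
export OMP_PROC_BIND=true
export KMP_BLOCKTIME=0
python your_script.py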

Support

If you have further questions or need support on your workload optimization, please submit your queries at the TensorFlow GitHub issues with the label "comp:mkl" or on the Intel AI Frameworks forum.

Useful Resources

Archived Wheels

Version

Wheels (Python 2.7, 3.5, 3.6)

1.6

https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp27-cp27mu-linux_x86_64.whl

(or)

*/tensorflow-1.6.0-cp35-cp35mu-linux_x86_64.whl

(or)

*/tensorflow-1.6.0-cp36-cp36mu-linux_x86_64.whl

1.9

https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.9.0-cp27-cp27mu-linux_x86_64.whl

https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.9.0-cp35-cp35m-linux_x86_64.whl

https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.9.0-cp36-cp36m-linux_x86_64.whl

1.10

https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.10.0-cp27-cp27mu-linux_x86_64.whl

https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.10.0-cp35-cp35m-linux_x86_64.whl

https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.10.0-cp36-cp36m-linux_x86_64.whl

1.11

https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.11.0-cp27-cp27mu-linux_x86_64.whl

https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.11.0-cp34-cp34m-linux_x86_64.whl

https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.11.0-cp35-cp35m-linux_x86_64.whl

https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.11.0-cp36-cp36m-linux_x86_64.whl

 

 

Product and Performance Information

1

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804