Intel® Optimization for TensorFlow* Installation Guide

TensorFlow* is one of the most widely used machine learning frameworks in the deep learning arena, and it demands efficient utilization of computational resources. To take full advantage of Intel Architecture and extract maximum performance, the TensorFlow* library has been optimized using Intel MKL-DNN primitives, a popular performance library for deep learning applications.

For more information on the optimizations as well as performance data, see this blog post.

Several installation methods are described below; newer versions of Intel-optimized TensorFlow* are available for Linux.

Method 1 (Recommended): Install the Intel-optimized TensorFlow wheel into an existing Python installation through pip

TensorFlow* version: 1.6.0

Installation instructions:

Run the commands below to install the wheel into an existing Python installation, preferably the Intel Distribution for Python:

# Python 2.7
pip install https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp27-cp27mu-linux_x86_64.whl

# Python 3.5
pip install https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp35-cp35m-linux_x86_64.whl

# Python 3.6
pip install https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp36-cp36m-linux_x86_64.whl
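The wheel must match the interpreter's major and minor version (the `cp27`/`cp35`/`cp36` tag in the filename). As a convenience, here is a minimal sketch that prints the matching pip command for the running interpreter; the URLs are the ones listed above, while the selection helper itself is illustrative and not part of the official instructions:

```python
import sys

# Wheel URLs from the instructions above, keyed by (major, minor) Python version.
WHEELS = {
    (2, 7): "https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp27-cp27mu-linux_x86_64.whl",
    (3, 5): "https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp35-cp35m-linux_x86_64.whl",
    (3, 6): "https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp36-cp36m-linux_x86_64.whl",
}

def pip_command(version_info=sys.version_info):
    """Return the pip install command for the matching wheel, or None."""
    url = WHEELS.get((version_info[0], version_info[1]))
    return "pip install " + url if url else None

if __name__ == "__main__":
    print(pip_command() or "No Intel-optimized wheel for this Python version.")
```

Run it with the interpreter you intend to install into; it only prints the command, it does not execute it.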

Edit 03/05/18: Wheel paths have been updated to 1.6.0

Edit 11/22/17: Wheel paths have been updated to 1.4.0

Edit 10/12/17: Wheel paths have been updated to 1.3.1

Warning on ISA above AVX2: “The TensorFlow library was not compiled to use <Instruction set> instructions, but these are available on your machine and could speed up CPU computations.”

Note that Intel MKL-DNN primitives take advantage of the latest instruction sets available on your processor to perform compute-intensive operations, despite the warnings.

Older CPUs may not support this version of TensorFlow; on them, installation may result in an "Illegal instruction (core dumped)" error.
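The "Illegal instruction" crash can be anticipated before installing by checking which instruction-set flags the CPU reports. A minimal Linux-only sketch follows; the particular list of required flags is an illustrative assumption, not an official compatibility check:

```python
def cpu_flags(cpuinfo_text):
    """Extract the CPU feature flags from /proc/cpuinfo contents."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def supports(cpuinfo_text, required=("sse4_1", "sse4_2", "avx")):
    """True if every required instruction-set flag is present (assumed list)."""
    return set(required) <= cpu_flags(cpuinfo_text)

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except IOError:
        text = ""  # not on Linux, or /proc unavailable
    print("CPU reports the assumed required flags." if supports(text) else
          "CPU may be too old for this wheel; consider building from source (Method 3).")
```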

 

Method 2: Install Intel-optimized TensorFlow from the Intel channel / get it from the Intel Distribution for Python (IDP)

TensorFlow* version: 1.3.1

Installation instructions:

Assuming IDP is installed and activated:

conda install tensorflow -c intel

(or)

Install and activate the full IDP package:

conda create -n idpFull -c intel intelpython3_full
source activate idpFull

Warning on ISA above SSE4.2: “The TensorFlow library was not compiled to use <Instruction set> instructions, but these are available on your machine and could speed up CPU computations.”

Note that Intel MKL-DNN primitives take advantage of the latest instruction sets available on your processor to perform compute-intensive operations, despite the warnings.

Method 3: Build TensorFlow from source with Intel® MKL

Building TensorFlow from source is not recommended. However, if TensorFlow cannot be installed through Method 1 or Method 2 due to an unsupported ISA, you can always build from source.

Building TensorFlow from source code requires a Bazel installation; refer to the instructions here.

TensorFlow* version: 1.6.0

Installation instructions:

  1. Run "./configure" from the TensorFlow source directory
  2. Execute the following commands to create a pip package that can be used to install the optimized TensorFlow build.
    • PATH can be changed to point to a specific version of the GCC compiler:
      export PATH=/PATH//bin:$PATH
      
    • LD_LIBRARY_PATH can also be set to point to the new location:
      export LD_LIBRARY_PATH=/PATH//lib64:$LD_LIBRARY_PATH
      
      
    • Set the flag to build TensorFlow with Intel MKL, and pass the appropriate instruction sets you want to compile the library with:
      bazel build --config=mkl -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mavx512f --copt=-mavx512pf --copt=-mavx512cd --copt=-mavx512er --copt="-DEIGEN_USE_VML" //tensorflow/tools/pip_package:build_pip_package
      
  3. Install the optimized TensorFlow wheel
    bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/path_to_save_wheel
    pip install --upgrade --user ~/path_to_save_wheel/<wheel_name.whl>
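Each --copt option in step 2 maps one instruction set onto the compiler command line. As a sketch, the following hypothetical helper (not part of the official build scripts) assembles the bazel invocation from a list of target ISA names, so the flag list can be tailored to the machine being built for:

```python
def bazel_build_command(isas, extra_copts=('-DEIGEN_USE_VML',)):
    """Assemble the bazel command from step 2 for a list of ISA names.

    isas        -- instruction sets, e.g. ["avx", "avx2", "fma", "avx512f"]
    extra_copts -- additional compiler defines, quoted as in the guide
    """
    copts = ["--copt=-m%s" % isa for isa in isas]
    copts += ['--copt="%s"' % c for c in extra_copts]
    return " ".join(
        ["bazel build --config=mkl -c opt"] + copts +
        ["//tensorflow/tools/pip_package:build_pip_package"])

if __name__ == "__main__":
    # Reproduces a shorter variant of the command shown in step 2.
    print(bazel_build_command(["avx", "avx2", "fma"]))
```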
For more complete information about compiler optimizations, see our Optimization Notice.

10 comments

Cai, Jason:

Thanks for the update about details of compiling TF+MKL from source with TF r1.6.

I am confused when doing "bazel build ...". From https://www.tensorflow.org/performance/performance_guide#tensorflow_with_intel_mkl_dnn, it says:

./configure
# Pick the desired options
bazel build --config=mkl --config=opt //tensorflow/tools/pip_package:build_pip_package

It's quite different from the bazel build cmds here.

Could you please tell me the reason why '--copt="-DEIGEN_USE_VML"' is needed?

Thanks.

Ehsan T. (Intel):

I installed this wheel in my conda environment, but python crashes with this error when I import tensorflow:

./tensorflow/core/common_runtime/mkl_cpu_allocator.h:144] Non-OK-status: s status: Unimplemented: Unimplemented case for hooking MKL function.

Qiang L. (Intel):

It seems that AVX512F is not compiled into the released TensorFlow 1.6 (but other KNL-specific AVX512 instructions are!).

I ran a simple "hello world" test on KNL after installing TensorFlow 1.6 by following the steps mentioned here, and it gives me the following message:

2018-03-08 13:08:26.924344: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F

BTW, I tried compiling TensorFlow 1.6 from source, adding all of the AVX options as --copt to the bazel build command; after building the package correctly and re-running the test program above, AVX512F does appear to be compiled in.

Karczewski, Jakub Jan (Intel):

I've run into an issue when running AlexNet (https://github.com/jakubkarczewski/AlexNetF/blob/master/alexnet.py) on all of the above TF versions (the model works just fine on the TF distribution installed with pip install tensorflow).

Error code:  tensorflow/core/kernels/mkl_lrn_op.cc:595] Check failed: dnnConversionCreate_F32( &convert_input, static_cast<dnnLayout_t>(inimage_shape.GetCurLayout()), lt_internal_input) == E_SUCCESS (-1 vs. 0)

 

Craig R.:

The "... TensorFlow library wasn't compiled to use ..." warning may not indicate an issue. There's a discussion over at https://software.intel.com/en-us/forums/intel-distribution-for-python/topic/738132 which suggests that you'll still get SSE/AVX performance where it counts. However, I would like a definitive answer and a way to verify that I'm getting the optimized versions.

Zhao Y. (Intel):

zhaoye@zhaoye-PC:~/share1$ python3 tf_test.py
2017-12-29 14:24:47.842886: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-29 14:24:47.842934: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-29 14:24:47.842946: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-29 14:24:47.842956: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-29 14:24:47.842967: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

When I ran a TensorFlow program, it showed the log above. Why can't the AVX2 instructions be used?

My CPU is an Intel(R) Core(TM) i7-5960X, which supports AVX2:

zhaoye@zhaoye-PC:~/Disk2/backup/tensorflow$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
stepping        : 2
microcode       : 0x29
cpu MHz         : 3299.926
cache size      : 20480 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb intel_ppin tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

Zynda, Greg:

Hello, attempting to use the recommended OMP_NUM_THREADS=136 on a KNL system seems to break multiprocessing when the included numpy package is loaded. I can reproduce this by running the following code:

#!/bin/bash

export OMP_NUM_THREADS=1

cat << EOF | python
import numpy as np
print np.__file__
import multiprocessing as mp
print mp.cpu_count()
p = mp.Pool()
EOF

export OMP_NUM_THREADS=136

cat << EOF | python
import numpy as np
print np.__file__
import multiprocessing as mp
print mp.cpu_count()
p = mp.Pool()
EOF

I get the following error

/home1/03076/gzynda/apps/tensorflow/1.4/lib/python2.7/site-packages/numpy/__init__.pyc
272
/home1/03076/gzynda/apps/tensorflow/1.4/lib/python2.7/site-packages/numpy/__init__.pyc
272
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/opt/apps/intel17/python/2.7.13/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "/opt/apps/intel17/python/2.7.13/lib/python2.7/multiprocessing/pool.py", line 159, in __init__
    self._repopulate_pool()
  File "/opt/apps/intel17/python/2.7.13/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
    w.start()
  File "/opt/apps/intel17/python/2.7.13/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/opt/apps/intel17/python/2.7.13/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Please note that multiprocessing worked when OMP_NUM_THREADS was set to "1" in the first test. Can you recommend a method to achieve a similar scale without this setting?

Christopher H. (Intel):

Hi Prafulla,

It looks like you have to install the six module.

pip install six

Chris

Hal G. (Intel):

Questions regarding this article should be posted to the forums here: https://software.intel.com/en-us/forums/intel-distribution-for-python

Questions posted to Articles may or may not be responded to.

Regards, Hal

Intel(R) Developer Zone Support

https://software.intel.com
*Other names and brands may be claimed as the property of others.

Prafull:

The setup for the tf36 env fails while trying to pip install TensorFlow 1.3.0. This is the error I get:

Could not import setuptools which is required to install from a source distribution.
Traceback (most recent call last):
  File "/home/prafull/tools/Platforms/anaconda3/envs/tf36/lib/python3.6/site-packages/pip/req/req_install.py", line 387, in setup_py
    import setuptools  # noqa
  File "/home/prafull/.local/lib/python3.6/site-packages/setuptools/__init__.py", line 10, in <module>
    from six.moves import filter, map
ModuleNotFoundError: No module named 'six'

 


