Intel Optimized TensorFlow Wheel Now Available

Intel's TensorFlow optimizations are now available for Linux as a wheel installable through pip.

For more information on the optimizations as well as performance data, see this blog post.

To install the wheel into an existing Python installation, run:

# Python 2.7
pip install https://anaconda.org/intel/tensorflow/1.4.0/download/tensorflow-1.4.0-cp27-cp27mu-linux_x86_64.whl

# Python 3.5
pip install https://anaconda.org/intel/tensorflow/1.4.0/download/tensorflow-1.4.0-cp35-cp35m-linux_x86_64.whl

# Python 3.6
pip install https://anaconda.org/intel/tensorflow/1.4.0/download/tensorflow-1.4.0-cp36-cp36m-linux_x86_64.whl
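To see which of the wheels above matches your interpreter, you can print its ABI tag (a small sketch; SOABI is empty on Python 2, so the fallback below is approximate):

```python
import sys
import sysconfig

# The wheel filename encodes the CPython ABI tag (cp27mu, cp35m, cp36m).
# On Python 3, SOABI looks like "cpython-36m-x86_64-linux-gnu".
soabi = sysconfig.get_config_var("SOABI") or "cpython-%d%d" % sys.version_info[:2]
print("Python %d.%d, ABI: %s" % (sys.version_info[0], sys.version_info[1], soabi))
```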

Edit 10/12/17: Wheel paths have been updated to 1.3.0

Edit 11/22/17: Wheel paths have been updated to 1.4.0

To create a conda environment with Intel TensorFlow that also takes advantage of the Intel Distribution for Python’s optimized numpy, run:

conda create -n tf -c intel python=<2|3> pip numpy
. activate tf
# Python 3.6
pip install https://anaconda.org/intel/tensorflow/1.4.0/download/tensorflow-1.4.0-cp36-cp36m-linux_x86_64.whl
# Python 2.7
pip install https://anaconda.org/intel/tensorflow/1.4.0/download/tensorflow-1.4.0-cp27-cp27mu-linux_x86_64.whl

Conda Package Now Available in Intel Python 2018

A conda package of Intel's optimized TensorFlow comes with the new 2018 Intel Python distribution on Linux. You can also create a conda environment with Intel Optimized TensorFlow with the following commands:

conda create -n intel_tf -c intel --override-channels tensorflow
source activate intel_tf


For more complete information about compiler optimizations, see our Optimization Notice.

8 comments

Karczewski, Jakub Jan (Intel):

I've run into an issue when running AlexNet (https://github.com/jakubkarczewski/AlexNetF/blob/master/alexnet.py) on all of the above TF versions (the model works fine on the TF distribution installed with pip install tensorflow).

Error code:  tensorflow/core/kernels/mkl_lrn_op.cc:595] Check failed: dnnConversionCreate_F32( &convert_input, static_cast<dnnLayout_t>(inimage_shape.GetCurLayout()), lt_internal_input) == E_SUCCESS (-1 vs. 0)


Craig R.:

The "... TensorFlow library wasn't compiled to use ..." message may not indicate an issue. There's a discussion over at https://software.intel.com/en-us/forums/intel-distribution-for-python/topic/738132 which suggests that you'll still get SSE/AVX performance where it counts. However, I would like a definitive answer and a way to verify that I'm getting the optimized versions.
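One way to spot-check the numpy side of the install (a sketch; the exact output format varies by numpy version) is to print the build configuration and look for MKL in the BLAS/LAPACK sections:

```python
import numpy as np

# Print the BLAS/LAPACK libraries this numpy build was linked against.
# An MKL-linked build (e.g. from the intel channel) lists libraries such
# as "mkl_rt"; a stock pip build typically shows a different BLAS.
np.show_config()
```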

Zhao Y. (Intel):

zhaoye@zhaoye-PC:~/share1$ python3 tf_test.py
2017-12-29 14:24:47.842886: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-29 14:24:47.842934: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-29 14:24:47.842946: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-29 14:24:47.842956: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-29 14:24:47.842967: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

When I run a TensorFlow program it shows the log above. Why can't the AVX2 instructions be used?

My CPU is an Intel(R) Core(TM) i7-5960X, which supports AVX2:

zhaoye@zhaoye-PC:~/Disk2/backup/tensorflow$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
stepping        : 2
microcode       : 0x29
cpu MHz         : 3299.926
cache size      : 20480 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb intel_ppin tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
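For what it's worth, these warnings come from tensorflow/core/platform/cpu_feature_guard and describe how TensorFlow's own kernels were compiled; in Intel's build, compute-heavy ops are handed to MKL primitives that select AVX2 at runtime, so the warnings do not necessarily mean AVX2 goes unused (see the forum thread linked in Craig's comment above). If the log noise is a problem, TensorFlow's standard TF_CPP_MIN_LOG_LEVEL variable can hide it:

```python
import os

# Raise TensorFlow's C++ log threshold before importing tensorflow:
# "0" = all messages, "1" = hide INFO, "2" = hide INFO and WARNING,
# "3" = errors only.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# import tensorflow as tf  # import only after setting the variable
```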

Zynda, Greg:

Hello, attempting to use the recommended OMP_NUM_THREADS=136 on a KNL system seems to break multiprocessing once the included numpy package is loaded. I can reproduce this by running the following script:

#!/bin/bash

export OMP_NUM_THREADS=1

cat << EOF | python
import numpy as np
print np.__file__
import multiprocessing as mp
print mp.cpu_count()
p = mp.Pool()
EOF

export OMP_NUM_THREADS=136

cat << EOF | python
import numpy as np
print np.__file__
import multiprocessing as mp
print mp.cpu_count()
p = mp.Pool()
EOF

I get the following error:

/home1/03076/gzynda/apps/tensorflow/1.4/lib/python2.7/site-packages/numpy/__init__.pyc
272
/home1/03076/gzynda/apps/tensorflow/1.4/lib/python2.7/site-packages/numpy/__init__.pyc
272
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/opt/apps/intel17/python/2.7.13/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "/opt/apps/intel17/python/2.7.13/lib/python2.7/multiprocessing/pool.py", line 159, in __init__
    self._repopulate_pool()
  File "/opt/apps/intel17/python/2.7.13/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
    w.start()
  File "/opt/apps/intel17/python/2.7.13/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/opt/apps/intel17/python/2.7.13/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Please note that multiprocessing worked when OMP_NUM_THREADS was set to "1" in the first test. Can you recommend a way to achieve similar scaling without this setting?
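A possible workaround, not an official fix: create the worker pool before numpy (and thus the OpenMP runtime) initializes, and import numpy inside each worker, so the forked children do not inherit the parent's large per-thread OpenMP allocations. The worker function and pool size below are illustrative:

```python
import multiprocessing as mp

def work(x):
    # Import numpy inside the worker so OpenMP initializes after the fork,
    # in the child, instead of being copied from a large parent process.
    import numpy as np
    return int(np.dot([x, x], [x, x]))

if __name__ == "__main__":
    # Fork the pool while the parent process is still small.
    pool = mp.Pool(4)
    try:
        print(pool.map(work, range(5)))  # [0, 2, 8, 18, 32]
    finally:
        pool.close()
        pool.join()
```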

Christopher H. (Intel):

Hi Prafulla,

It looks like you have to install the six module.

pip install six

Chris

Hal G. (Intel):

Questions regarding this article should be posted to the forums here: https://software.intel.com/en-us/forums/intel-distribution-for-python

Questions posted to articles may or may not receive a response.

Regards, Hal

Intel(R) Developer Zone Support

https://software.intel.com
*Other names and brands may be claimed as the property of others.

Prafull:

The setup for the tf36 env fails while trying to pip install TensorFlow 1.3.0. This is the error I get:

Could not import setuptools which is required to install from a source distribution.
Traceback (most recent call last):
  File "/home/prafull/tools/Platforms/anaconda3/envs/tf36/lib/python3.6/site-packages/pip/req/req_install.py", line 387, in setup_py
    import setuptools  # noqa
  File "/home/prafull/.local/lib/python3.6/site-packages/setuptools/__init__.py", line 10, in <module>
    from six.moves import filter, map
ModuleNotFoundError: No module named 'six'
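A small diagnostic sketch to see which of the modules the failing pip run needs actually resolve inside the environment (Chris's pip install six reply above addresses the missing one):

```python
import importlib

# Try importing the modules pip's source install path depends on.
for mod in ("six", "setuptools"):
    try:
        importlib.import_module(mod)
        print(mod, "OK")
    except ImportError as err:
        print(mod, "MISSING:", err)
```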


Rajeev A. (Intel):

# Python 2.7

conda create -n tf27 -c intel python=2.7 pip numpy
. activate tf27
pip install https://anaconda.org/intel/tensorflow/1.3.0/download/tensorflow-1.3.0-cp27-cp27mu-linux_x86_64.whl

# Python 3.5

conda create -n tf35 -c intel python=3.5 pip numpy
. activate tf35
pip install https://anaconda.org/intel/tensorflow/1.3.0/download/tensorflow-1.3.0-cp35-cp35m-linux_x86_64.whl

# Python 3.6

conda create -n tf36 -c intel python=3.6 pip numpy
. activate tf36
pip install https://anaconda.org/intel/tensorflow/1.3.0/download/tensorflow-1.3.0-cp36-cp36m-linux_x86_64.whl
