Set Up Intel® Software Optimization for Theano* and Supporting Tools



Theano is a Python* library developed at the LISA lab to define, optimize, and evaluate mathematical expressions, including those involving multi-dimensional arrays (numpy.ndarray). Intel® optimized-Theano is a new version based on Theano 0.8.0rc1. It is optimized for Intel® architecture and enables Intel® Math Kernel Library (Intel® MKL) 2017. The latest version of Intel MKL includes optimizations for Intel® Advanced Vector Extensions 2 (Intel® AVX2) and AVX-512 instructions, which are supported on Intel® Xeon® processors and Intel® Xeon Phi™ processors.

Theano can be installed and used with several combinations of development tools and libraries on a variety of platforms. This tutorial provides one such recipe describing steps to build and install Intel optimized-Theano with Intel® compilers and Intel MKL 2017 on CentOS*- and Ubuntu*-based systems. We also verify the installation by running common industry-standard benchmarks like MNIST*, DBN-Kyoto*, LSTM* and ImageNet*.


Intel® Compilers and Intel® Math Kernel Library 2017

This tutorial assumes that the Intel compilers (C/C++ and Fortran) are already installed and verified. If they are not, they can be downloaded and installed as part of Intel® Parallel Studio XE, or installed independently.

Installing Intel MKL 2017 is optional when using the Intel® Distribution for Python*. For other Python distributions, Intel MKL 2017 can be obtained as part of Intel Parallel Studio XE 2017, or downloaded and installed for free under the community license. To download it, first register here for a free community license, then follow the installation instructions.

Python* Tools

This tutorial uses the Intel® Distribution for Python*, which provides ready access to tools and techniques that are enabled and verified for higher performance on Intel architecture. You can use Intel-optimized, precompiled packages such as NumPy* and SciPy* without building and installing them yourself.

The Intel Distribution for Python is available as part of Intel Parallel Studio XE. It can also be downloaded independently for free from here.

Instructions for installing the Intel Distribution for Python are given below. This article assumes that Python is installed in the local user account.

Python 2.7
tar -xvzf l_python27_p_2017.0.028.tgz
cd l_python27_p_2017.0.028

Python 3.5
tar -xvzf l_python35_p_2017.0.028.tgz
cd l_python35_p_2017.0.028

Using Anaconda, create an independent user environment as shown below. The same command installs the required NumPy, SciPy and Cython packages into the environment from the intel channel.

Python 2.7
conda create -n pcs_theano_2 -c intel python=2 numpy scipy cython
source activate pcs_theano_2

Python 3.5
conda create -n pcs_theano_2 -c intel python=3.5 numpy scipy cython
source activate pcs_theano_2

Alternatively, NumPy and SciPy can be built and installed from source as described in Appendix A. Appendix A also shows how to install other Python development tools, which may be required if you use a non-Intel Python distribution.


Building and installing Intel® Software Optimization for Theano*

The branch of Theano optimized for Intel architecture can be checked out from the following Git repository and installed:

git clone theano
cd theano
python build
python install
theano-cache clear

An example Theano configuration file is given below for reference. To make Theano use the Intel compilers and the compiler flags shown, create a copy of this file as ~/.theanorc in your home directory.

vi ~/.theanorc

[cuda]
root = /usr/local/cuda

[global]
device = cpu
floatX = float32
cxx = icpc
mode = FAST_RUN
openmp = True
openmp_elemwise_minsize = 10

[gcc]
cxxflags = -qopenmp -march=native -O3 -vec-report3 -fno-alias -opt-prefetch=2 -fp-trap=none

[blas]
ldflags = -lmkl_rt
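Theano reads ~/.theanorc with Python's standard config-file parser, so every option has to sit under a section header. Plain names such as device and floatX go under [global]. The options Theano refers to as gcc.cxxflags, blas.ldflags and cuda.root go under [gcc], [blas] and [cuda]. As a quick sanity check, the minimal Python 3 sketch below parses a copy of the file and lists each option under the dotted name Theano would use. The helper theano_option_names is illustrative and not part of Theano.

```python
import configparser

# A copy of the example ~/.theanorc, with the section headers Theano expects.
THEANORC = """
[cuda]
root = /usr/local/cuda

[global]
device = cpu
floatX = float32
cxx = icpc
mode = FAST_RUN
openmp = True
openmp_elemwise_minsize = 10

[gcc]
cxxflags = -qopenmp -march=native -O3 -vec-report3 -fno-alias -opt-prefetch=2 -fp-trap=none

[blas]
ldflags = -lmkl_rt
"""


def theano_option_names(text):
    """Map each option in a .theanorc text to its dotted name (e.g. blas.ldflags)."""
    parser = configparser.ConfigParser()
    # Keep option names as written (floatX rather than floatx) for display.
    parser.optionxform = str
    parser.read_string(text)
    options = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            # Options in [global] are referenced without a section prefix.
            name = key if section == "global" else section + "." + key
            options[name] = value
    return options


if __name__ == "__main__":
    for name, value in sorted(theano_option_names(THEANORC).items()):
        print(name, "=", value)
```

If the first option in a file appears before any section header, the parser raises configparser.MissingSectionHeaderError. That makes a missing [global] line easy to spot.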


Verify Theano and NumPy Installation

It is important to verify which versions of the Theano and NumPy libraries Python actually picks up when they are imported. The versions of NumPy and Theano referenced in this article were verified as follows:

python -c "import numpy; print (numpy.__version__)"
python -c "import theano; print (theano.__version__)"
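Version numbers alone do not show which copy of a package was imported, and an older build under ~/.local can shadow a freshly installed one. The minimal standard-library sketch below prints the version and file location of each module and runs on both Python 2.7 and 3. The helper describe_module is illustrative.

```python
from __future__ import print_function

import importlib


def describe_module(name):
    """Return 'name version (path)', or a note that the module cannot be imported."""
    try:
        module = importlib.import_module(name)
    except Exception as exc:
        # Importing Theano can fail with errors other than ImportError,
        # e.g. a RuntimeError on an unsupported platform.
        return "%s: not importable (%s)" % (name, exc)
    version = getattr(module, "__version__", "unknown version")
    location = getattr(module, "__file__", "built-in")
    return "%s %s (%s)" % (name, version, location)


if __name__ == "__main__":
    for name in ("numpy", "scipy", "theano"):
        print(describe_module(name))
```

Each printed path should point into the active Conda environment rather than into ~/.local.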

It is also important to verify that the installed versions of NumPy and Theano are using Intel MKL.

python -c "import theano; theano.numpy.show_config()"
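The show_config() output is meant to be read by eye (Fig 1). To check it from a script, the Python 3 sketch below captures the printed text and looks for `libraries` lines that mention MKL. It assumes the show_config() layout of this article's era, in which an MKL build lists entries such as `libraries = ['mkl_rt', 'pthread']`. A plain substring search for "mkl" would be misleading, because non-MKL builds of that era still print a `blas_mkl_info: NOT AVAILABLE` section. Newer NumPy releases use a different layout, and the helper names are illustrative.

```python
import io
import sys


def capture_show_config():
    """Return the text that numpy.show_config() prints to stdout."""
    import numpy

    buffer = io.StringIO()
    saved = sys.stdout
    sys.stdout = buffer  # show_config() prints to sys.stdout
    try:
        numpy.show_config()
    finally:
        sys.stdout = saved
    return buffer.getvalue()


def mkl_library_lines(config_text):
    """Return the 'libraries = [...]' lines that reference an MKL library."""
    hits = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("libraries") and "mkl" in stripped.lower():
            hits.append(stripped)
    return hits


if __name__ == "__main__":
    try:
        lines = mkl_library_lines(capture_show_config())
    except ImportError:
        print("NumPy is not importable")
    else:
        print("\n".join(lines) if lines else "No MKL libraries listed")
```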


Fig 1. Desired output for theano.numpy.show_config()



DBN-Kyoto and ImageNet benchmarks are available in the theano/democase directory.


Procuring the Dataset for Running DBN-Kyoto

The sample dataset for DBN-Kyoto can be downloaded from Dropbox via the following link. Unzip the file and save it in the theano/democase/DBN-Kyoto directory.



Dependencies for training DBN-Kyoto can be installed using Anaconda or built from the source provided in the tools directory. Because of conflicts between the pandas library and Python 3, this benchmark has been validated only with Python 2.7.

Python 2.7
conda install -c intel --override-channels pandas
conda install imaging

Alternatively, the dependencies can be installed from source as described in Appendix B.


Running DBN-Kyoto on CPU

The provided script can be used to download the dataset (if not already present) and start the training.

cd theano/democase/DBN-Kyoto/



In this article, we show how to train a neural network on MNIST using Lasagne, which is a lightweight library to build and train neural networks in Theano. The Lasagne library will be built and installed using Intel compilers.


Download the MNIST Database

The MNIST database is publicly available for download. We downloaded the images and labels for both the training and validation data.
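The MNIST files are distributed gzip-compressed in the IDX format. Each file starts with a big-endian header: a magic number whose third byte is the data type (0x08 for unsigned bytes) and whose fourth byte is the number of dimensions, followed by one 32-bit size per dimension. Checking that header catches truncated or mislabeled downloads before training starts. The standard-library sketch below does this; the function names are illustrative.

```python
import gzip
import struct


def read_idx_header(raw):
    """Parse an IDX header from bytes and return (type_code, dimensions)."""
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0:
        raise ValueError("not an IDX file (bad magic number)")
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    return type_code, dims


def read_idx_file_header(path):
    """Read just enough of a gzip-compressed IDX file to parse its header."""
    with gzip.open(path, "rb") as handle:
        first = handle.read(4)
        # Indexing bytes gives an int on Python 3 and a str on Python 2.
        ndim = first[3] if isinstance(first[3], int) else ord(first[3])
        return read_idx_header(first + handle.read(4 * ndim))


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:  # e.g. train-images-idx3-ubyte.gz
        type_code, dims = read_idx_file_header(path)
        print(path, "type 0x%02x" % type_code, "dims", dims)
```

For the training images the header should report type 0x08 with dimensions (60000, 28, 28). The training labels file should report (60000,).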


Installing Lasagne Library

The latest version of the Lasagne library can be built and installed from the Lasagne git repository as given below:

Python 2.7 and Python 3.5
git clone
cd Lasagne
python build
python install



cd Lasagne/examples
python [model [epochs]]
    where model is mlp (simple multi-layer perceptron, the default) or cnn
    (simple convolutional neural network), and epochs defaults to 500.



Procuring the ImageNet dataset for AlexNet training

The ImageNet dataset can be obtained from the image-net website.



Dependencies for training AlexNet can be installed using Anaconda or from the Fedora EPEL source repository. Hickle, which is required for preprocessing the data, currently supports only Python 2, not Python 3.

  • Installing h5py, pyyaml, pyzmq using Anaconda:
conda install h5py
conda install -c intel --override-channels pyyaml pyzmq 
  • Installing Hickle (HDF5-based clone of Pickle):
git clone
cd hickle
python build
python install

Alternatively, the dependencies can be installed from source as described in Appendix B.


Preprocessing the ImageNet Dataset

Preprocessing is required to dump Hickle files and create labels for training and validation data.

  • Modify the paths.yaml file in the preprocessing directory to update the dataset paths. An example paths.yaml file is given below for reference.
cat theano/democase/alexnet_grp1/preprocessing/paths.yaml

train_img_dir: '/mnt/DATA2/TEST/ILSVRC2012_img_train/'
# the dir that contains folders like n01440764, n01443537, ...

val_img_dir: '/mnt/DATA2/TEST/ILSVRC2012_img_val/'
# the dir that contains ILSVRC2012_val_00000001~50000.JPEG

tar_root_dir: '/mnt/DATA2/TEST/parsed_data_toy'  # dir to store all the preprocessed files
tar_train_dir: '/mnt/DATA2/TEST/parsed_data_toy/train_hkl'  # dir to store training batches
tar_val_dir: '/mnt/DATA2/TEST/parsed_data_toy/val_hkl'  # dir to store validation batches
misc_dir: '/mnt/DATA2/TEST/parsed_data_toy/misc'
# dir to store img_mean.npy, shuffled_train_filenames.npy, train.txt, val.txt

meta_clsloc_mat: '/mnt/DATA2/imageNet-2012-images/ILSVRC2014_devkit/data/meta_clsloc.mat'
val_label_file: '/mnt/DATA2/imageNet-2012-images/ILSVRC2014_devkit/data/ILSVRC2014_clsloc_validation_ground_truth.txt'
# although from ILSVRC2014, these 2 files still work for ILSVRC2012

# caffe style train and validation labels
valtxt_filename: '/mnt/DATA2/TEST/parsed_data_toy/misc/val.txt'
traintxt_filename: '/mnt/DATA2/TEST/parsed_data_toy/misc/train.txt'
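If a path in paths.yaml is wrong, preprocessing fails late, after much of the work is done. The sketch below checks those settings before you start. Input directories and files must exist, and the output directories should sit inside tar_root_dir, as they do in the example above. It takes a plain dict and needs only the standard library. The function check_paths and its key groupings are illustrative, based on the keys in the example file.

```python
import os

# Keys taken from the example paths.yaml above.
INPUT_DIRS = ("train_img_dir", "val_img_dir")
INPUT_FILES = ("meta_clsloc_mat", "val_label_file")
OUTPUT_DIRS = ("tar_train_dir", "tar_val_dir", "misc_dir")


def check_paths(paths):
    """Return a list of problems found in a dict loaded from paths.yaml."""
    problems = []
    for key in INPUT_DIRS:
        if not os.path.isdir(paths.get(key, "")):
            problems.append("%s is not an existing directory: %r" % (key, paths.get(key)))
    for key in INPUT_FILES:
        if not os.path.isfile(paths.get(key, "")):
            problems.append("%s is not an existing file: %r" % (key, paths.get(key)))
    # Output directories may not exist yet, so only their placement is checked.
    root = os.path.normpath(paths.get("tar_root_dir", ""))
    for key in OUTPUT_DIRS:
        target = os.path.normpath(paths.get(key, ""))
        if not target.startswith(root + os.sep):
            problems.append("%s is not inside tar_root_dir: %r" % (key, paths.get(key)))
    return problems
```

With PyYAML installed (the pyyaml package above), `check_paths(yaml.safe_load(open("paths.yaml")))` runs the check against the real file. An empty list means nothing was flagged.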

A toy dataset can be created using the provided script, generate_toy_data.sh.

cd theano/democase/alexnet_grp1/preprocessing
chmod u+x

AlexNet training on CPU

  • Modify the config.yaml file to update the path to the preprocessed dataset:
cd theano/democase/alexnet_grp1/

# Sample changes to the paths for input (label_folder, mean_file) and output (weights_dir)
label_folder: /mnt/DATA2/TEST/parsed_data_toy/labels/
mean_file: /mnt/DATA2/TEST/parsed_data_toy/misc/img_mean.npy
weights_dir: ./weight/  # directory for saving weights and results
  • Similarly, modify the spec.yaml file to update the path to the parsed toy data set:
# Directories
train_folder: /mnt/DATA2/TEST/parsed_data_toy/train_hkl_b256_b256_bchw/
val_folder: /mnt/DATA2/TEST/parsed_data_toy/val_hkl_b256_b256_bchw/
  • Start the training:

Large Movie Review Dataset (IMDB)

The Large Movie Review Dataset benchmark is an example of a Recurrent Neural Network (RNN) using a Long Short-Term Memory (LSTM) model. The IMDB dataset is used for sentiment analysis of movie reviews with the LSTM model.

Procuring the dataset:

Obtain the imdb.pkl file from and extract the file to a local folder.
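Before training, you can confirm that imdb.pkl loads. The Python 3 sketch below assumes, as the tutorial's loading script does, that the file holds two consecutive pickled objects: a training set and a test set. Each set is assumed to be a pair of (list of word-index sequences, list of labels). The file was pickled under Python 2, so Python 3 needs encoding="latin1" to read it. The function names are illustrative.

```python
import os
import pickle


def load_imdb_sets(path):
    """Load (train_set, test_set), assuming two consecutive pickles in the file."""
    with open(path, "rb") as handle:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        train_set = pickle.load(handle, encoding="latin1")
        test_set = pickle.load(handle, encoding="latin1")
    return train_set, test_set


def summarize(data_set):
    """Return (number of reviews, longest review length, sorted distinct labels)."""
    sequences, labels = data_set
    longest = max(len(seq) for seq in sequences) if sequences else 0
    return len(sequences), longest, sorted(set(labels))


if __name__ == "__main__":
    import sys

    path = sys.argv[1] if len(sys.argv) > 1 else "imdb.pkl"
    if os.path.exists(path):
        train, test = load_imdb_sets(path)
        print("train:", summarize(train))
        print("test:", summarize(test))
    else:
        print("imdb.pkl not found; pass its path as the first argument")
```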



The page provides two scripts:
  • The first handles loading and preprocessing of the IMDB dataset.
  • The second is the primary script that defines and trains the model.

Copy both scripts into the folder that contains the imdb.pkl file.



Training can be started using the following command:

THEANO_FLAGS="floatX=float32" python
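THEANO_FLAGS takes a comma-separated list of name=value pairs. These override values from ~/.theanorc and use the same dotted names, for example blas.ldflags. When you launch runs from Python, a sketch like the one below merges an override into any flags already set in the environment instead of discarding them. The helper with_theano_flags is illustrative.

```python
import os


def with_theano_flags(env, overrides):
    """Return a copy of env whose THEANO_FLAGS also carries the given overrides.

    Existing flags are kept; a flag named in overrides takes the new value.
    """
    names = []
    values = {}
    for item in env.get("THEANO_FLAGS", "").split(","):
        if "=" not in item:
            continue  # skip empty or malformed entries
        name, value = item.split("=", 1)
        name = name.strip()
        if name not in values:
            names.append(name)
        values[name] = value.strip()
    for name, value in overrides.items():
        if name not in values:
            names.append(name)
        values[name] = str(value)
    merged = dict(env)
    merged["THEANO_FLAGS"] = ",".join("%s=%s" % (name, values[name]) for name in names)
    return merged


if __name__ == "__main__":
    env = with_theano_flags(os.environ, {"floatX": "float32"})
    print(env["THEANO_FLAGS"])
```

The returned dict can be passed as env= to subprocess.call when starting the training script. Settings such as device=cpu that were already in the environment are preserved.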



Error 1: In some cases, you might get errors saying that certain shared libraries cannot be opened. In this case, try the following:

find /opt/intel -name

Add the directories found to the /etc/ file and run the ldconfig command to update the shared library links and cache. Also make sure that the Intel MKL installation paths are set correctly in the LD_LIBRARY_PATH environment variable.
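Before editing the linker configuration, check whether the library is already reachable through LD_LIBRARY_PATH. The sketch below searches each directory listed there for a given file name. It uses libmkl_rt.so as an example because the ~/.theanorc above links with -lmkl_rt, but the library named in your error message may differ. The function name is illustrative.

```python
import os


def find_in_library_path(filename, library_path=None):
    """Return the directories listed in LD_LIBRARY_PATH that contain filename."""
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    # Entries are separated by os.pathsep, which is ':' on Linux.
    for directory in library_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = find_in_library_path("libmkl_rt.so")
    print("\n".join(found) if found else "libmkl_rt.so not found on LD_LIBRARY_PATH")
```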

Error 2: AlexNet preprocessing error for toy data

python toy
generating toy dataset ...
Traceback (most recent call last):
  File "", line 293, in <module>
ValueError: xrange() arg 3 must not be zero

The default number of processes used to preprocess ImageNet is currently set to 16. For the toy dataset, this creates more processes than required and causes the crash shown above. To resolve this issue, change the number of processes in the file Alexnet_CPU/preprocessing/ from 16 to 2. When preprocessing the full dataset, however, a higher value of num_process is recommended for faster preprocessing.

num_process = 2
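The traceback above says the step argument of xrange() is zero. One common way for that to happen is computing the step by integer division of the number of batches by num_process. The sketch below illustrates that failure and one way to guard against it. It is a hypothetical example, not the preprocessing script's actual code. It uses Python 3's range(), which behaves like xrange() here and raises ValueError for a zero step.

```python
def naive_step(num_batches, num_process):
    """Floor division: 0 whenever there are fewer batches than processes."""
    return num_batches // num_process


def split_batches(num_batches, num_process):
    """Split batch indices 0..num_batches-1 into at most num_process chunks."""
    if num_batches <= 0:
        return []
    # Never start more workers than there are batches.
    workers = max(1, min(num_process, num_batches))
    # Ceiling division keeps the step >= 1 and the chunk count <= workers.
    step = -(-num_batches // workers)
    return [list(range(start, min(start + step, num_batches)))
            for start in range(0, num_batches, step)]
```

With num_process = 16 and only two toy batches, naive_step returns 0, the kind of zero step the traceback reports. split_batches instead falls back to two workers with one batch each.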

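Error 3: Referencing the current version of NumPy when installing the Intel® Distribution for Python* through Conda

If you install the Intel® Distribution for Python from within Conda rather than with the Intel Distribution for Python installer, make sure that you set the PYTHONNOUSERSITE environment variable to True. Python places the user site-packages directory (under ~/.local) ahead of the environment's own site-packages. A NumPy installed there with --user, as in Appendix A, would therefore be imported instead of the Conda copy. Setting the variable keeps that directory off sys.path, so the Conda environment references the correct version of NumPy. This is a known issue in Conda. More information can be found here.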
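To see whether the user site-packages directory is active for the current interpreter, use the standard site module. When PYTHONNOUSERSITE is set to any non-empty value, site.ENABLE_USER_SITE is False and the directory is left off sys.path. The minimal sketch below reports both; the function name is illustrative.

```python
import os
import site
import sys


def user_site_status():
    """Report whether the user site-packages directory can shadow Conda packages."""
    user_site = site.getusersitepackages()
    return {
        "PYTHONNOUSERSITE": os.environ.get("PYTHONNOUSERSITE"),
        "ENABLE_USER_SITE": site.ENABLE_USER_SITE,
        "user_site": user_site,
        "user_site_on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in sorted(user_site_status().items()):
        print("%s: %s" % (key, value))
```

Run it inside the activated Conda environment with PYTHONNOUSERSITE=True set. user_site_on_sys_path should then be False.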




Appendix A

Installing Python* Tools For Other Python Distribution

  • CentOS:
Python 2.7 - sudo yum install python-devel python-setuptools
Python 3.5 - sudo yum install python35-libs python35-devel python35-setuptools
Note: The Python 3.5 packages can be obtained from the Fedora EPEL source repository.
  • Ubuntu:
Python 2.7 - sudo apt-get install python-dev python-setuptools
Python 3.5 - sudo apt-get install libpython3-dev python3-dev python3-setuptools
  • If pip and Cython are not installed on the system, install them with the following commands:
sudo -E easy_install pip
sudo -E pip install cython


Installing NumPy

NumPy is the fundamental package needed for scientific computing with Python. This package contains:

  1. A powerful N-dimensional array object
  2. Sophisticated (broadcasting) functions
  3. Tools for integrating C/C++ and Fortran code
  4. Useful linear algebra, Fourier transform, and random number capabilities.

Note: You can remove an older version of the NumPy library by confirming that it exists and deleting the related files. This step is optional, because this tutorial installs all the remaining libraries in the user's local directory. If required, old versions can be cleaned up as follows:

  • Verify if old version exists:
python -c "import numpy; print(numpy.version)"
<module 'numpy.version' from '/home/plse/.local/lib/python2.7/site-packages/numpy-1.11.0rc1-py2.7-linux-x86_64.egg/numpy/version.pyc'>
  • Delete any previously installed NumPy packages:
rm -r /home/plse/.local/lib/python2.7/site-packages/numpy-1.11.0rc1-py2.7-linux-x86_64.egg
  • Building and installing NumPy optimized for Intel architecture:
git clone
//Update the site.cfg file to point to the required MKL directory. This step is optional if Intel Parallel Studio XE or Intel MKL was installed in the default /opt/intel directory.
python config --compiler=intelem build_clib --compiler=intelem build_ext --compiler=intelem install --user
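NumPy's build reads MKL locations from an [mkl] section in site.cfg, using the key names in NumPy's site.cfg.example. The Python 3 sketch below writes such a section with the standard library. The /opt/intel/mkl layout is an assumption based on a default Intel MKL 2017 install, so check it against your system. The helper name is illustrative.

```python
import configparser
import io

# Assumed default Intel MKL location; adjust to match your installation.
MKL_ROOT = "/opt/intel/mkl"


def mkl_site_cfg(mkl_root=MKL_ROOT):
    """Return site.cfg text with an [mkl] section pointing at mkl_root."""
    config = configparser.ConfigParser()
    config["mkl"] = {
        "library_dirs": mkl_root + "/lib/intel64",
        "include_dirs": mkl_root + "/include",
        # mkl_rt is MKL's single dynamic runtime library.
        "mkl_libs": "mkl_rt",
        "lapack_libs": "",
    }
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Print instead of writing; redirect into site.cfg in the NumPy source tree.
    print(mkl_site_cfg(), end="")
```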


Installing SciPy

SciPy is an open-source Python library used for scientific and technical computing. It contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.

  • Building and installing SciPy:
tar -xvzf scipy-0.16.1.tar.gz    (can be downloaded from:  or 
     obtain the latest sources from 
cd scipy-0.16.1/
python config --compiler=intelem --fcompiler=intelem build_clib --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem install --user


Appendix B

Building and installing benchmark dependencies from source


  • Installing dependencies for DBN-Kyoto from source

Untar and install all the provided tools:

cd theano/democase/DBN-Kyoto/tools
tar -xvzf Imaging-1.1.7.tar.gz
cd Imaging-1.1.7
python build
python install --user

cd theano/democase/DBN-Kyoto/tools
tar -xvzf python-dateutil-2.4.1.tar.gz
cd python-dateutil-2.4.1
python build
python install --user

cd theano/democase/DBN-Kyoto/tools
tar -xvzf pytz-2014.10.tar.gz
cd pytz-2014.10
python build
python install --user

cd theano/democase/DBN-Kyoto/tools
tar -xvzf pandas-0.15.2.tar.gz
cd pandas-0.15.2
python build
python install --user
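The four blocks above repeat one pattern: extract a tarball from the tools directory, then build it and install it with --user. The sketch below loops over them. It assumes that each package uses the standard setup.py entry point and that each tarball unpacks into a directory named after it, as in the commands above. By default it only returns the planned commands; pass dry_run=False to run them. The function names are illustrative.

```python
import os
import subprocess
import sys
import tarfile

TOOLS_DIR = "theano/democase/DBN-Kyoto/tools"
# Same order as the steps above: pandas needs dateutil and pytz first.
TARBALLS = [
    "Imaging-1.1.7.tar.gz",
    "python-dateutil-2.4.1.tar.gz",
    "pytz-2014.10.tar.gz",
    "pandas-0.15.2.tar.gz",
]


def build_commands(tools_dir=TOOLS_DIR, tarballs=TARBALLS):
    """Return (source_dir, argv) pairs that build and install each package."""
    steps = []
    for tarball in tarballs:
        source_dir = os.path.join(tools_dir, tarball[:-len(".tar.gz")])
        # Assumes the standard setup.py entry point in each source tree.
        steps.append((source_dir, [sys.executable, "setup.py", "build"]))
        steps.append((source_dir, [sys.executable, "setup.py", "install", "--user"]))
    return steps


def build_all(tools_dir=TOOLS_DIR, tarballs=TARBALLS, dry_run=True):
    """Extract each tarball into tools_dir, then run its build/install steps."""
    steps = build_commands(tools_dir, tarballs)
    if dry_run:
        return steps
    for tarball in tarballs:
        with tarfile.open(os.path.join(tools_dir, tarball), "r:gz") as archive:
            archive.extractall(tools_dir)
    for source_dir, argv in steps:
        subprocess.check_call(argv, cwd=source_dir)
    return steps


if __name__ == "__main__":
    for source_dir, argv in build_all():
        print(source_dir, " ".join(argv))
```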



  • Installing dependencies for AlexNet from source

Running AlexNet on CPU may require some add-on packages from the Fedora EPEL source repository.

sudo rpm -ihv epel-release-7-8.noarch.rpm
sudo yum install hdf5-devel
sudo yum install zmq-devel
sudo yum install zeromq-devel
sudo yum install python-zmq
  • Installing Hickle (HDF5-based clone of Pickle):
git clone
python build install --user
  • Installing h5py (Python interface to HDF5 binary data format):
git clone
python build install --user




About The Authors

Sunny Gogar
Software Engineer

Sunny Gogar received a Master’s degree in Electrical and Computer Engineering from the University of Florida, Gainesville and a Bachelor’s degree in Electronics and Telecommunications from the University of Mumbai, India.  He is currently a software engineer with Intel Corporation's Software and Services Group. His interests include parallel programming and optimization for Multi-core and Many-core Processor Architectures.

 Meghana Rao received a Master’s degree in Engineering and Technology Management from Portland State University and a Bachelor’s degree in Computer Science and Engineering from Bangalore University, India.  She is a Developer Evangelist with the Software and Services Group at Intel focused on Machine Learning and Deep Learning.


For more complete information about compiler optimizations, see our Optimization Notice.



Very nice article. 

Some feedback:

Theano does not seem to currently support Python 3.5 on Windows platform. You need to use Python 2.7 or 3.4. There are absolutely no error messages until you run the verification scripts.

(pcs_theano_2) C:\Users\vraoresearch\Anaconda3\theano>python -c "import theano; print (theano.__version__)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\vraoresearch\Anaconda3\theano\theano\", line 36, in <module>
    "Theano do not support Python 3.5 on Windows. Use Python 2.7 or 3.4.")
RuntimeError: Theano do not support Python 3.5 on Windows. Use Python 2.7 or 3.4.

However, much better progress was made on an Intel Joule module running Ubuntu 16.04.1, where not only was I able to build, install and verify Theano, but I was also able to build Lasagne and run the MNIST example.

Great work on this paper!
