Theano is a Python* library developed at the LISA lab to define, optimize, and evaluate mathematical expressions, including those involving multi-dimensional arrays (numpy.ndarray). Intel® optimized-Theano is a new version based on Theano 0.0.8rc1, which is optimized for Intel® architecture and enables Intel® Math Kernel Library (Intel® MKL) 2017. The latest version of Intel MKL includes optimizations for Intel® Advanced Vector Extensions 2 (Intel® AVX2) and AVX-512 instructions, which are supported on Intel® Xeon® and Intel® Xeon Phi™ processors.
Theano can be installed and used with several combinations of development tools and libraries on a variety of platforms. This tutorial provides one such recipe describing steps to build and install Intel optimized-Theano with Intel® compilers and Intel MKL 2017 on CentOS*- and Ubuntu*-based systems. We also verify the installation by running common industry-standard benchmarks like MNIST*, DBN-Kyoto*, LSTM* and ImageNet*.
This tutorial assumes that Intel compilers (C/C++ and Fortran) are already installed and verified. If not, Intel compilers can be downloaded and installed as part of the Intel® Parallel Studio XE or can be independently installed.
Installing Intel MKL 2017 is optional when using Intel® Distribution for Python*. For other Python distributions, Intel MKL 2017 can be downloaded as part of Intel Parallel Studio XE 2017 or downloaded and installed for free using the community license. To download it, first register here for a free community license and follow the installation instructions.
In this tutorial, the Intel® Distribution for Python* will be used as it provides ready access to tools and techniques which are enabled and verified for higher performance on Intel architecture. This will allow usage of Intel-optimized precompiled tools like NumPy* and SciPy* without worrying about building and installing them.
Instructions to install Intel Distribution for Python are given below. This article assumes that the Python installation is completed in the local user account.
Python 2.7
    tar -xvzf l_python27_p_2017.0.028.tgz
    cd l_python27_p_2017.0.028
    ./install.sh
Python 3.5
    tar -xvzf l_python35_p_2017.0.028.tgz
    cd l_python35_p_2017.0.028
    ./install.sh
Using Anaconda, create an independent user environment using the steps given below. The required NumPy, SciPy, and Cython packages are installed as part of the same command.
Python 2.7
    conda create -n pcs_theano_2 -c intel python=2 numpy scipy cython
    source activate pcs_theano_2
Python 3.5
    conda create -n pcs_theano_2 -c intel python=3 numpy scipy cython
    source activate pcs_theano_2
Alternatively, NumPy and SciPy can be built and installed from source as described in Appendix A. Appendix A also shows the steps to install other Python development tools, which may be required if a non-Intel distribution of Python is used.
The branch of Theano optimized for Intel architecture can be checked out and installed from the following git repository.
    git clone https://github.com/intel/theano.git theano
    cd theano
    python setup.py build
    python setup.py install
    theano-cache clear
An example Theano configuration file is given below for reference. To use Intel compilers and specify the compiler flags to be used with Theano, create a copy of this file in the user's home directory.
    vi ~/.theanorc

    [cuda]
    root = /usr/local/cuda

    [global]
    device = cpu
    floatX = float32
    cxx = icpc
    mode = FAST_RUN
    openmp = True
    openmp_elemwise_minsize = 10

    [gcc]
    cxxflags = -qopenmp -march=native -O3 -vec-report3 -fno-alias -opt-prefetch=2 -fp-trap=none

    [blas]
    ldflags = -lmkl_rt
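Because a single typo in this file can silently change which compiler or BLAS library is used, it can help to read the settings back before running a benchmark. The following is a minimal sketch, assuming Python 3 and assuming the file can be treated as ordinary INI text; the helper name read_theanorc and the shortened sample text are illustrative, and this is not how Theano itself loads the file.

```python
import configparser

# Sample text mirroring the .theanorc shown above (compiler flags shortened).
SAMPLE_THEANORC = """
[global]
device = cpu
floatX = float32
cxx = icpc
mode = FAST_RUN
openmp = True
openmp_elemwise_minsize = 10

[gcc]
cxxflags = -qopenmp -march=native -O3

[blas]
ldflags = -lmkl_rt
"""


def read_theanorc(text):
    """Parse INI-style text into {section: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as floatX case-sensitive
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


if __name__ == "__main__":
    settings = read_theanorc(SAMPLE_THEANORC)
    print("compiler:", settings["global"]["cxx"])
    print("BLAS link flags:", settings["blas"]["ldflags"])
```

To check a real file, the same function can be given the text read from ~/.theanorc instead of the sample string.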
It is important to verify which versions of Theano and NumPy libraries are referenced once they are imported in python. The versions of NumPy and Theano referenced in this article are verified as follows:
    python -c "import numpy; print (numpy.__version__)"
    -> 1.11.1
    python -c "import theano; print (theano.__version__)"
    -> 0.9.0dev1.dev-*
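The two one-liners above can also be combined into a small script that reports a missing package instead of stopping at the first failed import. This is a minimal sketch; the function name report_versions is hypothetical and not part of NumPy or Theano.

```python
import importlib


def report_versions(module_names):
    """Return {name: version string, or None if the module cannot be imported}."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions(["numpy", "theano"]).items():
        print(name, version if version is not None else "not importable")
```

Running it inside the activated environment shows which installation of each package the interpreter actually picks up.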
It is also important to verify that the installed versions of NumPy and Theano are using Intel MKL.
python -c "import theano; print (theano.numpy.show_config())"
Fig 1. Desired output for theano.numpy.show_config()
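The same check can be scripted rather than read off the screen. The sketch below assumes Python 3 and assumes that an MKL-linked NumPy mentions "mkl" somewhere in its build configuration, which is a heuristic rather than a guarantee; the helper names are illustrative.

```python
import contextlib
import io


def mentions_mkl(config_text):
    """Return True if the build-configuration text references MKL."""
    return "mkl" in config_text.lower()


def numpy_config_text():
    """Capture what numpy.show_config() prints; empty string if NumPy is missing."""
    try:
        import numpy
    except ImportError:
        return ""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        numpy.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    print("NumPy build mentions MKL:", mentions_mkl(numpy_config_text()))
```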
DBN-Kyoto and ImageNet benchmarks are available in the theano/democase directory.
The sample dataset can be downloaded for DBN-Kyoto from Dropbox via the following link: https://www.dropbox.com/s/ocjgzonmxpmerry/dataset1.pkl.7z?dl=0. Unzip the file and save it in the theano/democase/DBN-Kyoto directory.
Dependencies for training DBN-Kyoto can be installed using Anaconda or built using the provided source in the tools directory. Due to some conflicts in the pandas library and Python 3, this benchmark is validated only for Python 2.7.
Python 2.7
    conda install -c intel --override-channels pandas
    conda install imaging
Alternatively, the dependencies can also be installed from source as given in Appendix B.
The provided run.sh script can be used to download the dataset (if not already present) and start the training.
    cd theano/democase/DBN-Kyoto/
    ./run.sh
In this article, we show how to train a neural network on MNIST using Lasagne, which is a lightweight library to build and train neural networks in Theano. The Lasagne library will be built and installed using Intel compilers.
The MNIST database can be downloaded from http://yann.lecun.com/exdb/mnist/. We downloaded images and labels for both training and validation data.
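The downloaded MNIST files are gzip-compressed and use the IDX format: two zero bytes, one byte for the element type, one byte for the number of dimensions, then one big-endian 32-bit size per dimension. As a quick way to confirm that a download is intact, the header can be read as sketched below; the function names are illustrative, and the 16-byte read assumes at most three dimensions, which holds for the MNIST image and label files.

```python
import gzip
import struct


def read_idx_header(data):
    """Return (element type code, dimension sizes) from the start of an IDX file."""
    zero, type_code, num_dims = struct.unpack(">HBB", data[:4])
    if zero != 0:
        raise ValueError("not an IDX file: first two bytes must be zero")
    dims = struct.unpack(">" + "I" * num_dims, data[4:4 + 4 * num_dims])
    return type_code, dims


def read_gzipped_idx_header(path):
    """Read the header of a gzip-compressed IDX file such as the MNIST downloads."""
    with gzip.open(path, "rb") as handle:
        return read_idx_header(handle.read(16))
```

For the training image file, the expected dimensions are 60000 images of 28 by 28 pixels with element type 0x08 (unsigned byte).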
The latest version of the Lasagne library can be built and installed from the Lasagne git repository as given below:
Python 2.7 and Python 3.5
    git clone https://github.com/Lasagne/Lasagne.git
    cd Lasagne
    python setup.py build
    python setup.py install
    cd Lasagne/examples
    python mnist.py [model [epochs]]
Here model can be mlp, a simple multi-layer perceptron (the default), or cnn, a simple convolutional neural network, and epochs defaults to 500.
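The bracket notation means that both arguments are optional and positional, and that epochs can only be given when model is also given. The sketch below illustrates that calling convention with argparse; it is a hypothetical stand-in written for this explanation, not the argument handling used by the actual mnist.py script.

```python
import argparse


def parse_args(argv):
    """Parse `[model [epochs]]` with the defaults described above."""
    parser = argparse.ArgumentParser(description="Hypothetical mnist.py-style CLI")
    parser.add_argument("model", nargs="?", default="mlp", choices=["mlp", "cnn"])
    parser.add_argument("epochs", nargs="?", type=int, default=500)
    return parser.parse_args(argv)


if __name__ == "__main__":
    for argv in ([], ["cnn"], ["cnn", "10"]):
        args = parse_args(argv)
        print(argv, "->", args.model, args.epochs)
```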
The ImageNet dataset can be obtained from the image-net website.
Dependencies for training AlexNet can be installed using Anaconda or installed from the Fedora EPEL source repository. Currently, Hickle (a required dependency for preprocessing data) is supported only on Python 2 and not on Python 3.
    conda install h5py
    conda install -c intel --override-channels pyyaml pyzmq
    git clone https://github.com/telegraphic/hickle.git
    cd hickle
    python setup.py build
    python setup.py install
Alternatively, the dependencies can also be installed using the source as given in Appendix B.
Preprocessing is required to dump Hickle files and create labels for training and validation data.
    cat theano/democase/alexnet_grp1/preprocessing/paths.yaml

    train_img_dir: '/mnt/DATA2/TEST/ILSVRC2012_img_train/'  # the dir that contains folders like n01440764, n01443537, ...
    val_img_dir: '/mnt/DATA2/TEST/ILSVRC2012_img_val/'  # the dir that contains ILSVRC2012_val_00000001~50000.JPEG
    tar_root_dir: '/mnt/DATA2/TEST/parsed_data_toy'  # dir to store all the preprocessed files
    tar_train_dir: '/mnt/DATA2/TEST/parsed_data_toy/train_hkl'  # dir to store training batches
    tar_val_dir: '/mnt/DATA2/TEST/parsed_data_toy/val_hkl'  # dir to store validation batches
    misc_dir: '/mnt/DATA2/TEST/parsed_data_toy/misc'  # dir to store img_mean.npy, shuffled_train_filenames.npy, train.txt, val.txt
    meta_clsloc_mat: '/mnt/DATA2/imageNet-2012-images/ILSVRC2014_devkit/data/meta_clsloc.mat'
    val_label_file: '/mnt/DATA2/imageNet-2012-images/ILSVRC2014_devkit/data/ILSVRC2014_clsloc_validation_ground_truth.txt'
    # although from ILSVRC2014, these 2 files still work for ILSVRC2012

    # caffe style train and validation labels
    valtxt_filename: '/mnt/DATA2/TEST/parsed_data_toy/misc/val.txt'
    traintxt_filename: '/mnt/DATA2/TEST/parsed_data_toy/misc/train.txt'
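Preprocessing fails late if one of the input directories in paths.yaml is wrong, so it may be worth checking the paths first. The sketch below uses only the standard library and a deliberately simple line parser that handles flat `key: 'value'  # comment` lines like the ones above; it is not a full YAML parser (a value containing `#` would be cut short), and the function names are illustrative.

```python
import os

# Two entries copied from the paths.yaml listing above.
SAMPLE_PATHS = """
train_img_dir: '/mnt/DATA2/TEST/ILSVRC2012_img_train/'  # training images
val_img_dir: '/mnt/DATA2/TEST/ILSVRC2012_img_val/'
# caffe style train and validation labels
"""


def parse_flat_paths(text):
    """Parse simple `key: 'value'  # comment` lines into a dictionary."""
    paths = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        paths[key.strip()] = value.strip().strip("'\"")
    return paths


def missing_dirs(paths, keys):
    """Return the keys whose configured directory does not exist."""
    return [key for key in keys if not os.path.isdir(paths.get(key, ""))]


if __name__ == "__main__":
    parsed = parse_flat_paths(SAMPLE_PATHS)
    print("missing:", missing_dirs(parsed, ["train_img_dir", "val_img_dir"]))
```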
A toy dataset can be created using the provided script, generate_toy_data.sh.
    cd theano/democase/alexnet_grp1/preprocessing
    chmod u+x make_hkl.py make_labels.py make_train_val_txt.py
    ./generate_toy_data.sh
    cd theano/democase/alexnet_grp1/

    # Sample changes to the path for input(label_folder, mean_file) and output(weights_dir)
    label_folder: /mnt/DATA2/TEST/parsed_data_toy/labels/
    mean_file: /mnt/DATA2/TEST/parsed_data_toy/misc/img_mean.npy
    weights_dir: ./weight/  # directory for saving weights and results

    # Directories
    train_folder: /mnt/DATA2/TEST/parsed_data_toy/train_hkl_b256_b256_bchw/
    val_folder: /mnt/DATA2/TEST/parsed_data_toy/val_hkl_b256_b256_bchw/
The Large Movie Review Dataset (IMDB) benchmark is an example of a recurrent neural network that uses a Long Short-Term Memory (LSTM) model. The IMDB dataset is used for sentiment analysis of movie reviews with the LSTM model.
Obtain the imdb.pkl file from http://www-labs.iro.umontreal.ca/~lisa/deep/data/ and extract the file to a local folder.
The http://deeplearning.net/tutorial/lstm.html page provides two scripts:
imdb.py: handles the loading and preprocessing of the IMDB dataset.
lstm.py: the primary script that defines and trains the model.
Copy both of the above files into the folder that contains the imdb.pkl file.
Training can be started using the following command:
THEANO_FLAGS="floatX=float32" python lstm.py
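THEANO_FLAGS is a comma-separated list of name=value pairs, and values given there take precedence over the same settings in ~/.theanorc, so the command above forces floatX to float32 for this run only. The sketch below shows how such a string breaks down into individual settings; the helper is written for illustration and is not part of Theano.

```python
def parse_theano_flags(flags):
    """Split a THEANO_FLAGS-style string into a {name: value} dictionary."""
    settings = {}
    for item in flags.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        settings[name.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    print(parse_theano_flags("floatX=float32"))
    print(parse_theano_flags("floatX=float32,device=cpu,openmp=True"))
```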
Error 1: In some cases, you might get errors stating that libraries such as libmkl_rt.so or libimf.so cannot be opened. In this case, locate the library as shown below:
find /opt/intel -name library_name.so
Add the resulting paths to the /etc/ld.so.conf file and run the ldconfig command to link the libraries. Also make sure the MKL installation paths are set correctly in the LD_LIBRARY_PATH environment variable.
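The search and the LD_LIBRARY_PATH check can be combined in a short script. This is a minimal sketch, assuming the libraries live somewhere under /opt/intel as in the find command above; the function names are illustrative, and the script only reports directories and does not modify any configuration.

```python
import os


def find_library_dirs(root, library_name):
    """Return the sorted directories under root that contain library_name."""
    return sorted(
        dirpath
        for dirpath, _dirnames, filenames in os.walk(root)
        if library_name in filenames
    )


def missing_from_search_path(directories, ld_library_path):
    """Return the directories not listed in an LD_LIBRARY_PATH-style string."""
    listed = {os.path.normpath(p) for p in ld_library_path.split(os.pathsep) if p}
    return [d for d in directories if os.path.normpath(d) not in listed]


if __name__ == "__main__":
    found = find_library_dirs("/opt/intel", "libmkl_rt.so")
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in missing_from_search_path(found, current):
        print("not on LD_LIBRARY_PATH:", directory)
```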
    python make_hkl.py toy
    generating toy dataset ...
    Traceback (most recent call last):
      File "make_hkl.py", line 293, in <module>
        train_batchs_per_core)
    ValueError: xrange() arg 3 must not be zero
The default number of processes used to preprocess ImageNet is currently set to 16. For the toy dataset this will create more processes than required, causing the application to crash. To resolve this issue, change the number of processes in file Alexnet_CPU/preprocessing/make_hkl.py:258 from 16 to 2. However, while preprocessing the full data set it is recommended to use a higher value for num_process for faster preprocessing.
num_process = 2
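One plausible reading of the traceback is that the step passed to xrange() is the number of batches divided by the number of processes using integer division, which becomes zero when there are more processes than batches. The sketch below illustrates that arithmetic and a simple clamp; the batch count and function names are hypothetical and are not taken from make_hkl.py.

```python
def batches_per_process(num_batches, num_process):
    """Integer share of batches per worker; 0 when workers outnumber batches."""
    return num_batches // num_process


def safe_num_process(num_batches, requested):
    """Clamp the worker count to the range [1, num_batches]."""
    return max(1, min(requested, num_batches))


if __name__ == "__main__":
    toy_batches = 2  # hypothetical toy-dataset size, for illustration only
    step = batches_per_process(toy_batches, 16)
    try:
        list(range(0, toy_batches, step))
    except ValueError as error:
        print("step", step, "fails:", error)
    workers = safe_num_process(toy_batches, 16)
    print("workers:", workers, "step:", batches_per_process(toy_batches, workers))
```

With the clamp, a large value can stay in place for the full dataset while the toy dataset still gets a non-zero step.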
If installing the Intel® Distribution for Python* from within Conda instead of through the Intel Distribution for Python installer, make sure that you set the PYTHONNOUSERSITE environment variable to True. This makes the Conda environment reference the correct version of NumPy instead of a copy in the user's site-packages directory. This is a known error in Conda. More information can be found here.
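Python ignores the per-user site-packages directory whenever PYTHONNOUSERSITE is set to any non-empty value, which is what keeps a NumPy installed with --user from shadowing the one in the Conda environment. The following sketch reports the current state from inside the interpreter; the helper name is illustrative.

```python
import os
import site


def user_site_disabled(environ):
    """True when PYTHONNOUSERSITE is set to any non-empty value."""
    return bool(environ.get("PYTHONNOUSERSITE"))


if __name__ == "__main__":
    print("PYTHONNOUSERSITE set:", user_site_disabled(os.environ))
    # site.ENABLE_USER_SITE reports whether the user site directory is in use.
    print("user site-packages enabled:", site.ENABLE_USER_SITE)
    print("user site-packages path:", site.getusersitepackages())
```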
CentOS:
    Python 2.7 - sudo yum install python-devel python-setuptools
    Python 3.5 - sudo yum install python35-libs python35-devel python35-setuptools
    //Note - Python 3.5 packages can be obtained from Fedora EPEL source repository
Ubuntu:
    Python 2.7 - sudo apt-get install python-dev python-setuptools
    Python 3.5 - sudo apt-get install libpython3-dev python3-dev python3-setuptools

    sudo -E easy_install pip
    sudo -E pip install cython
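NumPy is the fundamental package needed for scientific computing with Python. It provides an N-dimensional array object, broadcasting functions, tools for integrating C/C++ and Fortran code, and linear algebra, Fourier transform, and random number capabilities.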
Note: An older version of the NumPy library can be removed by verifying its existence and deleting the related files. However, in this tutorial all the remaining libraries will be installed in the user's local directory, so this step is optional. If required, old versions can be cleaned as follows:
    python -c "import numpy; print numpy.version"
    <module 'numpy.version' from '/home/plse/.local/lib/python2.7/site-packages/numpy-1.11.0rc1-py2.7-linux-x86_64.egg/numpy/version.pyc'>

    rm -r /home/plse/.local/lib/python2.7/site-packages/numpy-1.11.0rc1-py2.7-linux-x86_64.egg
    git clone https://github.com/pcs-theano/numpy.git
    cd numpy
    //update site.cfg file to point to required MKL directory. This step is optional if parallel studio or MKL were installed in default /opt/intel directory.
    python setup.py config --compiler=intelem build_clib --compiler=intelem build_ext --compiler=intelem install --user
SciPy is an open source Python library used for scientific and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers, and other tasks common in science and engineering.
    //scipy-0.16.1.tar.gz can be downloaded from: https://sourceforge.net/projects/scipy/files/scipy/0.16.1/ or obtain the latest sources from https://github.com/scipy/scipy/releases
    tar -xvzf scipy-0.16.1.tar.gz
    cd scipy-0.16.1/
    python setup.py config --compiler=intelem --fcompiler=intelem build_clib --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem install --user
    //Untar and install all the provided tools:
    cd theano/democase/DBN-Kyoto/tools
    tar -xvzf Imaging-1.1.7.tar.gz
    cd Imaging-1.1.7
    python setup.py build
    python setup.py install --user

    cd theano/democase/DBN-Kyoto/tools
    tar -xvzf python-dateutil-2.4.1.tar.gz
    cd python-dateutil-2.4.1
    python setup.py build
    python setup.py install --user

    cd theano/democase/DBN-Kyoto/tools
    tar -xvzf pytz-2014.10.tar.gz
    cd pytz-2014.10
    python setup.py build
    python setup.py install --user

    cd theano/democase/DBN-Kyoto/tools
    tar -xvzf pandas-0.15.2.tar.gz
    cd pandas-0.15.2
    python setup.py build
    python setup.py install --user
Access to some of the add-on packages from the Fedora EPEL source repository may be required for running AlexNet on CPU.
    wget http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-8.noarch.rpm
    sudo rpm -ihv epel-release-7-8.noarch.rpm
    sudo yum install hdf5-devel
    sudo yum install zmq-devel
    sudo yum install zeromq-devel
    sudo yum install python-zmq
    git clone https://github.com/telegraphic/hickle.git
    cd hickle
    python setup.py build install --user
    git clone https://github.com/h5py/h5py.git
    cd h5py
    python setup.py build install --user
Sunny Gogar received a Master’s degree in Electrical and Computer Engineering from the University of Florida, Gainesville and a Bachelor’s degree in Electronics and Telecommunications from the University of Mumbai, India. He is currently a software engineer with Intel Corporation's Software and Services Group. His interests include parallel programming and optimization for Multi-core and Many-core Processor Architectures.
Meghana Rao received a Master’s degree in Engineering and Technology Management from Portland State University and a Bachelor’s degree in Computer Science and Engineering from Bangalore University, India. She is a Developer Evangelist with the Software and Services Group at Intel focused on Machine Learning and Deep Learning.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804