Theano is a Python* library developed at the LISA lab to define, optimize, and evaluate mathematical expressions, including those involving multi-dimensional arrays (numpy.ndarray). Intel® optimized-Theano is a new version based on Theano 0.8.0rc1 that is optimized for Intel® architecture and enables Intel® Math Kernel Library (Intel® MKL) 2017. The latest version of Intel MKL includes optimizations for Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® AVX-512 instructions, which are supported on Intel® Xeon® and Intel® Xeon Phi™ processors.
Theano can be installed and used with several combinations of development tools and libraries on a variety of platforms. This tutorial provides one such recipe describing steps to build and install Intel optimized-Theano with Intel® compilers and Intel MKL 2017 on CentOS*- and Ubuntu*-based systems. We also verify the installation by running common industry-standard benchmarks like MNIST*, DBN-Kyoto*, LSTM* and ImageNet*.
This tutorial assumes that Intel compilers (C/C++ and Fortran) are already installed and verified. If not, Intel compilers can be downloaded and installed as part of the Intel® Parallel Studio XE or can be independently installed.
Installing Intel MKL 2017 is optional when using Intel® Distribution for Python*. For other Python distributions, Intel MKL 2017 can be downloaded as part of Intel Parallel Studio XE 2017 or can be downloaded and installed for free using the community license. To download it, first register here for a free community license and follow the installation instructions.
In this tutorial, the Intel® Distribution for Python* will be used as it provides ready access to tools and techniques which are enabled and verified for higher performance on Intel architecture. This will allow usage of Intel-optimized precompiled tools like NumPy* and SciPy* without worrying about building and installing them.
Instructions to install Intel Distribution for Python are given below. This article assumes that the Python installation is completed in the local user account.
Python 2.7:
tar -xvzf l_python27_p_2017.0.028.tgz
cd l_python27_p_2017.0.028
./install.sh

Python 3.5:
tar -xvzf l_python35_p_2017.0.028.tgz
cd l_python35_p_2017.0.028
./install.sh
Using Anaconda, create an independent user environment using the steps given below. The required NumPy, SciPy and Cython packages are installed in the same step.
Python 2.7:
conda create -n pcs_theano_2 -c intel python=2 numpy scipy cython
source activate pcs_theano_2

Python 3.5:
conda create -n pcs_theano_2 -c intel python=3 numpy scipy cython
source activate pcs_theano_2
Alternatively, NumPy and SciPy can also be built and installed from source as described in Appendix A. The appendix also shows the steps to install other Python development tools, which may be required if a non-Intel distribution of Python is used.
The branch of Theano optimized for Intel architecture can be checked out and installed from the following git repository.
git clone https://github.com/intel/theano.git theano
cd theano
python setup.py build
python setup.py install
theano-cache clear
An example of the Theano configuration file is given below for reference. To use Intel compilers and specify the compiler flags to be used with Theano, create a copy of this file in the user's home directory.
vi ~/.theanorc

[cuda]
root = /usr/local/cuda

[global]
device = cpu
floatX = float32
cxx = icpc
mode = FAST_RUN
openmp = True
openmp_elemwise_minsize = 10

[gcc]
cxxflags = -qopenmp -march=native -O3 -vec-report3 -fno-alias -opt-prefetch=2 -fp-trap=none

[blas]
ldflags = -lmkl_rt
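Since the configuration file uses INI syntax, it can also be generated and inspected programmatically instead of being edited by hand. The following is a minimal sketch using Python's standard configparser module; the helper names render_theanorc and parse_theanorc are hypothetical, and the values are simply those of the sample file above.

```python
import configparser
import io

# Settings copied from the sample .theanorc shown in this article.
THEANORC_SETTINGS = {
    "cuda": {"root": "/usr/local/cuda"},
    "global": {
        "device": "cpu",
        "floatX": "float32",
        "cxx": "icpc",
        "mode": "FAST_RUN",
        "openmp": "True",
        "openmp_elemwise_minsize": "10",
    },
    "gcc": {
        "cxxflags": "-qopenmp -march=native -O3 -vec-report3 "
                    "-fno-alias -opt-prefetch=2 -fp-trap=none",
    },
    "blas": {"ldflags": "-lmkl_rt"},
}


def _new_parser():
    """Create a parser that keeps option names such as floatX case-sensitive."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    return parser


def render_theanorc(settings):
    """Return the text of a .theanorc file for the given settings (hypothetical helper)."""
    parser = _new_parser()
    for section, options in settings.items():
        parser[section] = options
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def parse_theanorc(text):
    """Parse .theanorc text back into a nested dict for inspection (hypothetical helper)."""
    parser = _new_parser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}
```

Writing the result of render_theanorc(THEANORC_SETTINGS) to ~/.theanorc should produce a file equivalent to the sample, and parse_theanorc can be used to confirm which compiler and flags an existing file requests.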
It is important to verify which versions of Theano and NumPy libraries are referenced once they are imported in python. The versions of NumPy and Theano referenced in this article are verified as follows:
python -c "import numpy; print (numpy.__version__)"
-> 1.11.1
python -c "import theano; print (theano.__version__)"
-> 0.9.0dev1.dev-*
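The same check can be scripted so that it also reports where each library was loaded from, which helps to spot a stale copy in the user's site-packages directory. This is a minimal sketch; module_version and report are hypothetical helper names, and the default module list is only an assumption based on the two libraries discussed here.

```python
import importlib


def module_version(name):
    """Return (version, file) for an importable module, or None if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return (getattr(module, "__version__", "unknown"),
            getattr(module, "__file__", "built-in"))


def report(names=("numpy", "theano")):
    """Build one human-readable line per module name."""
    lines = []
    for name in names:
        info = module_version(name)
        if info is None:
            lines.append("%s: not importable" % name)
        else:
            lines.append("%s %s (loaded from %s)" % (name, info[0], info[1]))
    return lines
```

Calling print("\n".join(report())) inside the activated environment lists the version and file location of NumPy and Theano, or notes that a module could not be imported.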
It is also important to verify that the installed versions of NumPy and Theano are using Intel MKL.
python -c "import theano; print (theano.numpy.show_config())"
Fig 1. Desired output for theano.numpy.show_config()
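For automated setups, the printed configuration can be captured and searched for MKL entries rather than inspected by eye. The sketch below assumes Python 3 and uses only the standard library; fake_show_config is a hypothetical stand-in whose two output lines are illustrative and are not the actual output of any particular NumPy build.

```python
import contextlib
import io


def captured_output(func):
    """Run func() and return everything it printed to standard output."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


def mentions_mkl(config_text):
    """Return True if the configuration text names an MKL library."""
    return "mkl" in config_text.lower()


def fake_show_config():
    # Hypothetical stand-in for numpy.show_config(), used only for illustration.
    print("blas_mkl_info:")
    print("    libraries = ['mkl_rt', 'pthread']")
```

In a real environment the check would be mentions_mkl(captured_output(numpy.show_config)), after importing numpy. A plain substring test is deliberately loose; it only signals that MKL appears somewhere in the reported build configuration.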
DBN-Kyoto and ImageNet benchmarks are available in the theano/democase directory.
Procuring the Dataset for Running DBN-Kyoto
The sample dataset for DBN-Kyoto can be downloaded from Dropbox via the following link: https://www.dropbox.com/s/ocjgzonmxpmerry/dataset1.pkl.7z?dl=0. Unzip the file and save it in the theano/democase/DBN-Kyoto directory.
Dependencies for training DBN-Kyoto can be installed using Anaconda or built using the provided source in the tools directory. Due to some conflicts in the pandas library and Python 3, this benchmark is validated only for Python 2.7.
Python 2.7:
conda install -c intel --override-channels pandas
conda install imaging
Alternatively the dependencies can also be installed from source as given in Appendix B.
Running DBN-Kyoto on CPU
The provided run.sh script can be used to download the dataset (if not already present) and start the training.
cd theano/democase/DBN-Kyoto/
./run.sh
In this article, we show how to train a neural network on MNIST using Lasagne, which is a lightweight library to build and train neural networks in Theano. The Lasagne library will be built and installed using Intel compilers.
Download the MNIST Database
The MNIST database can be downloaded from http://yann.lecun.com/exdb/mnist/. We downloaded images and labels for both training and validation data.
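After downloading, it can be worth confirming that each file is intact before starting a long training run. MNIST files use the IDX format, whose header stores big-endian 32-bit integers: a magic number (2051 for images, 2049 for labels) followed by the item count and, for images, the row and column sizes. The sketch below decodes that header; the function names are hypothetical and the check is not part of the Lasagne example.

```python
import gzip
import struct

IMAGE_MAGIC = 2051  # magic number of an IDX image file
LABEL_MAGIC = 2049  # magic number of an IDX label file


def read_idx_header(data):
    """Decode the header of an MNIST IDX file given its first bytes."""
    (magic,) = struct.unpack(">i", data[:4])
    if magic == IMAGE_MAGIC:
        count, rows, cols = struct.unpack(">iii", data[4:16])
        return {"kind": "images", "count": count, "rows": rows, "cols": cols}
    if magic == LABEL_MAGIC:
        (count,) = struct.unpack(">i", data[4:8])
        return {"kind": "labels", "count": count}
    raise ValueError("not an MNIST IDX file (magic number %d)" % magic)


def read_idx_header_from_file(path):
    """Read the header from a downloaded file, gzip-compressed or not."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        return read_idx_header(handle.read(16))
```

For the training image file, the header is expected to report 60000 images of 28 by 28 pixels; a different count or a ValueError usually points to an incomplete download.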
Installing Lasagne Library
The latest version of the Lasagne library can be built and installed from the Lasagne git repository as given below:
Python 2.7 and Python 3.5:
git clone https://github.com/Lasagne/Lasagne.git
cd Lasagne
python setup.py build
python setup.py install

cd Lasagne/examples
python mnist.py [model [epochs]]

Here, model can be mlp (a simple multi-layer perceptron, the default) or cnn (a simple convolutional neural network), and epochs defaults to 500.
Procuring the ImageNet dataset for AlexNet training
The ImageNet dataset can be obtained from the image-net website.
Dependencies for training AlexNet can be installed using Anaconda or installed from the Fedora EPEL source repository. Currently, Hickle (a required dependency for preprocessing data) is supported only on Python 2, not on Python 3.
- Installing h5py, pyyaml, pyzmq using Anaconda:
conda install h5py
conda install -c intel --override-channels pyyaml pyzmq
- Installing Hickle (HDF5-based clone of Pickle):
git clone https://github.com/telegraphic/hickle.git
cd hickle
python setup.py build
python setup.py install

Alternatively, the dependencies can also be installed from source as given in Appendix B.
Preprocessing the ImageNet Dataset
Preprocessing is required to dump Hickle files and create labels for training and validation data.
- Modify the paths.yaml file in the preprocessing directory to update the path for the dataset. One example of paths.yaml file is given below for reference.
cat theano/democase/alexnet_grp1/preprocessing/paths.yaml

train_img_dir: '/mnt/DATA2/TEST/ILSVRC2012_img_train/' # the dir that contains folders like n01440764, n01443537, ...
val_img_dir: '/mnt/DATA2/TEST/ILSVRC2012_img_val/' # the dir that contains ILSVRC2012_val_00000001~50000.JPEG
tar_root_dir: '/mnt/DATA2/TEST/parsed_data_toy' # dir to store all the preprocessed files
tar_train_dir: '/mnt/DATA2/TEST/parsed_data_toy/train_hkl' # dir to store training batches
tar_val_dir: '/mnt/DATA2/TEST/parsed_data_toy/val_hkl' # dir to store validation batches
misc_dir: '/mnt/DATA2/TEST/parsed_data_toy/misc' # dir to store img_mean.npy, shuffled_train_filenames.npy, train.txt, val.txt
meta_clsloc_mat: '/mnt/DATA2/imageNet-2012-images/ILSVRC2014_devkit/data/meta_clsloc.mat'
val_label_file: '/mnt/DATA2/imageNet-2012-images/ILSVRC2014_devkit/data/ILSVRC2014_clsloc_validation_ground_truth.txt'
# although from ILSVRC2014, these 2 files still work for ILSVRC2012
# caffe style train and validation labels
valtxt_filename: '/mnt/DATA2/TEST/parsed_data_toy/misc/val.txt'
traintxt_filename: '/mnt/DATA2/TEST/parsed_data_toy/misc/train.txt'
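A wrong path in this file typically surfaces only after preprocessing has started, so a quick existence check beforehand can save time. The sketch below takes the settings as a plain dictionary, for example as loaded with PyYAML, and lists the input entries that do not exist. Treating these four keys as inputs is an assumption drawn from the comments in the sample file, and missing_inputs is a hypothetical helper, not part of the benchmark scripts.

```python
import os

# Keys assumed (from the comments in paths.yaml) to name data that must
# already exist before preprocessing starts.
INPUT_KEYS = ("train_img_dir", "val_img_dir", "meta_clsloc_mat", "val_label_file")


def missing_inputs(paths):
    """Return the input keys whose path is absent from the dict or from disk."""
    return [key for key in INPUT_KEYS
            if not os.path.exists(paths.get(key, ""))]
```

An empty list means every assumed input location was found; any key returned should be corrected in paths.yaml before running the preprocessing scripts.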
A toy data set can be created using the provided script, generate_toy_data.sh.
cd theano/democase/alexnet_grp1/preprocessing
chmod u+x make_hkl.py make_labels.py make_train_val_txt.py
./generate_toy_data.sh
AlexNet training on CPU
- Modify the config.yaml file to update the path to the preprocessed dataset:
cd theano/democase/alexnet_grp1/

# Sample changes to the path for input(label_folder, mean_file) and output(weights_dir)
label_folder: /mnt/DATA2/TEST/parsed_data_toy/labels/
mean_file: /mnt/DATA2/TEST/parsed_data_toy/misc/img_mean.npy
weights_dir: ./weight/ # directory for saving weights and results
- Similarly, modify the spec.yaml file to update the path to the parsed toy data set:
# Directories
train_folder: /mnt/DATA2/TEST/parsed_data_toy/train_hkl_b256_b256_bchw/
val_folder: /mnt/DATA2/TEST/parsed_data_toy/val_hkl_b256_b256_bchw/
- Start the training:
The Large Movie Review Dataset (IMDB) benchmark is an example of a Recurrent Neural Network (RNN) that uses a Long Short-Term Memory (LSTM) model. The IMDB data set is used for sentiment analysis of movie reviews with the LSTM model.
Procuring the dataset:
Obtain the imdb.pkl file from http://www-labs.iro.umontreal.ca/~lisa/deep/data/ and extract the file to a local folder.
The http://deeplearning.net/tutorial/lstm.html page provides two scripts:
imdb.py – Handles the loading and preprocessing of the IMDB dataset.
lstm.py – The primary script that defines and trains the model.
Copy both of the above files into the same folder where we have the imdb.pkl file.
Training can be started using the following command:
THEANO_FLAGS="floatX=float32" python lstm.py
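THEANO_FLAGS is a comma-separated list of key=value pairs that overrides the configuration file for a single run. When training is launched from a script rather than a shell, the variable can be assembled from a dictionary, as in the following sketch. The helper names are hypothetical, and lstm.py is assumed to be in the current directory as described above.

```python
import os


def format_theano_flags(options):
    """Join options into the comma-separated key=value form of THEANO_FLAGS."""
    return ",".join("%s=%s" % (key, value) for key, value in options.items())


def training_environment(options, base_env=None):
    """Return a copy of the environment with THEANO_FLAGS set from options."""
    env = dict(os.environ if base_env is None else base_env)
    env["THEANO_FLAGS"] = format_theano_flags(options)
    return env
```

For example, subprocess.call([sys.executable, "lstm.py"], env=training_environment({"floatX": "float32"})) would be the scripted counterpart of the command above.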
Error 1: In some cases, you might get errors stating that libraries such as libmkl_rt.so or libimf.so cannot be opened. In this case, locate the library as follows:
find /opt/intel -name library_name.so
Add the resulting paths to the /etc/ld.so.conf file and run the ldconfig command to link the libraries. Also make sure the MKL installation paths are set correctly in the LD_LIBRARY_PATH environment variable.
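The search and the LD_LIBRARY_PATH check can be combined in a few lines of Python. This is a minimal sketch with hypothetical helper names: it walks a directory tree for a library file name and then reports which of the directories found are not yet listed in a colon-separated search path.

```python
import os


def find_library_dirs(root, library_name):
    """Return the directories under root that contain library_name."""
    return sorted(
        dirpath
        for dirpath, _dirnames, filenames in os.walk(root)
        if library_name in filenames
    )


def dirs_missing_from_path(dirs, ld_library_path):
    """Return the directories not listed in an LD_LIBRARY_PATH-style string."""
    listed = {os.path.normpath(entry)
              for entry in ld_library_path.split(":") if entry}
    return [d for d in dirs if os.path.normpath(d) not in listed]
```

A typical call would be dirs_missing_from_path(find_library_dirs("/opt/intel", "libmkl_rt.so"), os.environ.get("LD_LIBRARY_PATH", "")); any directory returned is a candidate for adding to LD_LIBRARY_PATH or /etc/ld.so.conf.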
Error 2: Preprocessing the toy dataset fails with the following error:
python make_hkl.py toy
generating toy dataset ...
Traceback (most recent call last):
  File "make_hkl.py", line 293, in <module>
    train_batchs_per_core)
ValueError: xrange() arg 3 must not be zero
The default number of processes used to preprocess ImageNet is currently set to 16. For the toy dataset this will create more processes than required, causing the application to crash. To resolve this issue, change the number of processes in file Alexnet_CPU/preprocessing/make_hkl.py:258 from 16 to 2. However, while preprocessing the full data set it is recommended to use a higher value for num_process for faster preprocessing.
num_process = 2
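One plausible reading of the traceback, not confirmed against make_hkl.py, is that the number of batches per process is computed by integer division, so it becomes zero when there are more processes than batches, and a zero step is then passed to xrange. The sketch below reproduces that arithmetic with hypothetical function names and an illustrative batch count, and shows a guard that caps the process count.

```python
def batches_per_process(num_batches, num_process):
    """Integer share of batches handled by each process (assumed formula)."""
    return num_batches // num_process


def safe_num_process(num_batches, requested):
    """Cap the process count so that every process gets at least one batch."""
    return max(1, min(requested, num_batches))


def batch_starts(num_batches, num_process):
    """Start index of each process's slice; raises ValueError if the step is 0."""
    step = batches_per_process(num_batches, num_process)
    return list(range(0, num_batches, step))
```

With 4 batches and 16 processes the step is 0 and batch_starts raises ValueError, which Python 3 reports as "range() arg 3 must not be zero". Capping the count with safe_num_process avoids the failure for small data sets while still allowing a higher value for the full data set.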
Error 3: Referencing the current version of Numpy when installing Intel(R) Distribution of Python* through Conda
If installing the Intel(R) Distribution of Python from within Conda instead of through the Intel(R) Distribution of Python installer, make sure that you set the PYTHONNOUSERSITE environment variable to True. This will enable the Conda environment to reference the correct version of Numpy. This is a known error in Conda. More information can be found here.
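When PYTHONNOUSERSITE is set to any non-empty value, Python does not add the user site-packages directory to sys.path, so packages in ~/.local cannot shadow those of the Conda environment. The sketch below, with hypothetical helper names, builds such an environment and asks a child interpreter whether the setting took effect.

```python
import os
import subprocess
import sys


def env_without_user_site(base_env=None):
    """Return a copy of the environment with PYTHONNOUSERSITE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONNOUSERSITE"] = "True"
    return env


def user_site_disabled(python_exe=sys.executable):
    """Start python_exe with PYTHONNOUSERSITE set and report whether the
    user site-packages directory is disabled in that interpreter."""
    output = subprocess.check_output(
        [python_exe, "-c", "import sys; print(sys.flags.no_user_site)"],
        env=env_without_user_site(),
    )
    return output.decode().strip() == "1"
```

In an interactive shell the equivalent is to export the variable before starting Python; the helper is mainly useful in launch scripts that must not depend on the caller's shell settings.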
- GitHub repo - Intel optimized Theano
- GitHub repo - Lasagne
- GitHub repo - Intel optimized NumPy (if building from source)
Installing Python* Tools For Other Python Distribution
CentOS:
Python 2.7 - sudo yum install python-devel python-setuptools
Python 3.5 - sudo yum install python35-libs python35-devel python35-setuptools
//Note - Python 3.5 packages can be obtained from the Fedora EPEL source repository

Ubuntu:
Python 2.7 - sudo apt-get install python-dev python-setuptools
Python 3.5 - sudo apt-get install libpython3-dev python3-dev python3-setuptools

- In case pip and cython are not installed on the system, they can be installed using the following commands:
sudo -E easy_install pip
sudo -E pip install cython
NumPy is the fundamental package needed for scientific computing with Python. This package contains:
- A powerful N-dimensional array object
- Sophisticated (broadcasting) functions
- Tools for integrating C/C++ and Fortran code
- Useful linear algebra, Fourier transform, and random number capabilities.
Note: An older version of the NumPy library can be removed by verifying its existence and deleting the related files. However, in this tutorial all the remaining libraries will be installed in user’s local directory, so this step is optional. If required, old versions can be cleaned as follows:
- Verify if old version exists:
python -c "import numpy; print numpy.version"
<module 'numpy.version' from '/home/plse/.local/lib/python2.7/site-packages/numpy-1.11.0rc1-py2.7-linux-x86_64.egg/numpy/version.pyc'>
- Delete any previously installed NumPy packages:
rm -r /home/plse/.local/lib/python2.7/site-packages/numpy-1.11.0rc1-py2.7-linux-x86_64.egg
- Building and installing NumPy optimized for Intel architecture:
git clone https://github.com/pcs-theano/numpy.git
cd numpy
//Update the site.cfg file to point to the required MKL directory. This step is optional if Parallel Studio or MKL was installed in the default /opt/intel directory.
python setup.py config --compiler=intelem build_clib --compiler=intelem build_ext --compiler=intelem install --user

SciPy is an open source Python library used for scientific and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.
- Building and installing SciPy:
tar -xvzf scipy-0.16.1.tar.gz
(can be downloaded from: https://sourceforge.net/projects/scipy/files/scipy/0.16.1/ or obtain the latest sources from https://github.com/scipy/scipy/releases)
cd scipy-0.16.1/
python setup.py config --compiler=intelem --fcompiler=intelem build_clib --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem install --user
Building and installing benchmark dependencies from source
//Untar and install all the provided tools:
cd theano/democase/DBN-Kyoto/tools
tar -xvzf Imaging-1.1.7.tar.gz
cd Imaging-1.1.7
python setup.py build
python setup.py install --user

cd theano/democase/DBN-Kyoto/tools
tar -xvzf python-dateutil-2.4.1.tar.gz
cd python-dateutil-2.4.1
python setup.py build
python setup.py install --user

cd theano/democase/DBN-Kyoto/tools
tar -xvzf pytz-2014.10.tar.gz
cd pytz-2014.10
python setup.py build
python setup.py install --user

cd theano/democase/DBN-Kyoto/tools
tar -xvzf pandas-0.15.2.tar.gz
cd pandas-0.15.2
python setup.py build
python setup.py install --user
- Installing dependencies for AlexNet from source
Access to some of the add-on packages from the Fedora EPEL source repository may be required for running AlexNet on CPU.
wget http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-8.noarch.rpm
sudo rpm -ihv epel-release-7-8.noarch.rpm
sudo yum install hdf5-devel
sudo yum install zmq-devel
sudo yum install zeromq-devel
sudo yum install python-zmq
- Installing Hickle (HDF5-based clone of Pickle):
git clone https://github.com/telegraphic/hickle.git
cd hickle
python setup.py build install --user
- Installing h5py (Python interface to HDF5 binary data format):
git clone https://github.com/h5py/h5py.git
cd h5py
python setup.py build install --user
- LSTM tutorial
- DBN tutorial
- Superior Performance Commits Kyoto University to CPUs Over GPUs
- Introduction of the LSTM model:
- [pdf] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
- Addition of the forget gate to the LSTM model:
- [pdf] Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural computation, 12(10), 2451-2471.
- LSTM paper:
- [pdf] Graves, Alex. Supervised sequence labelling with recurrent neural networks. Vol. 385. Springer, 2012.
- [pdf] Bastien, Frédéric, Lamblin, Pascal, Pascanu, Razvan, Bergstra, James, Goodfellow, Ian, Bergeron, Arnaud, Bouchard, Nicolas, and Bengio, Yoshua. Theano: new features and speed improvements. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2012.
- [pdf] Bergstra, James, Breuleux, Olivier, Bastien, Frédéric, Lamblin, Pascal, Pascanu, Razvan, Desjardins, Guillaume, Turian, Joseph, Warde-Farley, David, and Bengio, Yoshua. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010.
About The Authors
Sunny Gogar received a Master’s degree in Electrical and Computer Engineering from the University of Florida, Gainesville and a Bachelor’s degree in Electronics and Telecommunications from the University of Mumbai, India. He is currently a software engineer with Intel Corporation's Software and Services Group. His interests include parallel programming and optimization for Multi-core and Many-core Processor Architectures.
Meghana Rao received a Master’s degree in Engineering and Technology Management from Portland State University and a Bachelor’s degree in Computer Science and Engineering from Bangalore University, India. She is a Developer Evangelist with the Software and Services Group at Intel focused on Machine Learning and Deep Learning.