Using TensorFlow* for Deep Learning Training and Testing

Introduction

In this tutorial, you learn to train and test a convolutional neural network on a single-node Intel® Xeon® Scalable processor system, using the TensorFlow* framework and the CIFAR-10 image recognition dataset. Use these step-by-step instructions as is, or as the foundation for enhancements and modifications.

Prerequisites

Hardware: Steps have been verified on Intel® Xeon® Scalable processors but should work on any recent Intel® Xeon® processor-based system. None of the software used in this document was performance optimized.
Software: Basic Linux* knowledge and familiarity with the concepts of deep learning training.

Install TensorFlow using binary packages or from GitHub* sources. This document describes one way to successfully deploy and test on a single Intel Xeon Scalable processor system running CentOS* 7.3. Other installation methods can be found in [2, 18]. This document does not describe how to reach state-of-the-art performance; rather, it introduces TensorFlow and runs a simple train and test using the CIFAR-10 dataset on a single-node Intel Xeon Scalable processor system.

Hardware and Software Bill of Materials

The hardware and software bill of materials used for the verified implementation is detailed in the table below. Intel® Parallel Studio XE Cluster Edition is an optional installation for a single-node implementation; it provides most of the basic tools and libraries in one package. Starting with Intel Parallel Studio XE Cluster Edition also shortens the learning curve for a multi-node implementation of the same training and testing, where this software plays a significant role.

Item | Manufacturer | Model/Version
Hardware
Intel® Server Chassis | Intel | R1208WT
Intel® Server Board | Intel | S2600WT
2 - Intel® Xeon® Scalable processor | Intel | Intel Xeon® Gold 6148 processor
6 - 32 GB LRDIMM DDR4 | Crucial* | CT32G4LFD4266
1 - Intel® SSD 1.2 TB | Intel | S3520
Software
CentOS* Linux* Installation DVD | CentOS | 7.3.1611
Intel® Parallel Studio XE Cluster Edition | Intel | 2017.4
TensorFlow* | | setuptools-36.7.2-py2.py3-none-any.whl

Install the Linux* Operating System

This section requires CentOS-7-x86_64-*1611.iso. This software component can be downloaded from the CentOS website.

The DVD ISO was used to implement and verify the steps in this document; the Everything ISO and Minimal ISO can also be used.

Step 1. Install Linux

1. Insert the CentOS 7.3 1611 install disc/USB. Boot from the drive and select Install CentOS 7.

2. Select Date and Time.

3. If necessary, select Installation Destination.

a. Select the automatic partitioning option.

b. Click Done to return home. Accept all defaults for the partitioning wizard, if prompted.

4. Select Network and host name.

a. Enter "<hostname>" as the hostname.

i. Click Apply for the hostname to take effect.

b. Select Ethernet enp3s0f3 and click Configure to set up the external interface.

i. From the General section, check Automatically connect to this network when it’s available.

ii. Configure the external interface as necessary. Save and Exit.

c. Select the toggle to ON for the interface.

5. Select Software Selection. In the box labeled Base Environment on the left side, select Infrastructure server.

a. Click Done to return home.

b. Wait until the Begin Installation button is available, which may take several minutes. Then click it to continue.

6. While waiting for the installation to finish, set the root password.

7. Click Reboot when the installation is complete.

8. Boot from the primary device and log in as root.

Step 2. Configure YUM*

If the public network implements a proxy server for Internet access, Yellowdog Updater Modified* (YUM*) must be configured in order to use it.

  1. Open the /etc/yum.conf file for editing.
  2. Under the main section, append the following line:
    proxy=http://<address>:<port>
    where <address> is the address of the proxy server and <port> is the HTTP port.
  3. Save the file and Exit.
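
As an illustration of step 2, the sketch below inserts the proxy line directly under the [main] header of a yum.conf-style text. It works on an in-memory string rather than on /etc/yum.conf itself, and it does not check for an existing proxy entry. The function name add_proxy and the sample address proxy.example.com:8080 are hypothetical.

```python
def add_proxy(conf_text, address, port):
    """Return conf_text with a proxy entry added to the [main] section."""
    proxy_line = "proxy=http://{0}:{1}".format(address, port)
    out = []
    inserted = False
    for line in conf_text.splitlines():
        out.append(line)
        # Place the proxy entry right after the [main] section header.
        if not inserted and line.strip() == "[main]":
            out.append(proxy_line)
            inserted = True
    if not inserted:
        raise ValueError("no [main] section found")
    return "\n".join(out) + "\n"


# Hypothetical sample content and proxy address, for illustration only.
sample = "[main]\ncachedir=/var/cache/yum\nkeepcache=0\n"
print(add_proxy(sample, "proxy.example.com", 8080))
```

Editing the file by hand, as described in the steps above, has the same effect; the sketch only shows where the line belongs.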

Disable updates and extras. Certain procedures in this document require packages to be built against the kernel. A future kernel update may break the compatibility of these built packages with the new kernel, so we recommend disabling repository updates and extras to provide further longevity to this document.

This document may not work as is once CentOS updates to the next version. To use this document after such an update, redefine the repository paths to point to CentOS 7.3 in the CentOS vault. To disable repository updates and extras:

yum-config-manager --disable updates --disable extras

Step 3. Install EPEL

Extra Packages for Enterprise Linux (EPEL) provides high-quality add-on software packages for Enterprise Linux distributions. To install EPEL (latest version for all packages required):

yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Step 4. Install GNU* C Compiler

Check whether the GNU Compiler Collection* is installed. It should be part of the development tools install. Verify the installation by typing:

gcc --version

or

whereis gcc
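
The same check can be scripted. The following is a minimal sketch, assuming gcc is expected on the PATH; the helper name gcc_version is hypothetical, and the code uses only the standard library.

```python
import subprocess


def gcc_version():
    """Return the first line of `gcc --version`, or None if gcc is unavailable."""
    try:
        output = subprocess.check_output(["gcc", "--version"])
    except (OSError, subprocess.CalledProcessError):
        # OSError: gcc is not on PATH; CalledProcessError: gcc exited non-zero.
        return None
    lines = output.decode("utf-8", "replace").splitlines()
    return lines[0] if lines else None


version = gcc_version()
print(version if version else "gcc not found; install the development tools")
```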

Step 5. Install TensorFlow*

Using virtualenv [18], follow these steps to install TensorFlow:

1. Update to the latest distribution of EPEL:

yum -y install epel-release

2. To install TensorFlow, the following dependencies must be installed [10]:

  1. NumPy*: a numerical processing package that TensorFlow requires
  2. Devel*: the Python* development package (python-devel), which enables adding extensions to Python
  3. Pip*: enables installing and managing Python packages
  4. Wheel*: enables managing Python compressed packages in the wheel format (.whl)
  5. Atlas*: Automatically Tuned Linear Algebra Software
  6. Libffi*: a library that provides a Foreign Function Interface (FFI), which allows code written in one language to call code written in another language. It provides a portable, high-level programming interface to various calling conventions [11]

3. Install dependencies:

sudo yum -y install gcc gcc-c++ python-pip python-devel atlas atlas-devel gcc-gfortran openssl-devel libffi-devel python-numpy

4. Install virtualenv.
There are various ways to install TensorFlow [18]. This document uses virtualenv, a tool to create isolated Python environments [16]:


pip install --upgrade virtualenv

5. Create a virtualenv in your target directory:


virtualenv --system-site-packages <targetDirectory>

Example: virtualenv --system-site-packages ~/tensorflow

6. Activate your virtualenv [18]:


source <targetDirectory>/bin/activate

Example: source ~/tensorflow/bin/activate

7. Upgrade your packages, if needed:


pip install --upgrade numpy scipy wheel cryptography

8. Install the latest version of the TensorFlow wheel (Python compressed package) by passing the wheel URL to pip:

pip install --upgrade <wheel URL>

This document was deployed and tested using the TensorFlow 0.8 wheel:

https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl

Google releases updated versions of TensorFlow on a regular cadence, so we recommend using the latest available version of the TensorFlow (TF) wheel.

The latest version of the Intel MKL-DNN optimized TensorFlow wheel file can be found at the following link, under Community Supported Builds:

https://github.com/tensorflow/tensorflow

Example (Linux CPU): the CPU-optimized TF 1.9 wheel [19] can be installed as follows:

Python 2.7:

pip install https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.9.0-cp27-cp27mu-linux_x86_64.whl

Python 3.5:

pip install https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.9.0-cp35-cp35m-linux_x86_64.whl
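
The cp27 and cp35 parts of these file names follow the wheel file naming convention (name, version, Python tag, ABI tag, platform tag). The sketch below splits a wheel file name into those fields and compares the Python tag with the running interpreter, which can help when picking between the two commands above. The helper names wheel_tags and matches_interpreter are hypothetical, and the comparison covers CPython tags only.

```python
import sys


def wheel_tags(filename):
    """Split a wheel filename into (name, version, python, abi, platform)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # Five fields, or six when an optional build tag follows the version.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return name, version, python_tag, abi_tag, platform_tag


def matches_interpreter(filename):
    """True if the wheel's Python tag names the running CPython major.minor."""
    python_tag = wheel_tags(filename)[2]
    return python_tag == "cp{0}{1}".format(*sys.version_info[:2])


for wheel in ("tensorflow-1.9.0-cp27-cp27mu-linux_x86_64.whl",
              "tensorflow-1.9.0-cp35-cp35m-linux_x86_64.whl"):
    print("{0}: {1}".format(wheel, matches_interpreter(wheel)))
```

pip performs its own, more complete compatibility check at install time; this sketch only makes the naming scheme visible.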

CPU-only wheel files are also available on the TensorFlow webpage, https://www.tensorflow.org/install/install_linux#InstallingVirtualenv, and can be used as well. However, these may not be optimized for CPUs.

Once you have installed a CPU-optimized version of the TensorFlow wheel, we recommend not using the pip install --upgrade tensorflow command, as this may cause TensorFlow to be upgraded to a non-CPU-optimized version.
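
Before moving on, it can be useful to confirm that the virtualenv is active and that TensorFlow imports cleanly. The following is a minimal sketch, not part of the original procedure; the helper names in_virtualenv and tensorflow_version are hypothetical.

```python
import sys


def in_virtualenv():
    """True if the interpreter runs inside a virtualenv or venv."""
    # Classic virtualenv sets sys.real_prefix; venv and newer virtualenv
    # releases make sys.base_prefix differ from sys.prefix.
    return hasattr(sys, "real_prefix") or (
        getattr(sys, "base_prefix", sys.prefix) != sys.prefix)


def tensorflow_version():
    """Return the installed TensorFlow version string, or None if not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__


print("virtualenv active: {0}".format(in_virtualenv()))
print("TensorFlow version: {0}".format(tensorflow_version()))
```

If the version printed is None, activate the virtualenv and repeat the pip install step.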

Step 6. Train a Convolutional Neural Network (CNN)

1. Download the CIFAR-10 [3] training dataset into the /tmp/ directory:
Download the CIFAR-10 Python version from [4, 8]: https://www.cs.toronto.edu/~kriz/cifar.html

2. Extract the tar file in the /tmp/ directory, because the Python script (cifar10_train.py) looks for data in this directory:


tar -zxf <dir>/cifar-10-python.tar.gz -C /tmp
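
For readers who prefer to stay in Python, the extraction can be sketched with the standard tarfile module. The helper name extract_archive is hypothetical, and the archive location is left as the same <dir> placeholder used above. Only extract archives from a source you trust.

```python
import tarfile


def extract_archive(archive_path, dest_dir="/tmp"):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names


# Example call (replace <dir> with the download location):
# extract_archive("<dir>/cifar-10-python.tar.gz", "/tmp")
```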

3. Change directory to TensorFlow:


cd tensorflow

4. Make a new directory:


mkdir git_tensorflow

5. Change directory to the one created in last step:


cd git_tensorflow

6. Download a clone of the TensorFlow repository from GitHub [9]:
git clone https://github.com/tensorflow/tensorflow.git

7. If the Models folder is missing from the tensorflow/tensorflow directory, clone the models repository [9] from https://github.com/tensorflow/models.git:


cd tensorflow/tensorflow

git clone https://github.com/tensorflow/models.git

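8. Upgrade TensorFlow to the latest version; otherwise errors could occur when training the model. Note that, as cautioned in Step 5, this command may replace a CPU-optimized wheel with a non-CPU-optimized version: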


pip install --upgrade tensorflow

9. Change directory to the CIFAR-10 directory to get the training and evaluation Python scripts [14]:


cd models/tutorials/image/cifar10

10. Before running the training code, check the cifar10_train.py code and change steps from 100K to 60K if needed, as well as logging frequency from 10 to whatever you prefer.

For this document, tests were done for both 100K steps and 60K steps, for a batch size of 128, and logging frequency of 10.
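
Instead of editing the file, these values can usually be overridden on the command line, because cifar10_train.py declares its settings as TensorFlow flags. The sketch below builds such a command. The flag names --max_steps and --log_frequency are an assumption based on recent revisions of the script, so confirm them in your checkout (older revisions may not define log_frequency). The helper name training_command is hypothetical.

```python
def training_command(max_steps=60000, log_frequency=10):
    """Build an argument list for cifar10_train.py with overridden flags."""
    # Flag names are assumed; check the DEFINE_integer lines in your copy.
    return [
        "python", "cifar10_train.py",
        "--max_steps={0}".format(max_steps),
        "--log_frequency={0}".format(log_frequency),
    ]


print(" ".join(training_command()))
```

The resulting list can be passed to subprocess.call, or the printed line can be typed at the shell in place of the plain python cifar10_train.py command.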

11. Run the training Python script to train your network:


python cifar10_train.py

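Within a few minutes you will see output similar to the image below; the complete run takes longer (roughly 2 to 3 hours on the system used for this document, per the Sample results table):

[Image: Python code sample]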

Testing script and dataset terminology

In the neural network terminology:

  • One epoch = one forward pass and one backward pass of all the training examples.
  • Batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space required. TensorFlow pushes it all through one forward pass (in parallel) and follows with a back-propagation on the same set. This is one iteration, or step.
  • Number of iterations = number of passes, each pass using [batch size] number of examples. To be clear, one pass equals one forward pass plus one backward pass (do not count the forward pass and backward pass as two different passes).
  • The steps parameter tells TensorFlow to run X of these iterations to train the model.

Example: given 1,000 training examples, and a batch size of 500, then it will take two iterations to complete one epoch.
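
The relationship between examples, batch size, and iterations can be written as a one-line calculation. This is a minimal sketch; the function name iterations_per_epoch is hypothetical, and it assumes a final partial batch still counts as one iteration.

```python
def iterations_per_epoch(num_examples, batch_size):
    """Number of iterations (steps) needed to see every example once."""
    # Ceiling division: a final partial batch still costs one iteration.
    return -(-num_examples // batch_size)


print(iterations_per_epoch(1000, 500))  # 2, matching the example above
```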

To learn more about the difference between epoch, batch size, and iterations, see [15].

In the cifar10_train.py script:

  • Batch size is set to 128. It represents the number of images to process in a batch.
  • Max step is set to 100,000. It is the number of iterations for all epochs.

    NOTE: The GitHub code has a typo; instead of 100K, the number shows 1000K. Please update before running.

  • The CIFAR-10 binary dataset in [4] has 60,000 images: 50,000 images to train and 10,000 images to test. With a batch size of 128, one epoch takes 50,000/128 ≈ 391 batches.
  • The cifar10_train.py script uses 256 epochs, so the number of iterations for all the epochs is about 391 x 256 ≈ 100K iterations, or steps.
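
Going the other way, the number of epochs covered by a given step count follows from the same figures. The sketch below uses the 50,000 training images and batch size of 128 stated above; the function name epochs_for_steps is hypothetical.

```python
TRAIN_IMAGES = 50000  # CIFAR-10 training images, as stated above
BATCH_SIZE = 128      # batch size used by cifar10_train.py, as stated above


def epochs_for_steps(steps, batch_size=BATCH_SIZE, train_images=TRAIN_IMAGES):
    """Number of passes over the training set completed in `steps` iterations."""
    return float(steps) * batch_size / train_images


for steps in (60000, 100000):
    print("{0} steps = {1:.1f} epochs".format(steps, epochs_for_steps(steps)))
```

With these inputs, 100K steps corresponds to 256 epochs and 60K steps to roughly 154 epochs.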

Step 7. Evaluate the model

Use the cifar10_eval.py script [8] to evaluate how well the trained model performs on a hold-out data set:

python cifar10_eval.py

Once the model reaches the expected accuracy, this command reports precision @ 1 = 0.862. You can run it after the training script has finished, or while the training script is still running, toward the end of its steps.
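
Precision @ 1 is top-1 accuracy: the fraction of test images for which the highest-scoring class is the true label. The sketch below illustrates the metric in plain Python on made-up scores; it is not the code used by cifar10_eval.py, and the function name precision_at_1 is hypothetical.

```python
def precision_at_1(logits, labels):
    """Fraction of examples whose highest-scoring class is the true label."""
    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        if predicted == label:
            correct += 1
    return float(correct) / len(labels)


# Three made-up examples with four classes each (not CIFAR-10 data).
logits = [[0.1, 2.0, 0.3, 0.0], [1.5, 0.2, 0.1, 0.4], [0.0, 0.1, 0.2, 3.0]]
labels = [1, 2, 3]
print(precision_at_1(logits, labels))
```

Here the first and third examples are classified correctly and the second is not, so the result is two thirds.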

Sample results

The cifar10_train.py script shows the following results:

[Image: results of the test]

The results below were achieved with the system described in the Hardware and Software Bill of Materials section of this document. These numbers are for educational purposes only; no specific CPU optimizations were performed.

System | Step Time (sec/batch) | Accuracy
2 - Intel® Xeon® Gold processors | ~0.105 | 85.8% at 60K steps (~2 hours)
2 - Intel® Xeon® Gold processors | ~0.109 | 86.2% at 100K steps (~3 hours)
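
The run times in the table are consistent with the step times. As a rough check, the sketch below multiplies the step count by the seconds per batch; it ignores startup, checkpointing, and evaluation overhead, and the function name training_hours is hypothetical.

```python
def training_hours(steps, sec_per_batch):
    """Approximate wall-clock training time in hours."""
    return steps * sec_per_batch / 3600.0


# Step counts and step times taken from the table above.
for steps, sec_per_batch in [(60000, 0.105), (100000, 0.109)]:
    hours = training_hours(steps, sec_per_batch)
    print("{0} steps at {1} sec/batch: {2:.2f} hours".format(
        steps, sec_per_batch, hours))
```

This gives about 1.75 hours for 60K steps and about 3.03 hours for 100K steps, in line with the approximate 2 and 3 hours reported.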

When you finish training and testing your CIFAR-10 dataset, the same Models directory has images for MNIST* and AlexNet* benchmarks. For additional learning, go into MNIST and AlexNet directories and try running the Python scripts to see the results.

References

1. Thoolihan, n.d. "Install TensorFlow on CentOS7," Accessed 6/25/18.

2. n.d., "Installing TensorFlow on Ubuntu*", Accessed 6/25/18.

3. n.d., "Install TensorFlow on CentOS7", Accessed 6/25/18.

4. The CIFAR-10 dataset

5. TensorFlow, MNIST and your own handwritten digits

6. TensorFlow Tutorial

7. Tutorial on CNN on TensorFlow

8. CIFAR-10 Details

9. TensorFlow Models

10. Installing TensorFlow from Sources

11. Libffi

12. Performance Guide for TensorFlow

13. What is batch size in neural network?

14. Learning Multiple Layers of Features from Tiny Images (PDF), Alex Krizhevsky, 2009

15. Epoch vs Batch Size vs Iterations

16. Virtualenv

17. CPU Optimizations

18. Download and Setup

19. Intel® Optimization for TensorFlow* Installation Guide
