AI Practitioners Guide for Beginners

TensorFlow* Framework Deployment and Example Test Runs on Intel® Xeon® Platform-Based Infrastructure

Scope

This practitioner enablement guide provides a high-level overview of the business and data strategy that a machine learning (ML) practitioner needs to know, followed by a detailed walkthrough of how to install and validate one of the popular artificial intelligence (AI) frameworks, TensorFlow*, on Intel® Xeon® Scalable platforms. The guide details steps for installing and running popular examples in three different ways: on bare metal, via containers, and on the cloud.

Note: The examples shown in this guide were not performance optimized and are for educational purposes only.

Definition of Artificial Intelligence (AI)

The definition of “artificial intelligence” is continually evolving, but at its core, AI is about machines mimicking (and/or exceeding) cognitive functions associated with the human mind. In the universe of AI, which includes many different approaches, data-centric machine learning has emerged as a leader due to its increasing ability to tackle the three main AI sub-tasks: perception, planning/reasoning, and control. Ultimately, AI is achieved through the fusion of multiple approaches to deliver ever more intelligent machines, and the nexus of AI developments in the near-future is centered on deep learning, with other approaches all playing important roles – depending on the dataset, problem, and unique requirements.

As shown in figure 1, a subset of the AI umbrella is machine learning, which can be defined as machine algorithms whose performance keeps improving as they are exposed to more data over time. For example, if you were to program a self-learning robot to water the plants in your garden and the robot hits a stone on its way, it will learn to avoid the obstacle and take an optimized path in the future. In this way, machine learning helps the garden robot improve its performance over time.

A subset of machine learning is deep learning (DL), where multi-layered neural networks learn from vast amounts of data. Deep learning is the branch of AI that has gained huge popularity and adoption in recent years. The framework and examples provided in this guide are based on deep learning. DL comprises two major pieces, training and inference. Training teaches multi-layered neural networks (also known as models) to identify objects/text, etc. by feeding labeled data/content into it. Once the model is trained, inference begins, using the trained model to identify unlabeled content.

Figure 1: AI and its major subsets
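To make the training/inference split concrete, here is a minimal sketch in TensorFlow 1.x style (matching the framework version used later in this guide). It is not part of the original examples: the data is randomly generated toy data and all variable names are illustrative.

import numpy as np
import tensorflow as tf

# Toy labeled dataset: 100 samples, 4 features each, 3 classes (illustrative only)
features = np.random.rand(100, 4).astype(np.float32)
labels = np.random.randint(0, 3, size=100).astype(np.int32)

x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.int32, [None])
logits = tf.layers.dense(x, 3)                          # a minimal one-layer "model"
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Training: the model learns from labeled data
    for _ in range(200):
        sess.run(train_op, feed_dict={x: features, y: labels})
    # Inference: the trained model labels new, unlabeled samples
    new_samples = np.random.rand(5, 4).astype(np.float32)
    print(sess.run(tf.argmax(logits, axis=1), feed_dict={x: new_samples}))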

Business Considerations

Determine a data strategy1

The business imperative for AI is firmly rooted in data – the currency of the future. By 2020, we expect over 50 billion devices and 200 billion sensors to join the internet, and this huge explosion of smart and connected devices will lead to incalculable volumes of data being generated. In 2020, it is expected that the average internet user will generate ~1.5 GB of traffic per day (up from ~650MB in 2015). This is certainly a huge amount of data … until you consider the machines:

  • A smart hospital will generate 3,000 GB/day
  • Self-driving cars are each generating over 4,000 GB/day
  • A connected plane will generate 5,000 gigabytes per day
  • A connected factory will generate 1 million gigabytes per day

This data contains extremely valuable insights for business, operations, and security that affected industries really want to extract, analyze and interpret in real time. Extracting value from that data requires all the AI tools at our disposal.

The first step on your AI journey is to prepare your data. And for that, when thinking of an AI business model, it’s imperative to focus on the entire data lifecycle.

Figure 2: AI data life cycle

Your data strategy should include a clear plan for how to create, source, transmit, ingest, clean, and integrate diverse data, followed by how to store and stage the data before processing. Each organization will have a unique data strategy, but remember the data lifecycle and focus on building an end-to-end optimized data-based solution to arrive at a unique, competitive data strategy.

Analyze the business problem you are trying to solve1

Before exploring AI, it is important to understand that implementing AI in your organization will be a journey. The first step is to define the challenges you’re facing across your organization and prioritize them based on business value and how much it will cost to solve them. Picture a 2x2 chart with increasing business value on the y-axis and decreasing cost-to-solve on the x-axis; naturally, the most impactful challenges to tackle first are in the upper-right quadrant. The next steps are to determine which AI (or other) approach is best suited to each problem, and then assess whether you have the expertise required to implement the solution. (Additionally, you should know whether those experts embrace a fail-fast, continuous-improvement philosophy, since AI projects typically involve more uncertainty, trial and error, and exploration than more traditional, deterministic software development projects.) Once the human element is in place, the next step is to source data and prepare it for analysis, as well as to stand up whatever technology infrastructure is required to tackle the problem.

Finally, you’re ready to do the heavy lifting to use data to solve business challenges – but unless your organization is ready to accept and act on data-driven insights, then all that work may have been for naught. A classic example is an initial resistance to data analytics in sports, where general managers and scouts scoffed at the idea of computer algorithms outsmarting their years of experience and tribal knowledge. Bottom line: if you think about all these steps in the AI lifecycle, you’ll stand a much better chance of realizing the business value that you originally set out to deliver through AI.

Professor Thomas Malone from MIT Sloan School of Management is founding director of the MIT Center for Collective Intelligence. In several of his works, he explores the idea of humans and machines working collectively to change the world. Understanding how this collective intelligence of humans and AI can affect your business strategy is critical. Combining a strong data strategy with a deep understanding of the business problem you’re trying to solve with AI will help you accelerate your business in the future.

TensorFlow* framework deployment and examples

This section details the three ways to deploy the TensorFlow framework for deep learning training and inference on Intel Xeon platform-based infrastructure.

Option   | Single Node    | Multi-node
Option 1 | Bare metal     | Bare metal
Option 2 | Via containers | Not covered in this document (please check references 6, 29)
Option 3 | On cloud       | On cloud

Option 1: Bare Metal

Single node installation:

This section details how to train and test a single-node Intel® Xeon® Scalable processor system using the TensorFlow framework with the CIFAR-10 image recognition dataset. Use these step-by-step instructions as-is, or as the foundation for enhancements and/or modifications.

Knowledge prerequisites:

  • Hardware: Steps have been verified on Intel® Xeon® Scalable processors but should work on any recent Intel Xeon processor-based system. None of the software used in this document was performance optimized.
  • Software: Basic Linux* and familiarity with the concepts of deep learning training

This section describes one way to successfully deploy and test an image recognition example on a single Intel Xeon Scalable processor system running CentOS* 7.3. Other installation methods can be found in Installing TensorFlow* on Ubuntu* and the Intel® Optimization for TensorFlow* Installation Guide. This document uses a virtual environment for installing TensorFlow; to use Anaconda* instead, refer to this article. This document is not meant to describe how to achieve state-of-the-art performance; rather, it introduces TensorFlow and runs a simple train and test using examples like the CIFAR-10 dataset on various Intel Xeon processor-based systems.

Hardware and software bill of materials

Item | Manufacturer | Model/Version
Hardware
Intel-based server chassis | Intel | R1208WT
Intel-based server board | Intel | S2600WT
(2x) Intel® Xeon® Scalable processor | Intel | Intel® Xeon® Gold 6148 processor
(6x) 32GB LRDIMM DDR4 | Crucial* | CT32G4LFD4266
(1x) Intel® SSD 1.2TB | Intel | S3520
Software
CentOS* Linux* Installation DVD | | 7.3.1611
Intel® Parallel Studio XE Cluster Edition | | 2017.4
TensorFlow* | | setuptools-36.7.2-py2.py3-none-any.whl
Step 1: Install the Linux* operating system

In this section, CentOS* 7.3.1611 was used. Download an updated version of the software from the CentOS website.

Find steps for OS installation in the Appendix.

Step 2. Configure YUM

If the public network implements a proxy server for internet access, Yellowdog Updater Modified* (YUM) must be configured in order to use it.

Open the /etc/yum.conf file for editing.

Under the main section, append the following line:

proxy=http://<address>:<port>

where <address> is the address of the proxy server and <port> is the HTTP port.

Save the file and exit.

Disable updates and extras. Certain procedures in this document require packages to be built against the kernel. A future kernel update may break the compatibility of these built packages with the new kernel, so disabling repository updates and extras is recommended to provide further longevity to this document.

This document may not be used “as is” when CentOS updates to the next version. To use this document after such an update, it is necessary to redefine repository paths to point to CentOS 7.3 in the CentOS vault. To disable repository updates and extras: yum-config-manager --disable updates --disable extras.

Step 3. Install EPEL

Extra Packages for Enterprise Linux (EPEL) provides 100 percent free, high-quality add-on software packages for Linux distributions. To install EPEL (latest version for all packages required):

yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

(download from here)

Step 4. Install GNU* C compiler

Check whether the GNU Compiler Collection* (GCC*) is installed. It should be part of the Development Tools install in OS installation. (Look in the Appendix.) Check by typing:

gcc --version or whereis gcc

If not installed, find the latest installation here.

GCC can be installed from the official CentOS repository by using the following command:

yum -y install gcc
Step 5. Install TensorFlow*

Using virtualenv3, follow these steps to install TensorFlow:

  1. Update to the latest distribution of EPEL:

    yum -y install epel-release
  2. To install TensorFlow, the following dependencies must be installed:
    • NumPy: a numerical processing package that TensorFlow requires
    • Devel*: enables adding extensions to Python*
    • PIP*: enables installing and managing certain Python packages
    • Wheel*: enables managing Python compressed packages in wheel format (.whl)
    • Atlas*: Automatically Tuned Linear Algebra Software
    • Libffi*: a library that provides a Foreign Function Interface (FFI), allowing code written in one language to call code written in another language. It provides a portable, high-level programming interface to various calling conventions8
  3. Install dependencies:

    sudo yum -y install gcc gcc-c++ python-pip python-devel atlas atlas-devel gcc-gfortran openssl-devel libffi-devel python-numpy
  4. Install virtualenv

    There are various ways to install TensorFlow. This document uses virtualenv, a tool to create isolated Python environments9.

    pip install --upgrade virtualenv
  5. Create a virtualenv in your target directory:

    virtualenv --system-site-packages <targetDirectory>

    Example:

    virtualenv --system-site-packages tensorflow
  6. Activate your virtualenv4:

    source ~/<targetdirectory>/bin/activate

    Example:

    source ~/tensorflow/bin/activate
  7. Upgrade your packages, if needed:

    pip install --upgrade numpy scipy wheel cryptography
  8. Install the latest version of Python compressed TensorFlow packages:

    pip install --upgrade <path or URL of the TensorFlow wheel file>

This document was deployed and tested using TensorFlow 0.8 wheel.

Google releases an updated version of TensorFlow on a regular cadence, so using the latest available version of TensorFlow wheel is recommended.

Find the latest version of Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) optimized Tensor wheel file in GitHub*, under Community Supported Builds.

Example4:

Tensor wheel file on GitHub example

CPU-only wheel files are also available on the TensorFlow webpage and can be used; however, these may not be optimized for CPUs.

After installing a version of TensorFlow wheel, you have the option to upgrade to the latest TensorFlow, but be advised that the upgraded version might not be CPU optimized.
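One quick way to confirm which TensorFlow build is active inside the virtualenv (a generic check, not a step from the original instructions) is to import the package and print its version:

import tensorflow as tf
print(tf.__version__)   # shows the wheel version actually in use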

Step 6. Train a convolutional neural network (CNN)
  1. Download the CIFAR-10 training dataset11 into the /tmp/ directory; the Python version can be found here.
  2. Unzip the tar file in the /tmp/ area as the Python script (cifar10_train.py) looks for data in this directory:

    tar -zxf <dir>/cifar-10-python.tar.gz
  3. Change directory to TensorFlow:

    cd tensorflow
  4. Make a new directory:

    mkdir git_tensorflow
  5. Change directory to the one created in the last step:

    cd git_tensorflow
  6. Download a clone of the TensorFlow repository from GitHub.

    git clone https://github.com/tensorflow/tensorflow.git

     

  7. If the models folder is missing from the tensorflow/tensorflow directory, clone the models repository from the TensorFlow GitHub13:

    cd tensorflow/tensorflow
    git clone https://github.com/tensorflow/models.git

     

  8. Install the latest version of TensorFlow, or errors could occur when training the model:

    pip install intel-tensorflow
  9. Change directory to CIFAR-10 dir to get the training and evaluation Python scripts12:

    cd models/tutorials/image/cifar10
  10. Before running the training code, check the cifar10_train.py code and change the number of steps from 100K to 60K if needed, as well as the logging frequency from 10 to whatever you prefer.

    For this document, tests were done for both 100K steps and 60K steps, for a batch size of 128, and logging frequency of 10.

    parser.add_argument('--max_steps', type=int, default=100000, help='Number of batches to run.')
  11. Run the training Python script to train your network:

    python cifar10_train.py

    This will take a few minutes, and you will see output similar to the image below:

    train output example
Testing script and dataset terminology

In the neural network terminology:

  • One epoch = one forward pass and one backward pass of all the training examples.
  • Batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space required. TensorFlow pushes it all through one forward pass (in parallel) and follows with a back-propagation on the same set. This is one iteration or step.
  • Number of iterations = number of passes, each pass using [batch size] number of examples. To be clear, one pass equals one forward pass plus one backward pass (do not count the forward pass and backward pass as two different passes).
  • Steps parameter tells TensorFlow to run X of these iterations to train the model.
    Example: given 1,000 training examples, and a batch size of 500, it will take two iterations to complete one epoch.

To learn more about the differences between epoch, batch size, and iterations, read the Performance Guide for TensorFlow.

In the cifar10_train.py script:

  • Batch size is set to 128. It represents the number of images to process in a batch.
  • Max step is set to 100,000. It is the number of iterations for all epochs.

    Note: The GitHub code has a typo; instead of 100K, the number shows 1000K. Please update before running.

  • The CIFAR-10 binary dataset (referenced in the Intel® Optimization for TensorFlow* Installation Guide) has 60,000 images: 50,000 images to train and 10,000 images to test. With a batch size of 128, the number of batches needed to train one epoch is 50,000/128 ≈ 391.
  • The cifar10_train.py script uses 256 epochs, so the number of iterations for all epochs is ~391 x 256 ≈ 100K iterations, or steps, as worked out in the short calculation below.
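The relationship between these numbers can be checked with a few lines of Python (a quick illustrative calculation, not part of the CIFAR-10 scripts):

train_images = 50000          # CIFAR-10 training images
batch_size = 128              # as set in cifar10_train.py
epochs = 256                  # as used by cifar10_train.py

steps_per_epoch = train_images / float(batch_size)    # ~390.6 batches per epoch
total_steps = steps_per_epoch * epochs                 # ~100,000 iterations (max_steps)
print("steps per epoch: ~%d, total steps: ~%d" % (round(steps_per_epoch), round(total_steps)))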
Step 7. Evaluate the model

Use the cifar10_eval.py script8 to evaluate how well the trained model performs on a hold-out dataset:

python cifar10_eval.py

Once the model reaches the expected accuracy, you should see precision @ 1 = 0.862 onscreen when running the above command. The evaluation script can be run while the training script is still running toward the end of its steps, or after the training script has finished.

train script run result example

The results below were achieved with the system described in the Hardware and Software Bill of Materials section of this document.

Note that these numbers are only for educational purposes and no specific CPU optimizations were performed.

System | Step Time (sec/batch) | Accuracy
(2x) Intel® Xeon® Gold 6148 processor | ~0.105 | 85.8% at 60K steps (~2 hours)
(2x) Intel® Xeon® Gold 6148 processor | ~0.109 | 86.2% at 100K steps (~3 hours)

When you finish training and testing with the CIFAR-10 dataset, the same models directory contains examples for the MNIST* and AlexNet benchmarks. For additional learning, go into the MNIST and AlexNet directories and try running the Python scripts to see the results.

Multiple node installation:

In their article, Wei Wang and Mahmoud Abuzaina provide details on achieving performance scaling using Intel Xeon Scalable processors and Horovod* with TensorFlow. Their blog is the source of the content in this section.

Many complex deep learning models must be trained on multiple nodes, either because they don’t fit on one machine or because their time-to-train can be significantly reduced on a cluster of machines. Intel has therefore also performed scaling studies on multi-node clusters of Intel Xeon Scalable processors. This section provides steps to deploy TensorFlow on clusters of Intel Xeon processors using Horovod, a distributed training framework for TensorFlow.

Horovod, which was developed by Uber*, uses the Message Passing Interface (MPI) as its main mechanism of communication. It uses MPI concepts such as allgather and allreduce to handle cross-replica communication and weight updates. OpenMPI* can be used with Horovod to support these concepts. Horovod is installed as a separate Python package. By calling Horovod’s API from the deep learning model script, a regular build of TensorFlow can be used to run distributed training. With Horovod, no source code change in TensorFlow is required to support distributed training with MPI.
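For reference, below is a minimal sketch of the Horovod calls a TensorFlow 1.x training script typically adds (a toy loss stands in for a real model, so the sketch is self-contained; the tf_cnn_benchmarks script used later makes equivalent calls internally when run with --variable_update horovod):

import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()                                             # initialize Horovod (MPI)

# Toy regression loss so the sketch is runnable on its own
x = tf.random_normal([32, 10])
w = tf.get_variable("w", [10, 1])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

opt = tf.train.GradientDescentOptimizer(0.01 * hvd.size())   # scale LR with worker count
opt = hvd.DistributedOptimizer(opt)                    # allreduce gradients across ranks
global_step = tf.train.get_or_create_global_step()
train_op = opt.minimize(loss, global_step=global_step)

hooks = [hvd.BroadcastGlobalVariablesHook(0),          # rank 0 broadcasts initial weights
         tf.train.StopAtStepHook(last_step=100)]

with tf.train.MonitoredTrainingSession(hooks=hooks) as sess:
    while not sess.should_stop():
        sess.run(train_op)

Launched with, for example, mpirun -n 2 python <script_name>.py (script name is a placeholder), each MPI rank runs the same script while Horovod averages gradients across ranks.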

Hardware and software bill of materials

Item | Manufacturer | Model/Version
Hardware
Intel® Xeon® Scalable processor | Intel | Intel® Xeon® Gold 6148 processor
(12x) 16GB DDR4 @ 2666MT/s | |
(3x) Intel® SSD 800GB, 1.6TB | Intel | RS3WC080
Software
CentOS* | | CentOS 7.4 (Maipo)
Kernel | | 3.10.0-693.21.1.0.1.el7.knl1.x86_64
TensorFlow* | | 1.7
Step 1: Install the Linux* operating system

In this section CentOS* 7.4 was used. Download an updated version of the software from the CentOS website.

Find steps for OS installation in the Appendix.

This white paper assumes that a multiple node cluster has been set up and there is communication between the head node and compute nodes. Refer to HPC Cluster Reference Design19 if guidance is needed for cluster setup.

Step 2. Configure YUM

If the public network implements a proxy server for internet access, Yellowdog Updater Modified* (YUM) must be configured in order to use it.

Open the /etc/yum.conf file for editing.

Under the main section, append the following line:

proxy=http://<address>:<port>

where <address> is the address of the proxy server and <port> is the HTTP port.

Save the file and exit.

Disable updates and extras. Certain procedures in this document require packages to be built against the kernel. A future kernel update may break the compatibility of these built packages with the new kernel, so disabling repository updates and extras is recommended to provide further longevity to this document.

This document may not be used “as is” when CentOS updates to the next version. To use this document after such an update, it is necessary to redefine repository paths to point to CentOS 7.4 in the CentOS vault. To disable repository updates and extras: yum-config-manager --disable updates --disable extras.

Step 3. Install EPEL

Extra Packages for Enterprise Linux (EPEL) provides 100 percent free, high-quality add-on software packages for Linux distributions. To install EPEL (latest version for all packages required):

Download:

yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Step 4. Install GNU* C compiler

Check whether the GNU Compiler Collection* (GCC*) is installed. It should be part of the Development Tools installed during OS installation (see the Appendix). You can check by typing:

gcc --version or whereis gcc

If not installed, find the latest installation here.

GCC can be installed from the official CentOS* repository by using the following command:

yum -y install gcc
Step 5. Install OpenMPI

OpenMPI can be installed via yum on recent versions of CentOS. Some existing clusters already have OpenMPI available. This section uses OpenMPI 3.0.0, which can be installed following the instructions at this link.

Example installation of rpm file after download:

yum localinstall openmpi-3.0.0-1.src.rpm
Step 6. Python installation

Make sure Python* 2.7 or Python* 3.6 is installed and tested. The necessary packages should have been installed as part of the OS installation. Update all necessary packages as follows:

sudo yum update
sudo yum install yum-utils
sudo yum groupinstall development

Proceed with installing Python. This section provides steps to install Python 3.6.1. The standard yum repositories don’t provide the latest Python release, so an additional repository, Inline with Upstream Stable (IUS), is needed, as it provides the necessary RPM packages.

sudo yum install https://centos7.iuscommunity.org/ius-release.rpm

sudo yum install python36u

Check the version of Python 3 by typing:

python3.6 -V

python -V will return the system Python version.

To manage Python packages, install pip and needed development packages, if not already there.

sudo yum install python36u-pip
sudo yum install python36u-devel
Step 7. Horovod* installation

Uber Horovod supports running TensorFlow in a distributed fashion. Install Horovod as a standalone Python package as follows:

pip install --no-cache-dir horovod (e.g., horovod-0.11.3)

To install Horovod from source, please check the following link:

Step 8. Get the latest benchmarks

The current TensorFlow benchmarks have recently been modified to use Horovod. Obtain the benchmark code from GitHub:
 

git clone https://github.com/tensorflow/benchmarks
cd benchmarks/scripts/tf_cnn_benchmarks

Run tf_cnn_benchmarks.py as explained below.

Step 9: Running TensorFlow* benchmark using Horovod*

This section discusses the run commands needed to run distributed TensorFlow using the Horovod framework.

Running 2 MPI processes on a single node:

export LD_LIBRARY_PATH=<path to OpenMP lib>:$LD_LIBRARY_PATH
export PATH=<path to OpenMPI bin>:$PATH
export inter_op=2
export intra_op=18 {# cores per socket}
export batch_size=64
export MODEL=resnet50 {or inception3}
export python_script= {path for tf_cnn_benchmark.py script}

mpirun -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -cpus-per-proc 20 --map-by socket --oversubscribe
--report-bindings -n 2 python $python_script --mkl=True --forward_only=False --num_batches=200
--kmp_blocktime=0 --num_warmup_batches=50 --num_inter_threads=$inter_op --distortions=False
--optimizer=sgd --batch_size=$batch_size --num_intra_threads=$intra_op --data_format=NCHW
--model=$MODEL --variable_update horovod --horovod_device cpu --data_dir <path-to-real-dataset>
--data_name <dataset_name>

For 1 MPI process per node, the configuration will be as follows; other environment variables will remain the same.

export intra_op=38
export batch_size=128

mpirun -x LD_LIBRARY_PATH -x OMP_NUM_THREADS --bind-to none --report-bindings
-n 1 python $python_script --mkl=True --forward_only=False --num_batches=200
--kmp_blocktime=0 --num_warmup_batches=50 --num_inter_threads=$inter_op
--distortions=False --optimizer=sgd --batch_size=$batch_size
--num_intra_threads=$intra_op --data_format=NCHW --model=$MODEL
--variable_update horovod --horovod_device cpu --data_dir <path-to-real-dataset>
--data_name <dataset_name>

Note: To train models to achieve good accuracy, use --distortions=True. You may also need to change other hyperparameters.

For running models on a multi-node cluster, use a run script similar to the one above. For example, to run on 64 nodes (2 MPI processes per node), where each node has Intel Xeon Gold 6148 processors, the distributed training can be launched as shown below. All the exports will be the same as above.

mpirun -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -cpus-per-proc 20 --map-by node
--report-bindings -hostfile host_names -n 128 python $python_script --mkl=True
--forward_only=False --num_batches=200 --kmp_blocktime=0 --num_warmup_batches=50
--num_inter_threads=$inter_op --distortions=False --optimizer=sgd --batch_size=$batch_size
 --num_intra_threads=$intra_op --data_format=NCHW --model=$MODEL --variable_update horovod
--horovod_device cpu --data_dir <path-to-real-dataset> --data_name <dataset_name>

Here, the host_names file is the list of hosts on which you wish to run the workload.
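For illustration, a hypothetical host_names file for a two-node run simply lists one hostname per line (OpenMPI's plain hostfile format); the node names below are placeholders:

node001
node002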

What distributed TensorFlow* means for DL training on Intel® Xeon® processors

Various efforts have been made to implement distributed TensorFlow on CPUs and GPUs, for example gRPC, VERBS, and TensorFlow’s built-in MPI support; all of these technologies are incorporated within the TensorFlow codebase. Uber’s Horovod is one distributed TensorFlow technology that was able to harness the power of Intel Xeon processors. It uses MPI underneath, and it uses ring-based allreduce and allgather for the deep learning parameters. As shown in the blog by Wang and Abuzaina, Horovod on Intel Xeon processors shows strong scaling for existing DL benchmark models such as ResNet-50 (up to 94%) and Inception v3 (up to 89%) on 64 nodes. In other words, time-to-train for a DL network can be accelerated by as much as 57x (ResNet-50) and 58x (Inception v3) using 64 Intel Xeon processor nodes compared to a single such node. Currently, Intel recommends that TensorFlow users use Intel-optimized TensorFlow and Horovod with MPI for multi-node training on Intel Xeon Scalable processors.

Option 2: Using AI Containers

Single node installation:

Hardware and software bill of materials

Item | Manufacturer | Model/Version
Hardware
(2x) Intel® Xeon® Scalable processor | Intel | Intel® Xeon® Platinum 8164 processor
(12x) 32GB DDR4 @ 2666MT/s | |
(3x) Intel® SSD 800GB, 1.6TB | Intel | RS3WC080
Software
CentOS | | CentOS 7.5
Kernel | | 3.10.0-862.el7.x86_64
TensorFlow* | | 1.9
Step 1: Install the Linux* operating system

In this section, CentOS* 7.4 was used. Download an updated version of the software from the CentOS website.

Find steps for OS installation in the Appendix.

Step 2. Configure YUM

If the public network implements a proxy server for internet access, Yellowdog Updater Modified* (YUM) must be configured in order to use it.

Open the /etc/yum.conf file for editing.

Under the main section, append the following line:

proxy=http://<address>:<port>

where <address> is the address of the proxy server and <port> is the HTTP port.

Save the file and exit.

Disable updates and extras. Certain procedures in this document require packages to be built against the kernel. A future kernel update may break the compatibility of these built packages with the new kernel, so disabling repository updates and extras is recommended to provide further longevity to this document.

This document may not be used “as is” when CentOS updates to the next version. To use this document after such an update, it is necessary to redefine repository paths to point to CentOS 7.4 in the CentOS vault. To disable repository updates and extras: yum-config-manager --disable updates --disable extras.

Step 3. Install EPEL

Extra Packages for Enterprise Linux (EPEL) provides 100 percent free, high-quality add-on software packages for Linux distributions. To install EPEL (latest version for all packages required):

yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm (download from here)

Step 4. Install GNU* C compiler

Check whether the GNU Compiler Collection* (GCC*) is installed. It should be part of the Development Tools installed during OS installation (see the Appendix). You can check by typing:

gcc --version or whereis gcc

If not installed, find the latest installation here.

Install GCC from the official CentOS* repository by using the following command:

yum -y install gcc
Step 5. Download and install Anaconda*

Follow the instructions on the Anaconda* download site to download and install Anaconda.

Download the source file for Anaconda for Python* 2.7.

(Python* 2.7 is recommended, as TensorFlow is currently only supported for Python* 2.7 or Python* 3.5. This section uses Python* 2.7.)

Install Anaconda by using the following command:

bash Anaconda-latest-Linux-x86_64.sh

Follow the prompts on the screen to complete the installation.

Note: You will need to open a new terminal for the Anaconda installation to become active.

Step 6. Install the latest Intel® Optimization for TensorFlow* from Anaconda

Open the Anaconda prompt and use the following instruction:

conda install tensorflow

Follow the prompts onscreen to complete downloading and extracting the packages.

Expect to see a screen similar to the following:

conda install tensorflow result example

If your anaconda channel is not the highest-priority channel by default (or if you are unsure), use the following command to get the correct Intel® Optimization for TensorFlow*:

conda install -c anaconda tensorflow

Expect to see a screen similar to the following:

conda install –c anaconda tensorflow result example

Besides the install method described above, Intel Optimization for TensorFlow is distributed as wheels, Docker images, and conda packages on the Intel channel web page. This section covers installing Intel Optimization for TensorFlow using Docker images.

Step 7. Install Docker*

Install Docker* on your system; or, skip to step 8 if Docker is already installed.

Install Docker on CentOS

yum install docker

Once complete, expect to see a screen similar to the following:

yum install docker result example

Install the EPEL repository, which must be enabled on your system:

yum install epel-release
yum install docker-io

After the Docker package has been installed, start the daemon. Enable it system-wide and check its status by using the following commands:

systemctl start docker
systemctl status docker
systemctl enable docker
systemctl enable docker result example

Finally, run a container test image to verify that Docker works properly, using the following command:

docker run hello-world

If Docker is working properly, expect to see something like the following:

docker run hello-world result example

Note: If you encounter Docker connection timeouts and you are behind a proxy server (for example, in a corporate setting), you may need to add certain configurations to the Docker system service file.

Step 8. Install the latest Intel® Optimization for TensorFlow* Docker images into an existing Python* installation

These Docker images are all published on Docker Hub in the intelaipg/intel-optimized-tensorflow namespace and can be pulled with the following command:

docker pull docker.io/intelaipg/intel-optimized-tensorflow:<tag>

Example:

docker pull docker.io/intelaipg/intel-optimized-tensorflow:latest-devel-mkl

Available container configurations and tags can be found here.

Once the Docker pull is complete, expect to see a screen similar to the following:

docker pull complete example

To see the list of all available Docker images on your system, type the following command:

docker images
docker images result example

Now you can run an example Python* 2.7 data science container and have it open in a Jupyter* Notebook by typing:

docker run -it -p 8888:8888 intelaipg/intel-optimized-tensorflow

Go to http://localhost:8888/ in your browser.

docker run 8888 result example

The ‘latest’ and all other tags that don’t have ‘devel’ in them don’t open an interactive terminal by default.

You can force open an interactive terminal by adding ‘/bin/bash’ to the end of the docker run command. For example:

docker run -ti intelaipg/intel-optimized-tensorflow:latest /bin/bash
force open terminal result example
Step 9. Get the latest benchmarks

Obtain the current TensorFlow benchmarks code from GitHub:

git clone https://github.com/tensorflow/benchmarks
cd benchmarks/scripts/tf_cnn_benchmarks
Run tf_cnn_benchmarks.py as explained below.

Note: The container is a light image; you will have to install basic Linux packages like yum, wget, vi, etc. onto the Docker container image. The authors ran the following steps before cloning the benchmarks. The container image is based on Ubuntu.

apt-get update
apt-get install vim -y
apt-get install yum
apt-get install git

After making the necessary updates/changes to your container, exit and save the changes on the local version of the image by using:

docker commit <container_ID> <name_you_like>

Container_ID here is the ID provided when you initially ran the container.

Example:

docker commit container id result example
Step 10: Running TensorFlow* benchmark

This section covers run commands needed to run TensorFlow CNN benchmarks.

cd benchmarks/scripts/tf_cnn_benchmarks
python tf_cnn_benchmarks.py --forward_only=True --device=cpu --mkl=True --kmp_blocktime=0 --nodistortions --batch_size=32 --model=inception3 --data_format=NCHW  --num_intra_threads=4  --num_inter_threads=1

Use the commands given in the TensorFlow* Performance Guide to learn how to get the best CPU-optimized numbers. The example command above is for the Inception v3 model, but other models within the tf_cnn_benchmarks directory can also be used.

Multiple node installation:

At the time this document was developed, no Docker* containers optimized for Intel® technology were available for multiple nodes, so no tested instructions are included here. However, published documents such as Horovod distributed training on Kubernetes using MLT are recommended reading for distributed training on Kubernetes. Additionally, to deploy TensorFlow via containers on multiple nodes, review the whitepaper Best known methods for scaling deep learning with TensorFlow* on Intel® Xeon® processor-based clusters as well as Nauta. That whitepaper provides steps to create Singularity containers in a multi-node environment, and then deploy and run those containers.

Option 3: AI in Cloud

Single node installation:

Various cloud service providers (CSPs) can be used to deploy AI workloads via the cloud. This document cites a few of the major CSPs as examples for deploying AI in the cloud.

This example is for Amazon Web Services (AWS)* but you can use a CSP of your choice.

Source: AWS Deep Learning Tutorial

Step 1: Sign in to the AWS Management Console

Sign in to your AWS Management Console with your username and password. Then select the EC2 service.

Step 2: Configure your instance
  1. Choose EC2 instance and click Launch Instance
  2. Select an AWS deep learning AMI
    As mentioned in the TensorFlow* Performance Guide, an Amazon Machine Image (AMI) is available for both Ubuntu and Amazon Linux. Choose the ideal fit for your application. This guide selected the Deep Learning AMI (Ubuntu).

    selected Deep Learning AMI screenshot
  3. Choose the instance type for your deep learning and deployment needs. Select a Compute Optimized Instance (C5) to get CPU-optimized hardware and software. Then click Review and Launch.

    Example:

    selected Compute Optimized Instance screenshot
  4. Choose Launch on the review page

    choose Launch screenshot
  5. Create a private key file by selecting Create New Key Pair, and download it to a safe location. Then launch the instance. You will see a screen like the following:

    create a private key file screenshot

    Note: You may get a message saying “your account is currently being verified.” Verification usually takes less than two hours; you can retry launching the instance after 30 minutes.

  6. Click View Instance to see your instance.

    Example:

    click view instancee screenshot
  7. After clicking View Instance, find and copy your instance’s public DNS.

    Example screen shot:

    find and copy public DNS screenshot
Step 3: Connect to your instance

To start using the command line terminal to communicate with the instance on AWS using Windows*, use a command prompt or download Git for Windows*.

  1. Following the steps described in [24], open the command terminal
  2. Change to the directory where your security key is located
  3. Change the permissions on the key pair file
  4. Connect to your instance using SSH

Example:

cd /user/xzy/Downloads/
chmod 0400 <key_file_name.pem>
ssh -L localhost:8888:localhost:8888 -i <key_file_name.pem> ubuntu@<your_instance_DNS_that_you_copied>

Note: If you are on a corporate network, it may be necessary to use a proxy to connect to your instance.

Using Putty:

  1. Download and install PuTTY from the PuTTY download page
  2. PuTTY doesn’t natively support the .pem file format generated by AWS EC2, so convert the .pem key file to the required PuTTY format (.ppk).
  3. Convert your private key:
    1. Launch Puttygen
    2. Under type of key to generate, choose RSA. When using older versions of PuTTYgen, choose SSH-2 RSA

      choose SSH-2 RSA screenshot
    3. Choose Load. By default, PuTTY lists only .ppk files. Select All Files from the dropdown menu to view .pem files.

      select All Files screenshot
    4. Select the .pem file for the AWS key pair you specified when you launched your instance, then choose Open. Choose OK to dismiss the dialog box.
    5. Select Save Private Key to save the key in a format PuTTY can use. PuTTYgen displays a warning about saving the key without a passphrase. Choose Yes.
    6. Specify the same name for the key you used for the key pair. PuTTY automatically adds the .ppk file extension.
  4. Start your PuTTY session:
    1. Launch PuTTY
    2. Click on Session and in hostname add the IPv4 Public IP address of your EC2 instance.

      Example:

      add the IPv4 Public IP address screenshot
    3. Now click on SSH under connection, expand it, and click on Auth. In Auth, add the link to your .ppk file created from the .pem file. Example:

      add link to ppk file screenshot
    4. If you are on a corporate network, click on Proxy, select SOCKS5, enter the name of your proxy server, and add port 1080.

      Example:

      set proxy screenshot
    5. Save the session under a name, if you like, under the Session tab
    6. Once saved, click Open
    7. For the login name, type ubuntu
    8. The session should start successfully like example snapshot below:

      session start example
Step 4: Run your deep learning framework

Run deep learning workload using Jupyter* or directly on the command terminal.

For Jupyter: type jupyter notebook

Copy the URL indicated to access your notebook and start using a deep learning framework.

For terminal:

This whitepaper uses containers to install the Intel Optimization for TensorFlow framework and run benchmark examples. The running AWS instance should already have Docker installed.

  1. Type the following command to check if Docker is installed.

    apt-cache policy docker-ce

    You should see Docker installed. If it says Installed: (none), install Docker on your instance. To install Docker on Ubuntu, follow the steps here or in a similar document.

    docker installed example
  2. To see Docker running, type:

    sudo systemctl status docker
    docker status example

    Run Docker hello-world to check if Docker is installed correctly.

  3. Pull the Intel Optimization for TensorFlow Docker container using the command below. See the steps under Option 2 for details.

    docker pull docker.io/intelaipg/intel-optimized-tensorflow:latest-devel-mkl

    To see a list of all available Docker images on your system, type the following command:

    docker images
    docker images example
  4. Run the Intel Optimization for TensorFlow Docker image.

    The ‘latest’ tag and all tags that don’t have ‘devel’ in them don’t open an interactive terminal by default.

For a Jupyter notebook, run the following command:

docker run -it -p 8888:8888 intelaipg/intel-optimized-tensorflow

Go to http://localhost:8888/ in your browser.

For the command terminal, run the following command:

docker run -ti intelaipg/intel-optimized-tensorflow:latest /bin/bash

The latest tag will automatically load the latest version of the optimized container.

docker run example
Step 5: Running TensorFlow benchmark

The TensorFlow container you pulled is a light image; install basic Linux packages such as yum, wget, vi, etc. onto the Docker container image before running benchmarks. The author ran the following steps before cloning the benchmarks. The container image is based on Ubuntu.

apt-get update
apt-get install vim –y
apt-get install yum
apt-get install git

Once the necessary updates/changes to the container are made, exit and save the changes on the local version of the image by using:

root@<container_ID>:/notebooks# exit
docker commit <container_ID> <name_you_like>

container_ID here is the ID given to you when you initially ran the container.

Example:

save container id example

Now run the new Docker image:

docker run -ti <new_name_you_gave_to_container_saved_above>:latest /bin/bash

Example:

run latest docker example

Obtain the current TensorFlow benchmarks code from GitHub:

git clone https://github.com/tensorflow/benchmarks
cd benchmarks/scripts/tf_cnn_benchmarks

This section discusses commands needed to run TensorFlow CNN benchmarks:

python tf_cnn_benchmarks.py --forward_only=True --device=cpu --mkl=True --kmp_blocktime=0 --nodistortions --batch_size=32 --model=inception3 --data_format=NCHW  --num_intra_threads=4  --num_inter_threads=1

Refer to the TensorFlow* Performance Guide for details on achieving the best CPU-optimized numbers. The example command above is for the Inception v3 model, but other models within the tf_cnn_benchmarks directory can also be used.

Here’s an example screenshot after running the command above:

run TensorFlow CNN benchmarks result example

Note: If the error message “No module experimental.ops” appears when running the Python command, it probably means the TensorFlow version in the container is not compatible with the version the benchmarks were written for.

Example: If the benchmarks are compatible with TensorFlow v1.12 and your container is for TensorFlow v1.10, download the compatible benchmarks using the following command:

git clone -b cnn_tf_v1.10_compatible https://github.com/tensorflow/benchmarks
Step 6: Terminate your instance

When you are finished, terminate the instance from the AWS Management Console to avoid incurring additional charges.

Multiple node installation:

This section describes steps for running distributed TensorFlow using Google Cloud Platform* with Google Machine Learning (ML) Engine. It details how to get an instance running on Google ML Engine, deploy TensorFlow, and run an example training job in an all-CPU, multi-node setting, which is assigned by selecting the standard_1 tier when running the distributed training. The majority of the steps described in this section are from the Google ML Engine getting started guide.

Using Google Cloud Platform* (GCP)

Source: TensorFlow Getting Started

Step 1: Sign in to your Google account.
Step 2: Select or create a GCP project via “go to the manage resources page.”
Step 3: Enable billing for your project by following these instructions.
Step 4: Enable the Cloud Machine Learning Engine and Compute Engine APIs by clicking this link:

Expect to see a screen like below. Select your project and click “Continue.” Enabling APIs takes a few minutes.

register your application screenshot

Once enabling APIs is complete, expect to see a screen like the following:

APIs are enabled screenshot
Step 5: Set up authentication
  1. In the GCP console, go to the Create service account key page.
  2. From the service account dropdown list, select New Service Account.
  3. In the Service Account Name field, enter a name.
  4. From the Role drop-down list, select Project > Owner.
  5. Click Create. A JSON file that contains your key downloads to your computer.
Step 6: Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the file path of the JSON file that contains your service account key. This variable only applies to your current shell session; if you open a new session, set the variable again.

For Windows, using command prompt will be something like the following:

set GOOGLE_APPLICATION_CREDENTIALS="<PATH>"
set Google app credentials example
Step 7: Install and initialize the cloud SDK per instructions here.
  1. Download the Cloud SDK Installer, making sure to sign in with correct Google credentials.

    Note: If you get a Download Failed: connecting to host message and you are possibly behind a corporate firewall, follow the steps here to install from an archived version if you are unable to get around the firewall.

  2. Launch the installer and follow the instructions on prompt.
  3. After installation is completed, accept the following options:
    • Start cloud SDK shell. (For Windows* users, if it hasn’t started by default, find and click on the installed SDK.) It should open a command prompt as shown in the following:

      Google cloud SDK shell command prompt screenshot
    • Run gcloud init (For Windows, type gcloud init in the terminal window as follows)

      gcload init run result example

      Once authentication is complete, expect to see a web browser page open and display a screen like the following:

      authentication complete screenshot

      Note: If you have a network proxy, set it up as part of the gcloud setup using the correct HTTPS_PROXY and HTTP_PROXY addresses.

      Example:

      set proxy example
  4. After setting a cloud project, the system will ask you to set a default region and zone. Selecting this region is important because later, during distributed training on the cloud, you will have to choose a region for your cloud storage bucket, which should match the region you select for running your ML Cloud Engine.

    set time zone example
  5. Once complete, expect to see a message that your Google Cloud SDK is configured and ready to use.

    Example:

    time zone complete example
Step 8: Setting up the environment

Instructions in this section are taken from the Google Cloud AI and Machine Learning Products page. Google provides both macOS and Cloud Shell (for macOS*, Linux, and Windows) instructions to set up your environment locally. However, this section covers only the Cloud Shell steps, as performed on a Windows machine.

  1. Open the Google Cloud Platform Console

    A web browser page like the following screenshot should open. You can select your respective project from the dropdown menu on the top left.

    google cloud platform console screenshot
  2. Click the Activate Google Cloud Shell button at the top of the console window.

    Activate Google Cloud Shell button screenshot

    A window similar to the following screenshot opens. Click on Start Cloud Shell to begin.

    Google Cloud Shell screenshot

    Wait for the cloud shell machine to start.

    Google cloud shell starting screenshot

    A cloud shell session should open inside a new frame at the bottom of the Google Cloud Platform page. It will look like the following:

    cloud shell session in a new frame screenshot
  3. If you did not already select your project ID on the Google Cloud Platform page, you can set it now, or change to a different project ID, using the command:

    gcloud config set project <project-ID>
Step 9: Verify the Google Cloud SDK components
  1. List your models:

    gcloud ml-engine models list

    If you have not created any models before, the command returns an empty list.

    return empty list example
  2. If you have installed gcloud previously, update gcloud:

    gcloud components update

Install TensorFlow*

To install TensorFlow, run the following command:

pip install --user --upgrade tensorflow
install TensorFlow example

TensorFlow will probably already be installed by default, in which case you will see a message that the requirement is already up to date, along with the installed TensorFlow version.

Run a simple TensorFlow Python program (Optional)

Run the following basic Python program to test your installation of TensorFlow:

import tensorflow as tf
hello = tf.constant('Hello, Tensorflow!')
sess = tf.Session()
print(sess.run(hello))

If successful, the system outputs:

Hello, Tensorflow!
>>> exit()

Your test should resemble the following screenshot:

test output example

As stated on the Google Cloud AI and ML Products page, from which these instructions are taken, the Cloud ML Engine runs Python 2.7 by default, and the sample code for this section uses Python* 2.7. You can check the Google Cloud AI and ML Products page to learn how to use Python 3.5 for submitting jobs.

Step 10: Download the code for the example

This document shows steps for Cloud Shell running on a Windows OS. The Google Cloud AI and ML Products Getting Started page provides steps for macOS as well as Cloud Shell.

  1. Download the sample zip file from the GitHub Repository:

    Unzip the sample zip file to extract the cloudml-samples-master directory:

    unzip master.zip

    Example screenshot:

    unzip master.zip example
  2. Navigate to the cloudml-samples-master → census → estimator directory. The commands in this section of the guide must be run from the estimator directory.

    cd cloudml-samples-master/census/estimator
Step 11: Get training data

Google hosts a public Cloud Storage bucket containing the relevant data files for this section, adult.data and adult.test.

  1. Create a data directory and download the data to the estimator directory:

    mkdir data
    gsutil -m cp gs://cloud-samples-data/ml-engine/census/data/* data/
  2. Set the TRAIN_DATA and EVAL_DATA variables to the file paths.

    Example:

    TRAIN_DATA=<local path>/data/adult.data.csv
    EVAL_DATA=<local path>/data/adult.test.csv
Step 12: Install dependencies

TensorFlow is installed on Cloud Shell, but the sample in this section is based on TensorFlow 1.10. Hence, you must run the sample’s requirements.txt file to ensure you are using the same versions of TensorFlow and the other dependencies required by the sample in this section.

You should be in the cloudml-samples-master/census/estimator directory.

You will find the requirements.txt file one directory above.

find requirements.txt example

Now run:

pip install --user -r ../requirements.txt
Step 13: Set up your cloud storage bucket
  1. Specify a name for your new bucket. The name must be unique across all buckets in Cloud Storage.

    BUCKET_NAME="<your bucket name>"

    Note: To be sure your bucket name is unique, it’s recommended to use your project name with -mlengine appended, as shown in the commands below.

    PROJECT_ID=$(gcloud config list project --format "value(core.project)")
    BUCKET_NAME=${PROJECT_ID}-mlengine
  2. Check the name for the bucket you created:

    echo $BUCKET_NAME

    Example screenshot:

    check bucket name example
  3. Select a region for your bucket and set the environment variable:

    REGION=<name of region>

    Example:

    REGION=us-west1

    Note 1: Specify a single, specific region for your bucket; it cannot have a multi-region location. Find an available region on the Google Cloud AI and ML Products page for Cloud ML Engine.

    Note 2: Use the same region where you plan on running Cloud ML Engine jobs – the region you chose in step 7 of “using Google Cloud Platform” sub-section.

    Note 3: If you restart your session, you may lose your environment settings for BUCKET_NAME and REGION. It is recommended to check the variable settings before going to the next step, especially after a restart of the Cloud Shell.

  4. Create the new bucket:

    gsutil mb -l $REGION gs://$BUCKET_NAME

    Example screenshot:

    create new bucket example
Step 14: Upload the data files to your cloud storage bucket
  1. Use gsutil to copy the two files to your newly created cloud storage bucket:

    gsutil cp -r data gs://$BUCKET_NAME/data
  2. Point the TRAIN_DATA and EVAL_DATA variables to the file location in your cloud storage bucket:

    TRAIN_DATA=gs://$BUCKET_NAME/data/adult.data.csv
    EVAL_DATA=gs://$BUCKET_NAME/data/adult.test.csv
  3. Copy the JSON test file to your cloud storage bucket using gsutil:

    gsutil cp ../test.json gs://$BUCKET_NAME/data/test.json
  4. Set the TEST_JSON to point to the file:

    TEST_JSON=gs://$BUCKET_NAME/data/test.json

    Here’s an example screen capture of uploading data files to your cloud storage bucket:

    uploading to cloud bucket example
Step 15: Run distributed training in the cloud

To run your training job in distributed mode on Google Cloud, the commands are very similar to those used to run training on a single instance on Google Cloud. This document doesn’t cover single-instance training on Google Cloud, but it can be found on the Google Cloud AI and ML Products page. The major difference between training on a single instance and on multiple nodes is setting --scale-tier to the correct tier, compared to BASIC for a single instance. Find information on available scale tiers for Google Cloud ML Engine here.

  1. Set a name for your distributed training job:

    JOB_NAME=<your distributed training name>

    Example:

    JOB_NAME=census_dist_rev1
  2. Create an OUTPUT_PATH. We recommend adding JOB_NAME to avoid reusing checkpoints between jobs.

    Note: You might have to redefine BUCKET_NAME if you’ve started a new shell session. Run echo $BUCKET_NAME to make sure the variables are set correctly.

    OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_NAME

    Example:

    set output path example
  3. Run the following command to submit a distributed training job in the Google Cloud that uses multiple workers. This example uses standard_1 scale tier to use an all-CPU-based configuration. The job can take a few minutes to start.

    Place the --scale-tier above the -- that separates the user arguments from the command line arguments.

    Example31

    gcloud ml-engine jobs submit training $JOB_NAME \
    		    --job-dir $OUTPUT_PATH \
    		    --runtime-version 1.10 \
    		    --module-name trainer.task \
    		    --package-path trainer/ \
    		    --region $REGION \
    		    --scale-tier STANDARD_1 \
    		    -- \
    		    --train-files $TRAIN_DATA \
    		    --eval-files $EVAL_DATA \
    		    --train-steps 1000 \
    		    --verbosity DEBUG  \
    		    --eval-steps 100

    Once the job is submitted correctly, expect to see a message similar to the following:

    job successful example

    Monitor job progress by watching the command-line output or in ML Engine > Jobs on the Google Cloud Platform console:

    GCP ML Engine Jobs screenshot

    Once complete, expect to see a screen like the following:

    job complete GCP screenshot

    View job status on cloud shell with the command:

    gcloud ml-engine jobs describe census_dist_rev1

    This generates an output similar to the following screen shot:

    job status output example

    Once the training has completed, the state variable will show "SUCCEEDED".

Step 16: Inspect the logs

There are two ways to inspect the logs generated from the distributed training:

Either go to GCP Console ML Engine > Jobs and click View Logs, or use the following command on your cloud shell terminal:

gcloud ml-engine jobs stream-logs $JOB_NAME

Hyperparameter Tuning (Optional)

Hyperparameter tuning helps maximize the model’s predictive accuracy. The census example used in this section stores the hyperparameter configuration settings in a YAML file named hptuning_config.yaml. Use this as a template for your specific model and training.

  1. Select a new job name and create variables that reference the configuration:

    HPTUNING_CONFIG=../hptuning_config.yaml
    JOB_NAME=cesus_dist_hptune_rev1
    echo $BUCKET_NAME
    TRAIN_DATA=gs://$BUCKET_NAME/data/adult.data.csv
    EVAL_DATA=gs://$BUCKET_NAME/data/adult.test.csv

    Example screenshot:

    new job config var example
  2. Set the OUTPUT_PATH as done above. Make sure BUCKET_NAME is defined and that you are not inadvertently reusing checkpoints between jobs.

    OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_NAME
  3. Run the following command to submit a training job that uses hyperparameter tuning on multiple nodes:

    gcloud ml-engine jobs submit training $JOB_NAME \
    	    --stream-logs \
    	    --job-dir $OUTPUT_PATH \
    	    --runtime-version 1.10 \
    	    --config $HPTUNING_CONFIG \
    	    --module-name trainer.task \
    	    --package-path trainer/ \
    	    --region $REGION \
    	    --scale-tier STANDARD_1 \
    	    -- \
    	    --train-files $TRAIN_DATA \
    	    --eval-files $EVAL_DATA \
    	    --train-steps 1000 \
    	    --verbosity DEBUG  \
    	    --eval-steps 100
  4. View the results in GCP console ML Engine > Jobs, similar to what was done before.
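The tuning results can also be inspected from the Cloud Shell. For a hyperparameter tuning job, the describe output typically includes a trainingOutput section listing each trial's parameter values and final objective metric:

gcloud ml-engine jobs describe $JOB_NAME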
Step 17: Deploy a model to support prediction (INFERENCE)
  1. Similar to selecting a job name, select a model name at this step. The name must start with a letter and can contain only letters, numbers, and underscores.

    Example:

    MODEL_NAME=census_rev1
  2. Create a cloud ML engine model:

    gcloud ml-engine models create $MODEL_NAME --regions=$REGION
  3. Create the output path to use. This example uses census_dist_rev1 as the job name, the same name created in the distributed (non-hyperparameter-tuning) sub-section above:

    OUTPUT_PATH=gs://$BUCKET_NAME/census_dist_rev1
  4. Find the full path of the exported trained model libraries:

    gsutil ls -r $OUTPUT_PATH/export

    Example screenshot:

    find lib full path example1find lib full path example2
  5. The listing from the step above shows all the directories under $OUTPUT_PATH/export. Locate the directory named $OUTPUT_PATH/export/census/<timestamp>, copy this directory path (dropping the trailing colon that gsutil appends in the listing), and assign it to the MODEL_BINARIES variable. See the red highlight in the screenshot above as an example, or the optional scripted alternative sketched after this list.

    MODEL_BINARIES=gs://$BUCKET_NAME/census_dist_rev1/export/census/1544207287/
  6. Run the following command to create a version rev1:

    gcloud ml-engine versions create rev1 --model $MODEL_NAME --origin $MODEL_BINARIES --runtime-version 1.10

    This takes a few minutes. Once complete, expect to see a screen message as shown in the example below.

    create version rev complete example
  7. Obtain a list of models using the list command:

    gcloud ml-engine models list

    Output result will look like the example below:

    obtain model list output example
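As an alternative to copying the timestamped export path by hand in step 5, the latest export directory can usually be captured in a variable. This is a sketch, assuming gsutil lists the timestamped subdirectories in ascending order so the last entry is the most recent export:

# Pick the most recently exported SavedModel directory under the census export path
MODEL_BINARIES=$(gsutil ls $OUTPUT_PATH/export/census/ | tail -1)
echo $MODEL_BINARIES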
Step 18: Send an online prediction request to a deployed model

Now that you’ve deployed your model, you can send it prediction requests. The following command sends an online prediction request using the test.json file that was downloaded as part of the steps above.

gcloud ml-engine predict --model $MODEL_NAME --version rev1 --json-instances ../test.json

This command will result in an output as follows:

send prediction requests output example

The result indicates whether the predicted income is greater than or less than $50k.

Find details for submitting a batch prediction job on the Google Cloud AI and ML Products page. Batch prediction is useful for handling large amounts of data when there are no strict latency requirements on receiving the prediction results. A batch prediction run uses the same data format as an online prediction, but it reads its input from, and writes its results to, Cloud Storage.

Batch prediction is slower than online prediction for a small number of instances; it is better suited to large datasets.
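A minimal sketch of a batch prediction submission, assuming the model and version deployed in step 17 and an instance file already uploaded to Cloud Storage (the job name and output path shown here are illustrative only):

gcloud ml-engine jobs submit prediction census_batch_pred_rev1 \
    --model $MODEL_NAME \
    --version rev1 \
    --data-format TEXT \
    --region $REGION \
    --input-paths gs://$BUCKET_NAME/data/test.json \
    --output-path gs://$BUCKET_NAME/predictions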

Step 19: Cleanup

After analyzing the output from the training and inference runs, use the following command to clean up your Cloud Storage bucket and avoid incurring additional GCP charges.

gsutil rm -r gs://$BUCKET_NAME/$JOB_NAME
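If you also want to remove the deployed model, note that a model's versions must be deleted before the model itself; a minimal sketch using the names from the steps above:

# Delete the deployed version first, then the model that contained it
gcloud ml-engine versions delete rev1 --model $MODEL_NAME
gcloud ml-engine models delete $MODEL_NAME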

To learn how to deploy more workloads, see the additional samples and tutorials listed on Google’s AI and Machine Learning Products page.

Conclusion

The goal of this document was to give beginner-level AI practitioners a glimpse of the business considerations for AI, along with detailed steps to get started deploying TensorFlow on the Intel Xeon platform and running sample training and inference jobs.

Intel and its many ecosystem partners provide developer resources to help you get started on your AI journey. Visit Intel® AI to learn more about Intel’s rich AI offerings, and Intel® AI Builders for an extensive list of AI builder partners, AI blogs, solutions, reference designs, and testimonials.

Appendix

Install the Linux* Operating System

This section requires CentOS-7-x86_64-*1611.iso. This software component can be downloaded from the CentOS website.

The DVD ISO was used to implement and verify the steps in this document; you can also use the Everything ISO or the Minimal ISO.

Steps to install Linux
  1. Insert the CentOS 7.3 1611 install disc/USB. Boot from the drive and select Install CentOS 7.
  2. Select Date and Time.
  3. If necessary, select Installation Destination.
    1. Select the automatic partitioning option.
    2. Click Done to return home. Accept all defaults for the partitioning wizard, if prompted.
  4. Select Network and hostname.
    1. Enter "<hostname>" as the hostname.
      1. Click Apply for the hostname to take effect.
    2. Select Ethernet enp3s0f3 and click Configure to set up the external interface.
      1. From the General section, check Automatically connect to this network when it’s available.
      2. Configure the external interface as necessary. Save and Exit.
    3. Select the toggle to ON for the interface.
  5. Select Software Selection. In the box labeled Base Environment on the left side, select Infrastructure server.
    1. Click Done to return home.
    2. Wait until the Begin Installation button is available, which may take several minutes. Then click it to continue.
  6. While waiting for the installation to finish, set the root password.
  7. Click Reboot when the installation is complete.
  8. Boot from the primary device and log in as root.

References

  1. The Many Ways to Define Artificial Intelligence
  2. Installing TensorFlow on Ubuntu* (Accessed 6/25/18)
  3. Intel® Optimization for TensorFlow* Installation Guide
  4. AI Data LifeCycle
  5. TensorFlow Download and Setup
  6. Horovod Distributed Training on Kubernetes using MLT
  7. Installing TensorFlow from Sources
  8. A Portable Foreign Function Interface Library (Libffi)
  9. Virtualenv
  10. Install TensorFlow on CentOS7 (Accessed 6/25/18)
  11. The CIFAR-10 dataset
  12. CIFAR-10 Details
  13. TensorFlow Models
  14. Epoch vs Batch Size vs Iterations
  15. Learning Multiple Layers of Features from Tiny Images (PDF), Alex Krizhevsky, 2009
  16. What is batch size in neural network?
  17. Performance Guide for TensorFlow
  18. Using Intel® Xeon® for Multi-node Scaling of TensorFlow* with Horovod*
  19. Cluster Design Reference Architecture
  20. PuTTY download page
  21. AI Data LifeCycle
  22. Conda Download Guide
  23. Install Docker and Learn Basic Container Manipulation in CentOS and RHEL 7/6 – Part 1
  24. Custom Docker daemon options
  25. TensorFlow Performance Guide - Optimizing for CPU
  26. Launch an AWS Deep Learning AMI
  27. Git Download Guide
  28. How to Install and Use Docker on Ubuntu 16.04
  29. Best Known Methods for Scaling Deep Learning with Tensorflow* On Intel® Xeon® Processor Based Clusters
  30. What Is the AWS Deep Learning AMI?
  31. Getting Started Training Prediction
  32. Specifying Machine Types or Scale Tiers
  33. Thomas W. Malone, MIT
  34. TensorFlow Samples
  35. Maximize TensorFlow* Performance on CPU: Considerations and Recommendations for Inference Workloads
For more complete information about compiler optimizations, see our Optimization Notice.