Getting Started with Intel® Optimization for PyTorch*

By Nathan G Greeneltch, Jing Xu, and Shailendrsingh Kishore Sobhee

Published: 03/26/2019   Last Updated: 03/26/2019

In collaboration with Facebook*, many Intel optimizations are now integrated directly into PyTorch* to provide superior performance on Intel architecture. Intel® Optimization for PyTorch* provides the binary version of the latest PyTorch* release for CPUs, and further adds Intel extensions and bindings with the oneAPI Collective Communications Library (oneCCL) for efficient distributed training.

The Intel extension, Intel® Extension for PyTorch* (IPEX), improves the out-of-the-box user experience of PyTorch* on CPU while achieving good performance. The extension also serves as a Pull-Request (PR) buffer for the Intel PyTorch framework development team. The PR buffer will contain not only functions, but also optimizations (for example, ones that take advantage of new Intel hardware features). You can get more detailed information here.

To improve the performance of distributed training, a PyTorch* module, torch-ccl, implements the PyTorch* C10D ProcessGroup API for the Intel® oneAPI Collective Communications Library (oneCCL). Intel oneCCL is a library for efficient distributed deep learning training that implements collectives such as allreduce, allgather, and alltoall. For more information on oneCCL, please refer to the oneCCL documentation. torch-ccl can be dynamically loaded as an external ProcessGroup and currently works only on Linux. You can get more detailed information here.

See the article Intel and Facebook* collaborate to Boost PyTorch* CPU Performance for more details on recent performance accelerations.

Installation

  • Install via Intel® AI Analytics Toolkit

Intel® AI Analytics Toolkit includes the entire Intel® Optimization for PyTorch* package: binaries from the latest PyTorch* release, Intel® Extension for PyTorch* (IPEX), and torch-ccl together. There are multiple ways to get the toolkit and its components; it is distributed through several channels: Anaconda, Docker containers, package managers (Yum, Apt, Zypper), and an online/offline installer from Intel. To download Intel Optimization for PyTorch from the AI Analytics Toolkit, visit here and choose the installation method of your choice. You can find more detailed information about the toolkit here.

  • Install individual components via alternative methods

  • PyTorch

oneDNN has been integrated into the official release of PyTorch by default, so users can get performance benefits on Intel platforms without additional installation steps.

Users can easily get PyTorch from its official website. As shown in the following screenshot, a stable version and a preview version are provided for Linux*, macOS*, and Windows*. Users can also choose to install the binary from Anaconda*, pip, or LibTorch, or build from source. Python* 3.5 to 3.7 and C++ are supported. To run PyTorch on Intel platforms, the CUDA* option must be set to None.

Note: all versions of PyTorch (with or without CUDA support) have oneDNN acceleration support enabled by default.
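
To confirm that a given build includes oneDNN, you can query the backend from Python (oneDNN is exposed under its former name, MKL-DNN, in the PyTorch API of this era):

import torch

# True if this build was compiled with oneDNN (MKL-DNN) support
print(torch.backends.mkldnn.is_available())

# Full build configuration, including the MKL-DNN version compiled in
print(torch.__config__.show())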

  • IPEX

Currently, utilizing IPEX requires applying patches to the PyTorch 1.5.0-rc3 source code, so you need to compile PyTorch and IPEX from source. Please follow the steps below to install IPEX.

1. Get source code of PyTorch 1.5.0-rc3 and IPEX, and apply patches

# a. Get PyTorch source code
$ git clone --recursive https://github.com/pytorch/pytorch
$ cd pytorch

# checkout source code to the specified version
$ git checkout v1.5.0-rc3

# update submodules for the specified PyTorch version
$ git submodule sync
$ git submodule update --init --recursive

# b. Get IPEX source code
$ git clone --recursive https://github.com/intel/intel-extension-for-pytorch
$ cd intel-extension-for-pytorch

# if you are updating an existing checkout
$ git submodule sync
$ git submodule update --init --recursive

# c. Apply git patch to pytorch code
$ cd ${pytorch_directory}
$ git apply ${intel_extension_for_pytorch_directory}/torch_patches/dpcpp-v1.5-rc3.patch

2. Compile and install PyTorch 1.5.0-rc3 with patches applied

$ cd ${pytorch_directory}
$ python setup.py install

3. Compile and install IPEX

# a. Install dependencies
$ pip install lark-parser hypothesis

# b. Install the extension
$ cd ${intel_extension_for_pytorch_directory}
$ python setup.py install
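
Once both builds finish, IPEX is used from Python by importing the extension and moving models and tensors to the device it registers. A minimal sketch, assuming the intel_pytorch_extension module name and ipex.DEVICE handle of the 1.5-era releases (check the IPEX repository for the exact API of your version):

import torch
import torch.nn as nn
import intel_pytorch_extension as ipex  # assumption: module name used by 1.5-era IPEX

# Move the model and input to the IPEX device so its optimized kernels are used
model = nn.Conv2d(3, 16, kernel_size=3).to(ipex.DEVICE)
data = torch.rand(1, 3, 224, 224).to(ipex.DEVICE)

with torch.no_grad():
    output = model(data)
print(output.shape)
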
  • torch-ccl

You need Python 3.6 or later and a C++14 compiler to take advantage of torch-ccl.

# a. Install PyTorch following the instructions above, either stock PyTorch or PyTorch patched for IPEX

# b. get torch-ccl source code and compile it
$ git clone https://github.com/intel/torch-ccl.git && cd torch-ccl 
$ git submodule sync 
$ git submodule update --init --recursive 
$ python setup.py install

# c. oneCCL is bundled as a third-party repo of torch-ccl, but you need to source the oneCCL environment before running.
$ torch_ccl_path=$(python -c "import torch; import torch_ccl; import os;  print(os.path.abspath(os.path.dirname(torch_ccl.__file__)))")
$ source $torch_ccl_path/ccl/env/setvars.sh

Please check here to learn which torch-ccl version aligns with which PyTorch* source version.
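
Once installed, importing torch_ccl registers a "ccl" backend with PyTorch's distributed package. A minimal sketch of initializing a process group with it (the PMI_RANK/PMI_SIZE variables are assumed to be set by an MPI launcher; adjust to your launcher's conventions):

import os
import torch
import torch.distributed as dist
import torch_ccl  # importing this module registers the "ccl" ProcessGroup backend

# Rendezvous settings; MASTER_ADDR/MASTER_PORT are required by torch.distributed
os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
os.environ.setdefault('MASTER_PORT', '29500')
rank = int(os.environ.get('PMI_RANK', 0))        # assumption: set by mpirun
world_size = int(os.environ.get('PMI_SIZE', 1))  # assumption: set by mpirun

dist.init_process_group(backend='ccl', rank=rank, world_size=world_size)

# Allreduce (sum) a tensor across all ranks via oneCCL
t = torch.ones(4)
dist.all_reduce(t)
print(t)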

Sanity Check

You can check whether these components are installed via the pip command.

pip list
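
Beyond the package listing, a quick import test confirms that each component actually loads (module names as assumed in the build steps above):

import torch
print('torch', torch.__version__)

# These imports succeed only if the optional components were built and installed
import intel_pytorch_extension  # IPEX
import torch_ccl                # oneCCL bindings for PyTorch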


Getting Started

We have open-sourced sample code for Intel® Optimization for PyTorch* on GitHub. Please find more detailed information here.

Performance Considerations

For performance considerations for PyTorch running on Intel® architecture processors, please refer to the Data Layout, Non-Uniform Memory Access (NUMA) Controls Affecting Performance, and oneDNN Technical Performance Considerations sections of Maximize TensorFlow* Performance on CPU: Considerations and Recommendations for Inference Workloads.
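
As an illustration of the data layout point, PyTorch can keep module weights and activations in oneDNN's blocked (mkldnn) layout so that CPU inference avoids repeated layout conversions. A minimal sketch using the torch.utils.mkldnn helper available in the PyTorch 1.5 timeframe:

import torch
import torch.nn as nn
from torch.utils import mkldnn as mkldnn_utils

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU()).eval()
model = mkldnn_utils.to_mkldnn(model)  # convert weights to the mkldnn blocked layout once

x = torch.rand(1, 3, 224, 224).to_mkldnn()  # convert the input up front
with torch.no_grad():
    y = model(x)
print(y.to_dense().shape)  # convert back to the default strided layout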

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.