What to Do When Nested Parallelism Runs Amuck? Getting Started with the Python* Module for Threading Building Blocks (Intel® TBB) in Less than 30 Minutes!

Introduction and Description of Product

Intel® Threading Building Blocks (Intel® TBB) is a portable, open-source parallel programming library from the parallelism experts at Intel®. A Python module for Intel® TBB is included in the Intel® Distribution for Python and provides an out-of-the-box scheduling replacement that addresses common problems arising from nested parallelism. It coordinates both intra- and inter-process concurrency. This article shows you how to launch Python programs with the Python module for Intel® TBB so that math from popular Python modules like NumPy* and SciPy* is parallelized through Intel® Math Kernel Library (Intel® MKL) thread scheduling. Intel® MKL also comes bundled free with the Intel® Distribution for Python.

Intel® TBB is the native threading library for Intel® Data Analytics Acceleration Library (Intel® DAAL), a high-performance analytics package with a fully functional Python API. Furthermore, if you are working with the full Intel® Distribution for Python package, Intel® TBB is also the native threading layer underneath Numba*, OpenCV*, and select Scikit-learn* algorithms (those that have been accelerated with Intel® DAAL).

 

How to Get Intel® TBB

To install the full Intel® Distribution for Python package, which includes Intel® TBB, follow one of the installation guides below:

Anaconda* Package
YUM Repository
APT Repository
Docker* Images

To install from Anaconda cloud:

conda install -c intel tbb

(The package name will change to ‘tbb4py’ in Q1 of 2018. This article will be updated accordingly.)
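After installing, a quick sanity check can confirm that the module is visible to your interpreter. The following is a minimal sketch using only the standard library. It assumes the importable module is named `tbb`, as implied by the `python -m tbb` invocation used in this article; the helper name `tbb_module_available` is hypothetical.

```python
import importlib.util


def tbb_module_available() -> bool:
    """Return True if a top-level module named 'tbb' can be located.

    Assumption: the module name is 'tbb', matching `python -m tbb`.
    find_spec only locates the module; it does not import or start it.
    """
    return importlib.util.find_spec("tbb") is not None


if __name__ == "__main__":
    if tbb_module_available():
        print("The 'tbb' module was found in this environment.")
    else:
        print("The 'tbb' module was not found; check the conda install step.")
```

If the check reports that the module is missing, make sure the conda environment you installed into is the one that is currently active.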

 

Drop-in Use with Interpreter Call (no other code changes)

Simply drop in Intel® TBB and determine if it is the right solution for your problem statement! 

Performance degradation due to over-subscription can be caused by nested parallel calls, often without the user realizing it. These sorts of “mistakes” are easy to make in a scripting environment. Intel® TBB can be turned on for out-of-the-box thread scheduling with no code changes. In keeping with the scripting culture of the Python community, this lets you quickly check whether Intel® TBB recovers the lost performance. If you already have math code written, you can launch it with the “-m tbb” interpreter flag, followed by the script name and any arguments your script requires. It’s as easy as this:

python -m tbb script.py args*

NOTE: See the Interpreter Flag Reference section for the full list of available flags.
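To make the over-subscription problem concrete, here is a small standard-library sketch of how nested thread pools multiply. It is an illustration only and does not use Intel® TBB, NumPy, or Intel® MKL; the function name `peak_inner_threads` and the pool sizes are hypothetical. Each outer worker starts its own inner pool, and a barrier forces every inner task to be running at the same moment so the peak can be measured reliably.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def peak_inner_threads(outer: int, inner: int) -> int:
    """Run `outer` tasks, each of which starts its own pool of `inner`
    workers, and return the peak number of inner tasks running at once."""
    total = outer * inner
    barrier = threading.Barrier(total)   # released only when all inner tasks arrive
    lock = threading.Lock()
    state = {"live": 0, "peak": 0}

    def inner_task(_):
        with lock:
            state["live"] += 1
            state["peak"] = max(state["peak"], state["live"])
        # Wait until every inner task in every pool is running.
        barrier.wait(timeout=30)
        with lock:
            state["live"] -= 1

    def outer_task(_):
        # The nested parallel call: a fresh pool inside each outer worker.
        with ThreadPoolExecutor(max_workers=inner) as pool:
            list(pool.map(inner_task, range(inner)))

    with ThreadPoolExecutor(max_workers=outer) as pool:
        list(pool.map(outer_task, range(outer)))
    return state["peak"]


if __name__ == "__main__":
    # 2 outer workers, each with 3 inner workers
    print(peak_inner_threads(2, 3))
```

With 2 outer workers and 3 inner workers, 6 inner tasks run at once, in addition to the 2 blocked outer threads. When both levels default to the number of logical processors, the thread count grows roughly with the square of the core count, which is the over-subscription described above.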

 

Interpreter Flag Reference

Command Line Usage
python -m tbb [-h] [--ipc] [-a] [--allocator-huge-pages] [-p P] [-b] [-v] [-m] script.py args*
Get Help from Command Line
python -m tbb --help
pydoc tbb
List of the currently available interpreter flags
Interpreter Flag             Description of Instruction
-h, --help                   show this help message and exit
-m                           Executes following as a module (default: False)
-a, --allocator              Enable TBB scalable allocator as a replacement for standard memory allocator (default: False)
--allocator-huge-pages       Enable huge pages for TBB allocator (implies: -a) (default: False)
-p P, --max-num-threads P    Initialize TBB with P max number of threads per process (default: number of available logical processors on system)
-b, --benchmark              Block TBB initialization until all the threads are created before continue the script. This is necessary for performance benchmarks that want to exclude TBB initialization from the measurements (default: False)
-v, --verbose                Request verbose and version information (default: False)
--ipc                        Enable inter-process (IPC) coordination between TBB schedulers (default: False)
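If you launch scripts from a driver program rather than by hand, the flags above can be assembled into an argument list. The following is a minimal sketch: the helper `build_tbb_command` and its parameter names are hypothetical, and only flags listed in the reference above are used. The function builds the command without running it.

```python
import sys


def build_tbb_command(script, script_args=(), *, max_threads=None,
                      ipc=False, allocator=False, benchmark=False,
                      verbose=False):
    """Return an argv list for `python -m tbb [flags] script args*`.

    Hypothetical helper; every flag emitted here comes from the
    flag reference in this article.
    """
    cmd = [sys.executable, "-m", "tbb"]
    if ipc:
        cmd.append("--ipc")                    # inter-process coordination
    if allocator:
        cmd.append("-a")                       # TBB scalable allocator
    if max_threads is not None:
        cmd.extend(["-p", str(max_threads)])   # max threads per process
    if benchmark:
        cmd.append("-b")                       # wait for thread creation
    if verbose:
        cmd.append("-v")
    cmd.append(script)
    cmd.extend(str(arg) for arg in script_args)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so this sketch works
    # even where the tbb module is not installed.
    print(" ".join(build_tbb_command("script.py", ["input.dat"],
                                     max_threads=4, ipc=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` in an environment where the Python module for Intel® TBB is installed.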

 

Additional Links

Intel Product Page

Short Introduction Video

SciPy 2017 proceedings

SciPy 2016 Video Presentation

DASK* with Intel® TBB Blog Post

For more complete information about compiler optimizations, see our Optimization Notice.