What to Do When Nested Parallelism Runs Amok? Getting Started with the Python* Module for Intel® Threading Building Blocks (Intel® TBB) in Less Than 30 Minutes!

By Nathan G Greeneltch, Anton Malakhov

Published: 12/27/2017   Last Updated: 12/27/2017

Introduction and Description of Product

Intel® Threading Building Blocks (Intel® TBB) is a portable, open-source parallel programming library from the parallelism experts at Intel. A Python module for Intel® TBB is included in the Intel® Distribution for Python and provides an out-of-the-box scheduling replacement that addresses common problems arising from nested parallelism, coordinating both intra- and inter-process concurrency. This article shows you how to launch Python programs with the Python module for Intel® TBB so that math from popular Python modules like NumPy* and SciPy* is parallelized through Intel® Math Kernel Library (Intel® MKL) thread scheduling. Please note that Intel® MKL also comes bundled free with the Intel® Distribution for Python. Intel® TBB is the native threading library for the Intel® Data Analytics Acceleration Library (Intel® DAAL), a high-performance analytics package with a fully functional Python API. Furthermore, if you work with the full Intel® Distribution for Python package, it is also the native threading layer underneath Numba*, OpenCV*, and select Scikit-learn* algorithms (which have been accelerated with Intel® DAAL).
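To make the nested-parallelism problem concrete, the sketch below (an illustrative example, not part of the original article) shows the typical pattern: an outer Python thread pool dispatches NumPy* calls, and each call is itself threaded internally by Intel® MKL, so the two levels of parallelism multiply and can over-subscribe the machine.

from multiprocessing.pool import ThreadPool
import numpy as np

def qr_task(_):
    # Each QR factorization is internally threaded by Intel(R) MKL
    # when running under the Intel(R) Distribution for Python.
    x = np.random.random((1000, 1000))
    q, r = np.linalg.qr(x)
    return q[0, 0]

if __name__ == "__main__":
    pool = ThreadPool(10)                    # outer, Python-level parallelism
    results = pool.map(qr_task, range(10))   # inner, MKL-level parallelism nested inside
    pool.close()
    pool.join()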

 

How to Get Intel® TBB

To install the full Intel® Distribution for Python package, which includes Intel® TBB, use one of the installation guides below:

Anaconda* Package
YUM Repository
APT Repository
Docker* Images

To install from the Anaconda* Cloud:

conda install -c intel tbb

(The package name will change to ‘tbb4py’ in Q1 2018; this article will be updated accordingly.)
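Once installed, a quick sanity check (a suggestion rather than an official step) is to confirm that the module imports and that the interpreter flag is recognized:

python -c "import tbb"
python -m tbb --help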

 

Drop-in Use with Interpreter Call (no other code changes)

Simply drop in Intel® TBB and see whether it is the right solution for your problem! 

Performance degradation due to over-subscription can be caused by nested parallel calls, often without the user realizing it. These sorts of “mistakes” are easy to make in a scripting environment. Intel® TBB can be turned on easily for out-of-the-box thread scheduling with no code changes. In keeping with the scripting culture of the Python community, this allows a quick check of how much performance Intel® TBB recovers. If you already have math code written, you can simply launch it with the “-m tbb” interpreter flag, followed by the script name and any arguments your script requires. It’s as easy as this:

python -m tbb script.py args*

NOTE: See the Interpreter Flag Reference section below for the full list of available flags.
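For example, assuming a script like the nested QR sketch from the introduction is saved as nested_qr.py (a hypothetical file name), you can time the same script with and without the module to check whether Intel® TBB’s scheduling recovers performance on your machine:

python nested_qr.py            # default thread scheduling
python -m tbb nested_qr.py     # same script under Intel(R) TBB scheduling, no source changes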

 

Interpreter Flag Reference

Command Line Usage
python -m tbb [-h] [--ipc] [-a] [--allocator-huge-pages] [-p P] [-b] [-v] [-m] script.py args*
Get Help from Command Line
python -m tbb --help
pydoc tbb
List of the currently available interpreter flags
Interpreter Flag             Description of Instruction
-h, --help                   Show this help message and exit
-m                           Execute what follows as a module (default: False)
-a, --allocator              Enable the TBB scalable allocator as a replacement for the standard memory allocator (default: False)
--allocator-huge-pages       Enable huge pages for the TBB allocator (implies -a) (default: False)
-p P, --max-num-threads P    Initialize TBB with a maximum of P threads per process (default: the number of available logical processors on the system)
-b, --benchmark              Block TBB initialization until all threads are created before continuing with the script; necessary for performance benchmarks that want to exclude TBB initialization from the measurements (default: False)
-v, --verbose                Request verbose and version information (default: False)
--ipc                        Enable inter-process coordination (IPC) between TBB schedulers (default: False)
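As an illustration of how these flags combine (script.py and args* stand in for your own code), you might cap Intel® TBB at 8 threads per process, enable the scalable allocator, and request verbose output, or coordinate schedulers across cooperating processes:

python -m tbb -p 8 -a -v script.py args*     # 8 threads per process, scalable allocator, verbose output
python -m tbb --ipc script.py args*          # coordinate TBB schedulers across multiple processes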

 

Additional Links

Intel Product Page

Short Introduction Video

SciPy 2017 proceedings

SciPy 2016 Video Presentation

DASK* with Intel® TBB Blog Post
