Comparing Windows* threads, OpenMP*, and Intel® Threading Building Blocks for parallel programming

This is an interesting topic when we plan to implement parallel programs on multi-core systems to make the best use of the processors. That means we want to divide one big (serial) task into small tasks and let them run simultaneously.

The next question is which method to use. There are three options: 1) traditional Windows* threads, 2) OpenMP*, and 3) Intel® Threading Building Blocks (called TBB below). It is hard to say which is the best and which is the worst; it depends on the developer's situation. For example, a developer with no prior parallel programming experience who does not want to learn the Windows* threads API can use OpenMP* or TBB. The advantage of OpenMP* is that the code stays clean and is easier to maintain. TBB is helpful because the developer does not need to understand how the threads work: just submit the tasks to TBB and trust it to run the application with good performance. Developers who want to control the threads themselves can choose Windows* threads.
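To make the comparison concrete, here is a minimal sketch (my own illustration, not from the original article) of the same work, summing an array, expressed in each of the three models. The names sum_winthreads, sum_openmp, sum_tbb, and kThreads are invented, and the TBB version uses the lambda-based interface.

// Minimal sketch: summing an array with each of the three models.
// Names such as sum_* and kThreads are illustrative only.
#include <windows.h>
#include <omp.h>
#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
#include <vector>

// 1) Windows* threads: you create, partition, and join the threads yourself.
struct Slice { const double* p; size_t n; double sum; };

DWORD WINAPI SumSlice(LPVOID arg) {
    Slice* s = static_cast<Slice*>(arg);
    s->sum = 0.0;
    for (size_t i = 0; i < s->n; ++i) s->sum += s->p[i];
    return 0;
}

double sum_winthreads(const std::vector<double>& data) {
    const int kThreads = 4;                       // fixed thread count chosen by the developer
    std::vector<Slice> slices(kThreads);
    std::vector<HANDLE> handles(kThreads);
    size_t chunk = data.size() / kThreads;
    for (int t = 0; t < kThreads; ++t) {
        slices[t].p = &data[0] + t * chunk;
        slices[t].n = (t == kThreads - 1) ? data.size() - t * chunk : chunk;
        handles[t] = CreateThread(NULL, 0, SumSlice, &slices[t], 0, NULL);
    }
    WaitForMultipleObjects(kThreads, &handles[0], TRUE, INFINITE);
    double sum = 0.0;
    for (int t = 0; t < kThreads; ++t) { sum += slices[t].sum; CloseHandle(handles[t]); }
    return sum;
}

// 2) OpenMP*: one pragma on the serial loop; the runtime manages the threads.
double sum_openmp(const std::vector<double>& data) {
    double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)data.size(); ++i)
        sum += data[i];
    return sum;
}

// 3) TBB: submit the work as tasks; the scheduler decides how to run them.
double sum_tbb(const std::vector<double>& data) {
    return tbb::parallel_reduce(
        tbb::blocked_range<size_t>(0, data.size()), 0.0,
        [&](const tbb::blocked_range<size_t>& r, double local) {
            for (size_t i = r.begin(); i != r.end(); ++i) local += data[i];
            return local;
        },
        [](double a, double b) { return a + b; });
}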

Here I list the major factors of the three options, for your consideration:
 

 

Challenges for parallel programming | Windows* threads | OpenMP* | Intel® Threading Building Blocks
------------------------------------|------------------|---------|---------------------------------
Task level                          |                  |    x    |                x
Cross-platform support              |                  |    x    |                x
Scalable runtime libraries          |                  |         |                x
Threads' Control                    |        x         |         |
Pre-tested and validated            |                  |    x    |                x
C Development support               |        x         |    x    |
Intel® Threading Tools support      |        x         |    x    |                x
Maintenance for tomorrow            |                  |    x    |                x
Scalable memory allocator           |                  |         |                x
"Light" mutex                       |                  |         |                x
Processor affinity                  |        x         |         |        Thread affinity
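The "Scalable memory allocator" row above may be unfamiliar. As a minimal, illustrative sketch (my own example, not from the original article), TBB's scalable allocator can simply be plugged into a standard container so that allocations made from different threads do not all contend on one global heap lock:

// Minimal sketch of the TBB scalable memory allocator (illustrative example).
#include <tbb/scalable_allocator.h>
#include <vector>

int main() {
    // Each thread allocating through scalable_allocator draws from per-thread
    // memory pools, avoiding contention on a single global heap lock.
    std::vector<int, tbb::scalable_allocator<int> > values;
    for (int i = 0; i < 1000; ++i)
        values.push_back(i);
    return 0;
}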

You might be in one of the situations below; each calls for a different approach to save development cost.

Case-1

You already have a working multithreaded program and hope to find the performance bottleneck, then improve it.

You don't need to rewrite the code. Just use the Intel® VTune™ Performance Analyzer and Intel® Thread Profiler to find the essential performance problems in the code; then you have the opportunity to use OpenMP* or TBB to improve the code in the "deep" loops, or to change the mechanism used for the synchronization objects.
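For example, if the profile shows threads spending most of their time in a lock that only protects a few instructions, one option is to switch to a lighter lock such as TBB's spin_mutex (the "light" mutex from the table). A minimal sketch with invented names (g_counter, increment_counter):

// Illustrative sketch: replacing a heavyweight lock around a tiny critical
// region with TBB's "light" spin mutex.
#include <tbb/spin_mutex.h>

static long g_counter = 0;
static tbb::spin_mutex g_mutex;   // spins briefly instead of sleeping in the kernel

void increment_counter() {
    tbb::spin_mutex::scoped_lock lock(g_mutex);  // released when lock goes out of scope
    ++g_counter;                                 // very short critical region
}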

Case-2

You may have serial code but don't know how to change it into multithreaded code.

Use the Intel® VTune™ Performance Analyzer to find the hotspot functions in your code. You don't need to change the whole program to run in parallel; just change the critical code to run in parallel.
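For instance, if the profiler points at a single loop as the hotspot, only that loop needs to change and the rest of the program stays serial. A minimal sketch, with an invented function (scale_samples) standing in for the hotspot:

// Illustrative sketch: only the hotspot loop reported by the profiler is
// parallelized; the rest of the program is left untouched.
#include <omp.h>
#include <cstddef>

void scale_samples(float* samples, std::size_t count, float gain) {
#pragma omp parallel for
    for (long i = 0; i < (long)count; ++i)   // signed index for OpenMP 2.0 compilers
        samples[i] *= gain;
}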

Case-3

You have a new project to be developed. Think of your algorithm as parallel work and divide it into small tasks with proper granularity. If you are not good at multithreaded programming, just use TBB to submit the small tasks, or use OpenMP* to deal with structured code blocks.
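With TBB, for example, the granularity can be expressed directly as a grain size on the range, so each submitted task is large enough to pay for its scheduling overhead. A minimal sketch with invented names (process_items, grain):

// Illustrative sketch: dividing new work into tasks with an explicit grain size.
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>

void process_items(double* items, size_t count) {
    const size_t grain = 1000;  // roughly how many items one task should handle
    tbb::parallel_for(
        tbb::blocked_range<size_t>(0, count, grain),
        [=](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                items[i] = items[i] * 2.0 + 1.0;   // placeholder per-item work
        });
}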


10 comments

anonymous:

Another important feature is the license. Win32 Threads can be freely used in commercial applications without any restrictions. TBB is licensed under the (L)GPL.

Although TBB adds a runtime exception on top of the GPL, many proprietary development projects will stay away from it for legal reasons. Didn't you consider a BSD- or MIT-style license?

Peter Wang (Intel):

I posted a simple example - http://software.intel.com/en-us/blogs/2009/01/22/an-example-to-show-you-performance-data-for-different-implementations-of-pi-calculating

anonymous:

An additional option would be to try Intel Concurrent Collections for C/C++ technology.

An updated version will be available on the whatif site by the end of the year here:

http://software.intel.com/en-us/articles/intel-concurrent-collections-for-cc

Here is a summary of what Intel Concurrent Collections is all about:

Intel® Concurrent Collections for C/C++ provides a mechanism for constructing a C++ program that will execute in parallel while allowing the application developer to ignore issues of parallelism such as low-level threading constructs or the scheduling and distribution of computations. The model allows the programmer to specify high-level computational steps including inputs and outputs without imposing unnecessary ordering on their execution. Code within the computational steps is written using standard serial constructs of the C++ language. Data is either local to a computational step or it is explicitly produced and consumed by them. An application in this programming model supports multiple styles of parallelism (e.g., data, task, pipeline parallel). While the interface between the computational steps and the runtime system remains unchanged, a wide range of runtime systems may target different architectures (e.g., shared memory, distributed) or support different scheduling methodologies (e.g., static or dynamic). Here we provide a runtime system for shared memory systems that supports parallel execution although it is not yet highly optimized. Our goal in supporting a strict separation of concerns between the specification of the application and the optimization of its execution on a specific architecture is to help ease the transition to parallel architectures for programmers who are not parallelism experts.

Dmitry Vyukov:

Re: So that we say Win Thread APIs are "assembly" language, and OpenMP* and TBB are "C/C++" language:-)

One just has to watch out so that OpenMP and TBB do not become the "VisualBasic" of tomorrow's distributed and heterogeneous many-core systems :)

It's just a joke; actually I like TBB's design, and I know that very smart people are indeed working on TBB.

Dmitry Vyukov:

Re: OpenMP* libraries are validated by Compiler provider, as well as TBB libraries.
Aren't WinThreads extensively validated by Microsoft?
Also, validating OpenMP and TBB also validates WinThreads, because they are based on WinThreads, so WinThreads are validated three times :)

Dmitry Vyukov:

I am not a fan of Win Threads, and I understand that Intel's objective is to push OpenMP and TBB (WinThreads - 4 stars, OpenMP - 6 stars, and TBB - 9 stars) :) I just want to say that Win Threads are rather flexible and have some very interesting and differentiating aspects too. For example, will TBB run my application with better performance on a NUMA system?

Peter Wang (Intel):

OpenMP* libraries are validated by the compiler provider, as are the TBB libraries.

Windows* Threads APIs are based on the "thread concept", but OpenMP* and TBB are based on the "task concept", with the "thread concept" hidden. So we say the Win Thread APIs are "assembly" language, and OpenMP* and TBB are "C/C++" language :-)

Dmitry Vyukov:

Win Threads also support NUMA (memory allocation, discovery of the system's structure, thread placement), which can be crucial for some environments.
I think one can also mention the atomic/interlocked API. TBB has the best support (finest grained), then Win Threads, and then OpenMP. Win Threads also contain some rather interesting functions like GetCurrentProcessorNumber(), FlushProcessWriteBuffers(), GetLogicalProcessorInformation(), etc.
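To illustrate the atomic/interlocked point, here is a minimal sketch (my addition, not part of the original comment) of the same atomic increment written in each of the three models; the counter names are invented:

// Illustrative sketch: the same atomic increment in the three models.
#include <windows.h>        // InterlockedIncrement
#include <tbb/atomic.h>     // tbb::atomic (classic TBB interface)

volatile LONG win_counter = 0;
long omp_counter = 0;
tbb::atomic<long> tbb_counter;

void bump_all() {
    InterlockedIncrement(&win_counter);   // Win32 interlocked API

#pragma omp atomic
    ++omp_counter;                        // OpenMP atomic construct

    tbb_counter.fetch_and_increment();    // TBB atomic template
}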

Dmitry Vyukov:

Win Threads do support the "task level" via a rather developed thread pool API (finer-grained than threads, automatic load balancing, automatic thread management, etc.); however, those tasks probably differ from the tasks meant in the table.
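For reference, a minimal sketch of that thread pool API using QueueUserWorkItem; the work function DoWork and its argument are invented for illustration:

// Illustrative sketch: submitting a small work item to the Win32 thread pool.
#include <windows.h>
#include <cstdio>

DWORD WINAPI DoWork(LPVOID context) {
    int id = *static_cast<int*>(context);
    std::printf("work item %d running\n", id);
    return 0;
}

int main() {
    static int id = 1;
    // The system-managed pool picks a thread, runs the item, and reuses the thread.
    QueueUserWorkItem(DoWork, &id, WT_EXECUTEDEFAULT);
    Sleep(100);   // crude wait so the work item finishes before exit (sketch only)
    return 0;
}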

Dmitry Vyukov:

Hmmm... Win Threads are not pretested and validated...
Then how can TBB and OpenMP be pretested and validated, if they are built on top of Win Threads on the Windows platform?..

