There are two popular approaches for adding parallelism to programs. You can use either:
A high-level parallel framework such as Intel® Threading Building Blocks (Intel® TBB), OpenMP*, or Intel® Cilk™ Plus. Of these parallel frameworks for native code, Intel TBB supports C++ programs, Intel Cilk Plus supports C or C++ programs, and OpenMP supports C, C++, or Fortran programs. For managed code on Windows* OS, such as C#, use the Microsoft Task Parallel Library* (TPL).
A low-level threading API such as Windows* threads or POSIX* threads. In this case, you create and control the threads directly. Code written against these APIs may not be as portable as code that uses a high-level framework.
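To make the second approach concrete, here is a minimal C++ sketch of summing an array with explicitly managed threads. It uses `std::thread` from the C++ standard library as a stand-in for the native Windows or POSIX APIs named above, which is an assumption made to keep the example portable. The function name `threaded_sum` and its parameters are hypothetical. Note how the code itself must partition the data, start each thread, wait for it, and combine the partial results. On some toolchains the program may need to be built with `-pthread`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` using `num_threads` explicitly created threads.
// Every step a framework would normally handle is written out by hand.
std::int64_t threaded_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);  // the caller decides the thread count

    std::vector<std::int64_t> partial(num_threads, 0);  // one result slot per thread
    std::vector<std::thread> workers;

    // Manual partitioning: one contiguous slice per thread, rounded up.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            std::int64_t local = 0;
            for (std::size_t i = begin; i < end; ++i) {
                local += data[i];
            }
            partial[t] = local;  // each thread writes only its own slot
        });
    }

    // Manual synchronization: wait for every thread before reading results.
    for (std::thread& w : workers) {
        w.join();
    }

    // Manual reduction of the per-thread partial sums.
    std::int64_t total = 0;
    for (std::int64_t p : partial) {
        total += p;
    }
    return total;
}
```

The slices here are fixed in advance, so a thread that finishes early stays idle; nothing in this sketch rebalances the work.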
There are several reasons that Intel recommends using a high-level parallel framework:
Simplicity: You do not have to code all the detailed operations required by the threading APIs. For example, the Intel Cilk Plus _Cilk_for (or cilk_for), OpenMP* #pragma omp parallel for (or Fortran !$OMP PARALLEL DO), and the Intel TBB parallel_for() are all designed to make it easy to parallelize a loop (see Reinders Ch. 3). With frameworks, you reason about tasks and the work to be done; with threads, you also need to decide how each thread will do its work.
Scalability: The frameworks choose a number of threads suited to the available cores and efficiently assign the tasks to those threads, so the program can use all the cores available on the current system.
Loop Scalability: Intel TBB, OpenMP, and Intel Cilk Plus assign contiguous chunks of loop iterations to existing threads, amortizing the threading overhead across multiple iterations (see Intel TBB grain size: Reinders Ch. 3).
Automatic Load Balancing: Intel TBB, OpenMP, and Intel Cilk Plus have features for automatically adjusting the grain size to spread work among the cores. For example, the Intel Cilk Plus runtime divides the iterations of a parallel loop among the available cores and adjusts the grain size to spread the work as evenly as possible. In addition, when the loop iterations or parallel tasks do uneven amounts of work, the Intel TBB and Intel Cilk Plus schedulers dynamically reschedule the work to avoid idle cores.
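The loop-oriented points above can be sketched with the OpenMP `#pragma omp parallel for` directive in C++. This is an illustration under stated assumptions, not a recommended implementation: the function names `sum_of_squares` and `triangular_numbers` are hypothetical, and the `schedule(dynamic, 64)` clause with a chunk size of 64 is an arbitrary choice showing one way to request on-demand distribution of uneven iterations. If OpenMP is not enabled in the compiler (for example, with `-fopenmp` on GCC or Clang), the pragmas are ignored and the loops run serially with the same results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums the squares of 0..n-1. The framework picks the thread count and splits
// the iterations; the reduction clause gives each thread a private partial
// sum and combines the partial sums when the loop ends.
std::int64_t sum_of_squares(std::int64_t n) {
    assert(n >= 0);
    std::int64_t total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (std::int64_t i = 0; i < n; ++i) {
        total += i * i;
    }
    return total;
}

// Uneven workload: iteration i performs i inner steps, so later iterations
// cost more. schedule(dynamic, 64) hands out chunks of 64 iterations to
// threads as they become free, instead of fixing the split in advance.
std::vector<std::int64_t> triangular_numbers(int n) {
    assert(n >= 0);
    std::vector<std::int64_t> out(static_cast<std::size_t>(n));
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n; ++i) {
        std::int64_t acc = 0;
        for (int j = 1; j <= i; ++j) {
            acc += j;
        }
        out[static_cast<std::size_t>(i)] = acc;  // each iteration writes its own slot
    }
    return out;
}
```

Neither function creates, joins, or counts threads; the single directive above each loop carries the whole parallelization request, and the chunk size plays the role of the grain size discussed above.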
To implement parallelism, you can use any parallel framework you are familiar with, as described in Using Other Parallel Frameworks.
The high-level parallel frameworks available for each programming language include: