As part of my focus on software performance, I also support and consult on implementing scalable parallelism in applications. There are many reasons to parallelize and many ways to do it - but this post is about neither of those things. It is about the performance advantages of one particular approach to parallelism - an approach that, fortunately, several available programming models support.
I am talking about task-based parallelism, which means you design your algorithms in terms of "tasks" (units of work to be done) rather than the specifics of threads and CPU cores. Several current parallelism models use tasks: Intel® Threading Building Blocks (TBB), Intel® Cilk Plus, Microsoft* Parallel Patterns Library, and OpenMP 3.0, to name a few. With a task-based model like these, you use pre-coded functions, pragmas, keywords, or templates to add parallelism to your application. Besides potentially reducing your development effort, there are many performance-related reasons why this is a good thing! Below, I give four of the advantages.