Developer Guide


Task Parallelism

While the compiler achieves concurrency by scheduling independent individual operations to execute simultaneously, it does not achieve concurrency at coarser granularities (for example, across loops).
For larger code structures to execute in parallel with each other, you must write them as separate kernels that launch simultaneously. These kernels then run asynchronously with respect to each other, and you can synchronize them and pass data between them using pipes, as illustrated in the following figure:
Multiple Kernels Running Asynchronously
This is similar to how a program running on a CPU can leverage threads running on separate cores to achieve simultaneous asynchronous behavior.
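
The CPU analogy can be made concrete with a minimal sketch in standard C++. This is not FPGA kernel code: the `Channel` class and the `run_pipeline` function are hypothetical names introduced only for illustration, with `Channel` acting as a stand-in for a blocking pipe and the two threads standing in for two kernels launched at the same time. The capacity of 4 is an arbitrary assumption.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical bounded blocking FIFO, used here as a CPU stand-in for a pipe.
template <typename T>
class Channel {
public:
    explicit Channel(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the channel is full, similar to a blocking pipe write.
    void write(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push(std::move(value));
        not_empty_.notify_one();
    }

    // Blocks while the channel is empty, similar to a blocking pipe read.
    T read() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop();
        not_full_.notify_one();
        return value;
    }

private:
    std::size_t capacity_;
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// Runs two tasks concurrently: a producer that writes the squares of
// 1..count, and a consumer that reads and sums them. The channel is the
// only point of synchronization and communication between the two tasks.
long long run_pipeline(int count) {
    assert(count >= 0);
    Channel<int> pipe(4);  // capacity chosen arbitrarily for this sketch
    long long sum = 0;

    std::thread producer([&pipe, count] {
        for (int i = 1; i <= count; ++i) {
            pipe.write(i * i);
        }
    });

    std::thread consumer([&pipe, &sum, count] {
        for (int i = 0; i < count; ++i) {
            sum += pipe.read();
        }
    });

    producer.join();
    consumer.join();
    return sum;  // safe to read: both threads have been joined
}
```

Calling `run_pipeline(5)` starts both tasks and returns once each has finished. Neither task waits for the other to complete before starting; they only stall when the channel is full or empty, which mirrors how separately launched kernels coordinate through a pipe.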
