Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on the concepts discussed in Part 1 to cover parallelism, both vectorization (single instruction, multiple data, or SIMD) and shared-memory parallelism (threading), as well as distributed memory computing.
This paper is a more formal response to an Intel® Developer Zone forum posting (see https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/590710).
Parallelize loops with Intel® Threading Building Blocks, using lambda expressions as supported by the Intel® C++ Compiler.
Get a background on vectorization and learn different techniques to evaluate its effectiveness.
Contrast the results of manually tuning financial data with those of using data layout templates in the Intel® C++ Compiler.
Learn how to use Offload over Fabric software for a server migration path.
Improve your vectorization project using techniques and methodologies from Intel.
Learn techniques for vectorizing code, adding thread-level parallelism, and enabling memory optimization.
If calls to printf or fprintf cause transaction aborts, use Intel® Processor Trace as a workaround.