A toolkit that lays out six steps for increasing performance through vectorization in your application
This article discusses parallelization and provides links to help you understand your programming environment and evaluate how suitable your app is for parallelization.
Cache Blocking Techniques Overview
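Cache blocking (also called loop tiling) splits a loop's iteration space into tiles small enough that the data a tile touches stays in cache while it is being reused. The sketch below applies it to a matrix transpose, where the naive loop writes `dst` with a stride of `n` elements. It is an illustration under stated assumptions, not code from the article: the function names and the tile size `BLOCK` are hypothetical, and a real tile size would be tuned to the target's cache sizes.

```c
#include <stddef.h>

/* Hypothetical tile edge length, chosen for illustration only. In practice it
   is tuned so that one tile of src and one tile of dst fit in cache together. */
#define BLOCK 32

/* Naive transpose: dst is written with a stride of n doubles, so for large n
   nearly every store touches a different cache line. */
void transpose_naive(size_t n, const double *src, double *dst)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked transpose: same element mapping, but the i/j space is walked in
   BLOCK x BLOCK tiles so cache lines are reused before they are evicted.
   The min() bounds handle n values that are not a multiple of BLOCK. */
void transpose_blocked(size_t n, const double *src, double *dst)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            size_t i_end = ii + BLOCK < n ? ii + BLOCK : n;
            size_t j_end = jj + BLOCK < n ? jj + BLOCK : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
}
```

Both functions produce the same result; only the order of memory accesses changes, which is why blocking is a safe transformation for loops like this one.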
Memory Layout Transformations Overview
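A common memory layout transformation is converting an Array of Structures (AoS) into a Structure of Arrays (SoA), so that a loop touching one field reads contiguous memory instead of striding over the other fields. The sketch below assumes a simple 3-D point type; the type and function names are hypothetical and chosen for illustration.

```c
#include <stddef.h>

/* Array of Structures: x, y, z of one point are adjacent, so a loop over x
   alone reads memory with a stride of three floats. */
typedef struct { float x, y, z; } PointAoS;

/* Structure of Arrays: all x values are contiguous, so consecutive loop
   iterations read consecutive addresses (unit stride). */
typedef struct {
    float *x;
    float *y;
    float *z;
} PointsSoA;

/* Copies n AoS points into caller-provided SoA storage. */
void aos_to_soa(const PointAoS *in, size_t n, PointsSoA *out)
{
    for (size_t i = 0; i < n; i++) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
}

void scale_x_aos(PointAoS *p, size_t n, float s)
{
    for (size_t i = 0; i < n; i++)
        p[i].x *= s;    /* strided access: every third float */
}

void scale_x_soa(PointsSoA *p, size_t n, float s)
{
    for (size_t i = 0; i < n; i++)
        p->x[i] *= s;   /* unit-stride access: easy to load into vector registers */
}
```

The two scaling loops compute the same values; the SoA version lets a vectorizing compiler use plain contiguous vector loads rather than gathers or shuffles.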
This article is part of the Intel® Modern Code Developer Community documentation, which helps developers improve application performance through a systematic, step-by-step optimization methodology. This article addresses thread-level parallelization.
Get an overview of parallelization using the Intel® MPI Library and links to additional documentation.
Optimization reports from the Intel® compilers give developers details about the optimizations applied to their code
Vectorization Essentials: Vectorizing the outer loop can be profitable
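Outer-loop vectorization pays off when the inner loop is short or carries a dependence while the outer loop's iterations are independent. The sketch below evaluates one polynomial at many points: Horner's rule in the inner loop depends on the previous `acc`, but different points do not depend on each other, so each SIMD lane can handle a different `x[i]`. It assumes an OpenMP-capable compiler (for example GCC's `-fopenmp-simd` or the Intel compiler's `-qopenmp-simd`); without such a flag the pragma is ignored and the loop simply runs serially with the same result. The function name is hypothetical.

```c
#include <stddef.h>

/* Evaluates c[0] + c[1]*x + ... + c[deg]*x^deg at n points.
   The simd pragma sits on the OUTER loop over points, asking the compiler to
   vectorize across points; the short inner Horner loop then runs inside each
   vector lane. */
void poly_eval(size_t n, const float *x, float *y,
               const float *c, int deg)
{
#pragma omp simd
    for (size_t i = 0; i < n; i++) {
        float acc = c[deg];
        for (int k = deg - 1; k >= 0; k--)
            acc = acc * x[i] + c[k];   /* loop-carried dependence on acc */
        y[i] = acc;
    }
}
```

Whether the compiler actually vectorizes the outer loop can be checked in its optimization report.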
Vectorization Essentials: Efficient vectorization involves making full use of the vector hardware in the kernel vector loop.
Auto-vectorization of random number functions is supported