Parallelize loops with Intel® Threading Building Blocks, using the lambda expression support in the Intel® C++ Compiler.
If printf or fprintf functions cause transaction aborts, use Intel® Processor Trace as a workaround.
In this tutorial, we demonstrate some possible ways to optimize an application to run on the Intel® Xeon Phi™ processor.
Find out how to use the command-line interface in Intel® Advisor 2017 for a quick, initial analysis of loop performance that gives an overview of the hotspots in your code.
This tutorial shows how to install Offload over Fabric (OoF) software on the 2nd generation Intel® Xeon Phi™ processor, configure the hardware, test the basic configuration, and enable OoF.
Code Sample included: Learn how to use the MPI-3 shared memory feature through the corresponding APIs on the Intel® Xeon Phi™ processor.
Learn techniques for vectorizing code, adding thread-level parallelism, and enabling memory optimization.
Matrix multiplication (MM) of two matrices is one of the most fundamental operations in linear algebra. The algorithm for MM is very simple and can be easily implemented in any programming language. This paper shows that performance improves significantly when different optimization techniques are applied.
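As a rough illustration of how simple the basic algorithm is, here is a minimal C++ sketch. It is not taken from the paper. The names `Matrix`, `multiply_naive`, and `multiply_reordered` are hypothetical, and the loop interchange shown is just one commonly used optimization, assumed here for illustration and not necessarily one the paper applies.

```cpp
#include <cstddef>
#include <vector>

// Assumption for this sketch: square n x n matrices stored row-major in flat vectors.
using Matrix = std::vector<double>;

// Textbook i-j-k ordering. The inner loop reads b with stride n.
void multiply_naive(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// i-k-j ordering. The inner loop reads b and updates c with unit stride,
// which tends to be friendlier to caches and to compiler vectorization.
void multiply_reordered(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n * n; ++i) {
        c[i] = 0.0;  // c is accumulated into, so clear it first
    }
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

Both functions compute the same product. Any speed difference between them depends on the matrix size, compiler, and hardware, so it should be measured and not assumed.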
How to install and enable Offload Over Fabric, configure the hardware, and test the configuration.
This article focuses on the steps to improve software performance with vectorization. Included are examples of full applications along with some simpler cases to illustrate the steps to vectorization.