Handling user-defined function-calls inside vector-loops
If you want to vectorize a loop that contains a call to a user-defined function, make the called function a vector-elemental function, refactoring the code if necessary.
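As a rough illustration of the idea, the sketch below marks a small user-defined function as elemental and calls it from a loop. The names `scale_and_shift` and `apply` are hypothetical. The Intel compilers of this era spelled the annotation `__declspec(vector)` or `__attribute__((vector))`; the sketch swaps in the OpenMP form `#pragma omp declare simd` so that it also builds with other compilers, which ignore the pragmas when OpenMP SIMD support is not enabled.

```c
#include <stddef.h>

/* Hypothetical user-defined function. The pragma asks the compiler to also
   generate a vector variant that handles several x values per call;
   a and b are declared uniform because they are the same for every lane. */
#pragma omp declare simd uniform(a, b) notinbranch
static float scale_and_shift(float x, float a, float b)
{
    return a * x + b;
}

/* The loop calls the elemental function once per element. With the vector
   variant available, the call no longer forces the loop to stay scalar. */
void apply(float *restrict out, const float *restrict in, size_t n,
           float a, float b)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        out[i] = scale_and_shift(in[i], a, b);
}
```

Whether the loop is actually vectorized still depends on the compiler and its options, so it is worth checking the vectorization report.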
Selective Use of gatherhint/scatterhint Instructions
This note documents a known issue with early alpha hardware of the Intel® Xeon® Phi™ coprocessor (A0 stepping from 2011) and an undocumented option to work around it.
Scheduling for 1-4 Threads Per Core Using Compiler Option
This note documents a compiler option that affects how many hardware threads per core an application uses.
-mCG_lrb_num_threads=1|2|3|4 (default is 2; undocumented and unsupported option; Composer XE 2013 initial release, version 13.0.0.079)
Vectorization Essentials, Random Number Function Vectorization
The Intel 13.0 Product Compiler now supports auto-vectorization of the drand48 family of random number functions in C/C++ and of the RANF and Random_Number functions in Fortran. Vectorization is supported through the Intel Short Vector Math Library (SVML).
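A loop of the following shape is the kind of candidate this summary refers to. It is only a minimal sketch: `fill_uniform` is a hypothetical helper, `srand48` and `drand48` are the standard POSIX functions, and whether the loop is vectorized depends on the compiler version and options in use.

```c
/* Request the POSIX declarations of srand48/drand48 if not already set. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600
#endif
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical helper: seed the generator, then fill out[0..n-1] with
   uniform doubles in [0.0, 1.0). The loop body is a single drand48 call,
   which is the pattern a supporting compiler can map to a vector routine. */
void fill_uniform(double *out, size_t n, long seed)
{
    srand48(seed);
    for (size_t i = 0; i < n; ++i)
        out[i] = drand48();
}
```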
Vectorization Essentials, Utilizing Full Vectors and Use of Option -opt-assume-safe-padding
Efficient vectorization involves making full use of the vector hardware. This implies that users should strive to have most of the code execute in the vectorized kernel loop rather than in the peel loop or remainder loop.
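The arithmetic behind that advice can be sketched as follows. The helpers `padded_length` and `split_trip_count` are hypothetical, and `VLEN = 16` is an assumption (16 single-precision floats in a 512-bit vector). The sketch also assumes the data starts on an aligned boundary, so no peel loop is needed, and it does not depend on any particular compiler option.

```c
#include <stddef.h>

enum { VLEN = 16 };  /* assumed: 16 floats per 512-bit vector */

/* Round n up to the next multiple of VLEN. Allocating this many elements
   and filling the tail with neutral values lets the whole loop run in
   full vectors, with no remainder loop. */
size_t padded_length(size_t n)
{
    return (n + VLEN - 1) / VLEN * VLEN;
}

/* Split a trip count into iterations covered by full vectors (kernel loop)
   and leftover iterations (remainder loop), assuming an aligned start. */
void split_trip_count(size_t n, size_t *kernel_iters, size_t *remainder_iters)
{
    *remainder_iters = n % VLEN;
    *kernel_iters = n - *remainder_iters;
}
```

For example, a trip count of 100 gives 96 kernel iterations and 4 remainder iterations, while padding the data to 112 elements removes the remainder entirely.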
Vectorization Essentials, Outer Loop Vectorization via Intel® Cilk™ Plus Array Notations