Tips and techniques on using the Intel® Compilers to maximize your application performance.
C Standard Conformance
Intel® Compiler - How can I generate optimized code to run on any IA-32 or Intel® 64 architecture processor? Some frequently used optimization switches of the Intel Compiler are described.
Performance Tools for Software Developers - SSE generation and processor-specific optimizations, continued: Can I combine the processor values and target more than one processor? How do I generate optimized code for both Intel and AMD* architectures? Where can I find more information on processor-specific optimizations?
Loop blocking combines strip mining and loop interchange to enhance reuse of local data. It helps nested loops that manipulate arrays too large to fit into the cache. Loop blocking allows reuse of the arrays by transforming the
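As a rough illustration of the idea, the sketch below blocks a matrix transpose by hand. It is not taken from the article: the function names, the row-major layout, and the tile size `BLOCK` are all assumptions made for the example, and a good tile size depends on the target cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge; a real value should be tuned to the cache. */
static const size_t BLOCK = 32;

/* Reference version: the inner loop writes dst with a stride of n elements. */
void transpose_naive(size_t n, const double *src, double *dst)
{
    assert(src != dst); /* not an in-place transpose */
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked version: the i and j loops are strip-mined into BLOCK-sized
 * strips, and the strip loops are moved outward (loop interchange) so that
 * one BLOCK x BLOCK tile is completed before the next one is started. */
void transpose_blocked(size_t n, const double *src, double *dst)
{
    assert(src != dst); /* not an in-place transpose */
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            /* Clamp the tile so sizes that are not multiples of BLOCK work. */
            size_t i_end = (ii + BLOCK < n) ? ii + BLOCK : n;
            size_t j_end = (jj + BLOCK < n) ? jj + BLOCK : n;
            for (size_t i = ii; i < i_end; ++i)
                for (size_t j = jj; j < j_end; ++j)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}

/* Self-check: returns 1 when both versions agree on a 50 x 50 matrix,
 * a size chosen because it is not a multiple of BLOCK. */
int transposes_agree(void)
{
    enum { DIM = 50 };
    static double src[DIM * DIM], ref[DIM * DIM], out[DIM * DIM];
    const size_t total = (size_t)DIM * DIM;

    for (size_t k = 0; k < total; ++k)
        src[k] = (double)k;
    transpose_naive(DIM, src, ref);
    transpose_blocked(DIM, src, out);
    for (size_t k = 0; k < total; ++k)
        if (ref[k] != out[k])
            return 0;
    return 1;
}
```

Both functions compute the same result; only the order in which elements are visited changes. Whether the blocked order is actually faster depends on the matrix size and the cache hierarchy, so it is worth measuring rather than assuming.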
This article describes the effect of the /Qpar-threshold option on auto-parallelization with the Intel® C++ Compiler.
The Intel® Compiler treats the size of a "long" integer as 4 bytes or 8 bytes depending on the architecture and operating system, affecting portability. The size of a "long long" integer is always 8 bytes. The size of a "long double" may also vary.
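One way to see the portability issue is to print the sizes on each target and to use fixed-width types where the width matters. The sketch below is illustrative only; the function names are made up for the example, and it assumes a C11 compiler for `static_assert`.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* int64_t, where available, has exactly 64 bits on every platform. */
static_assert(sizeof(int64_t) * CHAR_BIT == 64, "int64_t must be 64 bits");

/* Returns 1 if "long" is 64 bits wide on this platform, 0 otherwise. */
int long_is_64_bit(void)
{
    return sizeof(long) * CHAR_BIT == 64;
}

/* Prints the sizes that vary between targets, plus "long long". */
void print_type_sizes(void)
{
    printf("long:        %zu bytes\n", sizeof(long));
    printf("long long:   %zu bytes\n", sizeof(long long));
    printf("long double: %zu bytes\n", sizeof(long double));
}
```

Code that stores a value needing 64 bits in a `long` may work on one operating system and silently truncate on another, so `int64_t` (or `long long`) is usually the safer choice for such values.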
In very large, complex functions, loops preceded by OpenMP directives may not be threaded. The compiler may emit the remark: "An internal threshold was exceeded: loops may not be vectorized or parallelized. Try to reduce routine size."
In the definition of an object-like macro, C99 and C++0x require that the replacement text be separated from the macro name by white space. The Intel® Compiler checks this requirement and issues a warning when it is not met.
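A small example of the rule, with a hypothetical macro name `OFFSET`: the non-conforming form is shown only in a comment so that the file compiles without the diagnostic, and the exact wording of the compiler's warning is not reproduced here.

```c
#include <assert.h>

/* Not conforming under C99: there is no white space between the macro
 * name and its replacement text.
 *
 *     #define OFFSET-1
 */

/* Conforming: white space separates the name from the replacement text. */
#define OFFSET -1

/* Adds OFFSET to x; the parentheses keep the expansion unambiguous. */
int apply_offset(int x)
{
    return x + (OFFSET);
}
```

With the conforming definition, `apply_offset(10)` evaluates `10 + (-1)`.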