How to Compile for Intel® AVX

Intel® AVX (Intel® Advanced Vector Extensions) is a 256-bit instruction set extension to Intel® SSE (Intel® Streaming SIMD Extensions) that was first announced in 2008. Further information about Intel AVX is available at

The Intel® C/C++ and Fortran Compilers, versions 11.1 and 12.0, support building applications for Intel AVX. On Windows*, use the command line switch /QxAVX. On Linux*, use -xavx. The switches /QaxAVX (Windows) and -axavx (Linux) build applications that take advantage of AVX instructions on Intel systems that support them, but use only SSE instructions on other Intel or compatible, non-Intel systems. For example, to generate a specialized code path optimized for 2nd generation Intel® Core™ processors, plus a default code path optimized for Intel processors or compatible, non-Intel processors that support at least the SSE3 instruction set, compile with /QaxAVX /arch:SSE3 (Windows) or with -axavx -msse3 (Linux).

Both the C/C++ and Fortran compilers support automatic vectorization of floating-point loops using AVX instructions. The C/C++ compiler also supports AVX-based intrinsics (via the header file immintrin.h) and inline assembly. Intel AVX allows the vectorization of a wider variety of floating-point loops than Intel SSE, with a greater potential performance gain due to the greater width of the SIMD registers. The vectorizer is enabled automatically by the switches listed above. To see which loops have been vectorized, use the switch /Qvec-report1 (Windows) or -vec-report1 (Linux).
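As a rough illustration of the two approaches described above, the sketch below shows a simple floating-point loop of the kind the vectorizer typically targets, followed by the same computation written by hand with AVX intrinsics from immintrin.h. The function names saxpy_auto and saxpy_intrin are hypothetical, and the code assumes C99 (for restrict) and that the compiler defines the __AVX__ macro when AVX code generation is enabled. Whether the first loop is actually vectorized depends on the compiler version and switches, so check the vectorization report rather than relying on this sketch.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>   /* AVX intrinsics; only pulled in when AVX is enabled */
#endif

/* Hypothetical example: y[i] = a * x[i] + y[i].
 * A plain loop with unit stride and no aliasing (restrict) is a typical
 * candidate for automatic vectorization when built with the AVX switches. */
void saxpy_auto(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Same computation written by hand with 256-bit intrinsics.
 * Assumption: __AVX__ is defined when the build targets AVX; otherwise
 * only the scalar loop at the end is compiled. */
void saxpy_intrin(size_t n, float a, const float *restrict x, float *restrict y)
{
    size_t i = 0;
#ifdef __AVX__
    const __m256 va = _mm256_set1_ps(a);          /* broadcast a to 8 lanes */
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);       /* unaligned load of 8 floats */
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_add_ps(_mm256_mul_ps(va, vx), vy);
        _mm256_storeu_ps(y + i, vy);              /* unaligned store of 8 floats */
    }
#endif
    /* Scalar loop: handles the remainder, or everything without AVX. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The unaligned load and store forms are used so the sketch does not depend on how the arrays were allocated; aligned variants may be preferable when 32-byte alignment is guaranteed.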

The 2nd generation Intel® Core™ processor family supports Intel AVX. In addition, the Intel® Software Development Emulator (Intel® SDE) is available for testing programs built for Intel AVX. See
Further general information about the Intel Compilers for C/C++ and Fortran is available at . Further information about compiler support for Intel AVX may be found in the Intel C++ Compiler User and Reference Guides, for example in the section 'Intrinsics for Advanced Vector Extensions', accessible online.

The attached presentation (PDF) contains additional detail on compiling for the Intel AVX instruction set and the 2nd generation Intel® Core™ processor family, including a section on loop vectorization.
For more complete information about compiler optimizations, see our Optimization Notice.
compiling-for-avx-kb.pdf (1.33 MB)


Martyn Corden (Intel)

Yes, that's possible, it corresponds to unrolling the loop by an extra factor of two. The compiler may sometimes do that automatically.
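For illustration only (this sketch is not part of the original reply), unrolling a vector addition by an extra factor of two might look like the following when written in C with AVX intrinsics. Each pass of the main loop handles 16 floats using two independent sets of 256-bit values. The function name vec_add_unrolled is hypothetical, the code assumes the compiler defines __AVX__ when AVX is enabled, and the compiler, not the programmer, chooses which YMM registers actually hold each value.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>   /* AVX intrinsics; only pulled in when AVX is enabled */
#endif

/* Hypothetical example: c[i] = a[i] + b[i], with the vector loop unrolled
 * by two so that 16 floats are processed per iteration. */
void vec_add_unrolled(size_t n, const float *a, const float *b, float *c)
{
    size_t i = 0;
#ifdef __AVX__
    for (; i + 16 <= n; i += 16) {
        /* First group of 8 floats: elements i .. i+7 */
        __m256 a0 = _mm256_loadu_ps(a + i);
        __m256 b0 = _mm256_loadu_ps(b + i);
        /* Second, independent group of 8 floats: elements i+8 .. i+15 */
        __m256 a1 = _mm256_loadu_ps(a + i + 8);
        __m256 b1 = _mm256_loadu_ps(b + i + 8);

        _mm256_storeu_ps(c + i,     _mm256_add_ps(a0, b0));
        _mm256_storeu_ps(c + i + 8, _mm256_add_ps(a1, b1));
    }
#endif
    /* Scalar loop: handles the remainder, or everything without AVX. */
    for (; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

Because the two groups do not depend on each other, the processor is free to overlap their loads and additions; whether this is faster than the non-unrolled loop depends on the hardware and the surrounding code.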

Please post any further questions to the Intel Compiler user forums under Intel Software Development Products at

Chaitali C.

I have one question about Intel AVX. According to the online information, 8 floats can be processed in a single iteration using the YMM registers. Can we use all the YMM registers in parallel in a single iteration by changing the assembly code? For example, if YMM0 holds 8 floats (0-7) of input array 1, YMM1 holds 8 floats (0-7) of input array 2, and YMM3 holds the result of the vector addition, can YMM4 hold the next 8 floats of input array 1 (8-15), YMM5 hold 8 floats of input array 2 (8-15), and YMM6 hold the output array (8-15) in the same loop iteration?

Thanks in advance.
