The performance benefit from recompiling an application for Intel® Advanced Vector Extensions (Intel® AVX, Intel® AVX2 and/or Intel® AVX-512) may vary greatly from one application to another.
Applications containing floating-point loops that can already be vectorized using Intel® SSE instructions are likely to see significant gains just by recompiling for Intel AVX, due to the greater width of the Intel AVX SIMD floating-point instructions.
Applications that call performance libraries such as the Intel® Math Kernel Library, which contains many functions optimized for Intel AVX, may see gains even without rebuilding.
The benefits of recompilation are likely to be significantly less for applications containing mostly scalar or integer code, applications with very heavy memory access, or applications making heavy use of double-precision divide and square-root operations.
The same is true for applications with hot loops or kernels that do not vectorize; however, the Intel AVX instruction set contains some new features that help to vectorize certain loops that were difficult to handle with SSE instructions. The latest Intel® Compilers also contain new features that allow more loops to be vectorized. See Requirements for Vectorizable Loops for an indication of which loops can be vectorized.
Note that the recent Intel® Xeon® Scalable processors and other processors with support for the Intel AVX family of instructions embody a variety of other features that may improve application performance. This note addresses primarily the impact of the 256-bit-wide and 512-bit-wide SIMD registers and floating-point arithmetic instructions, compared to the 128-bit-wide Intel® SSE instructions.