Intel® AVX C/C++ Intrinsics Emulation

Intel® AVX instruction set extension [1] will appear in the next generation Intel microarchitecture codename ‘Sandy Bridge'. We chose to announce AVX early to get as much support from software vendors as possible by the hardware launch time. Now, most software development platforms are supporting Intel AVX, examples are compilers and assemblers from Intel, Microsoft and GCC as well as UNIX binutils.

For early adopters we introduced support of AVX in Intel® Software Development Emulator [2], it allows you to run and check functional correctness of the code with the actual AVX instructions before hardware is available.

Today we are adding another useful piece to help those who may not be able to use new tools supporting AVX in their current development environment but plan to migrate in the future or are using a software platform which is not supported by Intel SDE. These software developers can still start programming with Intel AVX using intrinsics.

Here we are providing the C and C++ header file which emulates Intel AVX intrinsics. The AVX emulation header file uses intrinsics for the prior Intel instruction set extensions up to Intel SSE4.2. SSE4.2 support in your development environment as well as hardware is required in order to use the AVX emulation header file.

To use simply have this file included:

#include "avxintrin_emu.h"

Instead of usual:

#include <immintrin.h>

One can also create alternative immintrin.h file (which in turn includes avxintrin_emu.h) to avoid an intrusive change to the source base and then simply switch between real AVX code generation and emulation via alternating the path to include directories.

Emulation header is primarily targeting UNIX type of environments, and was tested on such with GCC and Intel C/C++ compilers. We have a strong support with other tools (compilers, assemblers and SDE) on Microsoft Windows platform, but this header file can still be used on Windows, if desired, with Intel Compiler.

Note that the AVX emulation header file is designed to allow functional correctness of an AVX implementation and not recommended for long-term usage or release in a final product. Once your development environment and hardware supports AVX, we recommend that you switch to the real AVX intrinsic header file.

Although we did our best to debug it, this file must not be considered a reference functional implementation of AVX instructions or even bug-free. Please see the current version's limitations and caveats in the beginning of the file. Please let us know about the issues you faced using it.


#include "avxintrin_emu.h"  // #include <immintrin.h>

void saxpy(float a, const float* x, const float* y, float* __restrict z, size_t len)
    size_t i = 0;
    __m256 a_ = _mm256_set1_ps(a);

    for (int len16_ = len & -16; i + 16 <= len16_; i += 16)
        __m256 x1_ = _mm256_loadu_ps(x + i);
        __m256 x2_ = _mm256_loadu_ps(x + i + 8);

        __m256 y1_ = _mm256_loadu_ps(y + i);
        __m256 y2_ = _mm256_loadu_ps(y + i + 8);

        x1_ = _mm256_mul_ps(x1_, a_);
        x2_ = _mm256_mul_ps(x2_, a_);

        x1_ = _mm256_add_ps(x1_, y1_);
        x2_ = _mm256_add_ps(x2_, y2_);

        _mm256_storeu_ps(z + i, x1_);
        _mm256_storeu_ps(z + i + 8, x2_);

    for (; i < len; ++i)
        z[i] = x[i] * a + y[i];

[1] Intel AVX - /en-us/avx/

[2] Intel Software Development Emulator - /en-us/articles/intel-software-development-emulator

Per informazioni più dettagliate sulle ottimizzazioni basate su compilatore, vedere il nostro Avviso sull'ottimizzazione.