Intel® Advanced Vector Extensions

ippGetCpuFeatures for AVX2 support

At the moment I'm relying on inline ASM to check for AVX2 support, but I use the IPP function ippGetCpuFeatures to check for AVX and SSEx features.

Using the IPP function is arguably a better solution (simple and clean) than inline ASM, so I have a comment in my code next to the AVX2 checks along the lines of "use the IPP function instead when it becomes available".

I'm doing some cleanup these days and I noticed a series of new flags in ippcore.h, but it looks like several of these new flags aren't explained in the latest IPP documentation.

There is something wrong with calling SVML from inline ASM

     I tried calling __svml_sin4 from inline ASM, the way the compiler does. A code snippet follows:

     "vmovupd (%1), %%ymm0\n\t"
     "call __svml_sin4\n\t"
     "vmovupd %%ymm0, (%0)\n\t"
     "sub $1, %%rax\n\t"
     "jnz 3b\n\t"

    The program builds, but the values it outputs at run time are wrong.

AVX Power consumption (on i5)

Dear all,

Is there any data on how much more power is consumed when using AVX, specifically on an i5? Where can I find data on the i5's power consumption at peak floating-point throughput, both with and without AVX?


I would expect something on the order of 55 W without AVX and 60 W with AVX. This is pure assumption, and I would appreciate quantitative figures from anyone who has them.


How Intel® AVX Improves Performance on Server Applications

The latest Intel® Xeon® processor E7 v2 family includes a feature called Intel® Advanced Vector Extensions (Intel® AVX), which can potentially improve application performance. Here we will explain the context and provide an example of how using Intel® AVX improved performance on a well-known benchmark.

For existing vectorized code that uses floating point operations, you can gain a potential performance boost when running on newer platforms such as the Intel® Xeon® processor E7 v2 family by recompiling for Intel® AVX, or by linking against libraries (such as Intel® MKL) that dispatch to Intel® AVX code paths at run time.

The Chronicles of Phi - part 1 The Hyper-Thread Phalanx

The term phalanx is derived from a military formation used by the ancient Greeks and Romans. The formation generally involved soldiers lining up shoulder to shoulder, shield to shield multiple rows deep. The formation would advance in unison becoming “an irresistible force.” I use the term Hyper-Thread Phalanx to refer to the Hyper-Thread siblings of a core being aligned shoulder-to-shoulder and advancing forward.

Instruction set extensions programming reference, revision 18

In early February, an updated instruction set extensions programming reference, revision 18, was posted here.

It includes information about:

  • Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions
  • Intel® Secure Hash Algorithm (Intel® SHA) extensions
  • Intel® Memory Protection Extensions (Intel® MPX)

For more information about the technologies:

Updated Intel® Software Development Emulator

Hello, we just released version 6.20 of the Intel® Software Development Emulator. It is available here:

It includes:

  • Added support for XSAVEC and CLFLUSHOPT.
  • Disabled TSX CPUID bits when TSX emulation is not requested.
  • Improved disassembly for MPX instructions.
  • Added an option for running chip-check only on the main executable.
  • Added support for -quark (Pentium ISA).
  • Added application debugging for Mac OS X with the lldb debugger.

Different ways to turn an AoS into an SoA


I'm trying to implement a permutation that turns an AoS (where each structure holds 4 floats) into an SoA, using SSE, AVX, AVX2, and KNC, without using gather operations, to find out whether it's worth it.

For example, using KNC, I would like to use 4 zmm registers:

{A0, A1, ... A15}

{B0, B1, ... B15}

{C0, C1, ... C15}

{D0, D1, ... D15}

to end up with something like:

{A0, A4, A8, A12, B0, B4, B8, B12, C0, C4, C8, C12, D0, D4, D8, D12}

{A1, A5, A9, ...}

{A2, A6, A10, ...}

{A3, A7, A11, ...}
