Intel® Streaming SIMD Extensions

Intel® graphics virtualization update

Traditional business models, built on graphics and visualization usages such as workstation remoting, VDI, DaaS, transcoding, media streaming, and online gaming, are beginning to draw open-source attention worldwide. Employees are becoming mobile. They want the flexibility to work from any device, anywhere, anytime, with any data, without any compromise in quality due to access, latency, or visualization.

Bug in SDE emulation of AVX-512 _mm512_permutevar_ps()?


SDE emulates _mm512_permutevar_ps() [aka VPERMPS] in a way I did not expect. My understanding from the documentation is that it should behave as the 512-bit variant of _mm256_permutevar8x32_ps() and be able to do cross-lane shuffling, so the attached file should reverse the contents of the vector. This works with _mm256_permutevar8x32_ps(), but _mm512_permutevar_ps() clearly does not produce the expected result; instead it performs an intra-lane shuffle:
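For comparison, here is a minimal sketch of a 16-float reversal. As far as I can tell from the Intel Intrinsics Guide, the cross-lane VPERMPS is exposed as `_mm512_permutexvar_ps` (index vector as the first argument), while `_mm512_permutevar_ps` is listed as VPERMILPS, which only shuffles within each 128-bit lane. The sketch assumes GCC or Clang on x86 (`__attribute__((target("avx512f")))`, `__builtin_cpu_supports`); the `reverse16*` names are hypothetical, and a scalar path keeps it runnable on CPUs without AVX-512.

```c
#include <assert.h>
#include <immintrin.h>

/* Cross-lane reversal with VPERMPS (AVX-512F only). */
__attribute__((target("avx512f")))
static void reverse16_avx512(const float *in, float *out)
{
    /* _mm512_set_epi32 takes elements from 15 down to 0, so element i
       of idx holds 15 - i, giving out[i] = in[15 - i]. */
    const __m512i idx = _mm512_set_epi32(0, 1, 2, 3, 4, 5, 6, 7,
                                         8, 9, 10, 11, 12, 13, 14, 15);
    const __m512 v = _mm512_loadu_ps(in);
    _mm512_storeu_ps(out, _mm512_permutexvar_ps(idx, v));
}

/* Reference behaviour, also used when AVX-512F is not available. */
static void reverse16_scalar(const float *in, float *out)
{
    for (int i = 0; i < 16; ++i)
        out[i] = in[15 - i];
}

/* Picks the AVX-512 path only if the (possibly emulated) CPU reports it. */
static void reverse16(const float *in, float *out)
{
    if (__builtin_cpu_supports("avx512f"))
        reverse16_avx512(in, out);
    else
        reverse16_scalar(in, out);
}
```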

asm blocks


I am writing AVX code inside asm blocks (I don't want to use AVX intrinsics).

Many general-purpose (GP) registers are used, so they collide with the ones the compiler allocates, and this quickly breaks the behavior of the code.

Is there an automatic or manual way to avoid these register overlaps?

Any link to documentation would be great.

I would also like to use asm blocks in Fortran with ifort, but haven't found a way yet.
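A minimal sketch of GCC-style extended asm, which as far as I know the Intel C/C++ compiler on Linux also accepts. Operands bound with the `"r"` constraint let the compiler choose GP registers it is not already using, and any register named explicitly in the asm text goes in the clobber list so the compiler keeps none of its own values there across the block. The function names are hypothetical; x86-64 with GCC or Clang is assumed, with a plain C fallback elsewhere.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)

/* The compiler picks the registers for %0 and %1. */
static uint64_t add_u64(uint64_t a, uint64_t b)
{
    __asm__("addq %1, %0"      /* AT&T operand order: %0 = %0 + %1 */
            : "+r"(a)          /* %0: read/write operand */
            : "r"(b)           /* %1: read-only operand */
            : "cc");           /* ADD modifies the flags */
    return a;
}

/* RAX is hard-coded, so it is declared as clobbered: the compiler will
   neither assign an operand to it nor expect it to survive the block. */
static uint64_t sum3(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t r;
    __asm__("movq %1, %%rax\n\t"
            "addq %2, %%rax\n\t"
            "addq %3, %%rax\n\t"
            "movq %%rax, %0"   /* output written last, after all inputs are read */
            : "=r"(r)
            : "r"(a), "r"(b), "r"(c)
            : "rax", "cc");
    return r;
}

#else /* portable fallback so the sketch still builds on other targets */
static uint64_t add_u64(uint64_t a, uint64_t b) { return a + b; }
static uint64_t sum3(uint64_t a, uint64_t b, uint64_t c) { return a + b + c; }
#endif
```

Vector registers are listed as clobbers the same way (e.g. `"xmm0"`), and `"memory"` is needed when the asm reads or writes memory that is not passed as an operand.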



SDE produces unstable behavior


I have some SSE/AVX code that I'm trying to test with the Intel Software Development Emulator (SDE) on CPUs without native support for some of the instruction set extensions. In particular, I tried the following setups:

1. Sandy Bridge CPU, SDE running with the -hsw switch.

2. Sandy Bridge CPU, SDE running with the -hsw -sse-sde switches.

3. A KVM guest virtual machine with SSE4 instructions (host CPU is Nehalem), SDE running with the -hsw switch.

All of this is on Linux x86_64, with SDE 6.22 and 6.12.

Disabling AVX

Hi all,

Is there a way (under Windows 7) to disable AVX support?
I want to make sure that on a pre-Sandy Bridge machine I don't get an "Illegal Instruction" exception.
Currently I have to use another machine, which is a bit annoying.

I don't use the /QaXXX flags because the code already takes a very long time to compile (so CPU dispatching is handled manually), and I want it to work with the Microsoft compiler too (even if performance would of course be degraded ;) ).
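One way to keep a single binary safe on pre-Sandy Bridge machines without the /Qax options is manual run-time dispatch: build the generic code without `/arch:AVX` or `-mavx`, and confine AVX to functions that are called only after CPUID (AVX and OSXSAVE bits) and XGETBV (OS saves YMM state) confirm it is usable. A minimal sketch with hypothetical function names, covering MSVC (`__cpuid`, `_xgetbv`) and GCC/Clang (`<cpuid.h>`, `target("avx")`):

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <cpuid.h>
#endif

/* 1 only if the CPU implements AVX *and* the OS saves YMM state. */
static int avx_usable(void)
{
    unsigned int ecx;
#if defined(_MSC_VER)
    int regs[4];
    __cpuid(regs, 1);
    ecx = (unsigned int)regs[2];
#else
    unsigned int eax, ebx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
#endif
    const unsigned int osxsave = 1u << 27, avx = 1u << 28;
    if ((ecx & (osxsave | avx)) != (osxsave | avx))
        return 0;
    /* XGETBV is only valid once OSXSAVE is known to be set. */
#if defined(_MSC_VER)
    unsigned long long xcr0 = _xgetbv(0);
#else
    unsigned int lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    unsigned long long xcr0 = ((unsigned long long)hi << 32) | lo;
#endif
    return (xcr0 & 0x6) == 0x6;   /* XCR0 bit 1 = XMM, bit 2 = YMM state */
}

/* Built with the project's baseline flags (no -mavx or /arch:AVX). */
static void scale_generic(float *x, size_t n, float s)
{
    for (size_t i = 0; i < n; ++i)
        x[i] *= s;
}

/* Only this function is allowed to contain AVX instructions. */
#if defined(__GNUC__)
__attribute__((target("avx")))
#endif
static void scale_avx(float *x, size_t n, float s)
{
    const __m256 vs = _mm256_set1_ps(s);
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(x + i, _mm256_mul_ps(_mm256_loadu_ps(x + i), vs));
    for (; i < n; ++i)            /* scalar tail */
        x[i] *= s;
}

/* The check runs on every call here for brevity; real code would cache it. */
static void scale(float *x, size_t n, float s)
{
    if (avx_usable())
        scale_avx(x, n, s);
    else
        scale_generic(x, n, s);
}
```

Temporarily forcing `avx_usable()` to return 0 exercises the non-AVX path on an AVX machine, although it does not prove that the compiler emitted no AVX instructions in the generic code.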

BKMs on the use of the SIMD directive

We received a request at one of the various “Birds of a Feather” meetings Intel® holds at venues such as the Super Computing* (SC) and International Super Computing* (ISC) conferences. A customer wanted to know the BKMs (Best Known Methods) for proper use of the new OpenMP* 4.0 / Intel® Cilk™ Plus SIMD directive, and I volunteered to create such a list. Investigating the topic more thoroughly, I discovered that a large body of resources on vectorization and the use of the SIMD directive already exists.
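For readers new to the directive, a minimal sketch of the OpenMP 4.0 `simd` construct with a reduction clause; the `dot` function is illustrative only. With GCC or Clang the pragma takes effect under `-fopenmp` or `-fopenmp-simd`; otherwise it is ignored (usually with a warning) and the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* reduction(+:sum) tells the compiler that each SIMD lane may keep a
   partial sum, combined at the end; for floating point this can change
   the summation order and therefore the rounding of the result. */
static double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
#pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```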

FMA Support

Hello guys, sorry for the basic question. I've been looking for architectures that support FMA. I know Sandy Bridge doesn't support it and Haswell does. But what about Ivy Bridge? Does Ivy Bridge support FMA?

Best regards.
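As far as I know, on Intel processors FMA (the three-operand FMA3 set) first appeared with Haswell, so Ivy Bridge, like Sandy Bridge, does not support it. Rather than keeping a per-microarchitecture list, the feature can be queried at run time; a minimal sketch for GCC or Clang on x86 (the function name is hypothetical):

```c
#include <assert.h>
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction (x86 only) */

/* CPUID leaf 1, ECX bit 12: the CPU implements FMA3.
   Actually executing FMA instructions additionally requires the OS to
   save YMM state (the same OSXSAVE + XGETBV check used for AVX). */
static int cpu_reports_fma(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 12) & 1u);
}
```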

ippGetCpuFeatures for AVX2 support

At the moment I rely on inline ASM to check for AVX2 support, but I use the IPP function ippGetCpuFeatures to check for AVX and SSEx features.

Using the IPP function is arguably a better solution (simple & clean) than inline ASM, so I have a comment in my code for the AVX2 checks along the lines of "use the IPP stuff instead when available".

I'm doing some cleanup these days and noticed a series of new flags in ippcore.h, but several of them don't seem to be explained in the latest IPP documentation.
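Until the IPP flags are documented, one alternative to hand-written CPUID asm with GCC or Clang is `<cpuid.h>`, which wraps CPUID (`__get_cpuid`, `__get_cpuid_max`, `__cpuid_count`); only XGETBV still needs a one-line asm statement. A minimal sketch with a hypothetical function name, checking AVX2 (CPUID leaf 7, EBX bit 5) together with OS support for YMM state:

```c
#include <assert.h>
#include <stddef.h>
#include <cpuid.h>   /* GCC/Clang, x86 only */

/* 1 if AVX2 instructions can be executed: CPU support for AVX and AVX2
   plus OS support for saving XMM/YMM state. */
static int avx2_usable(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 1: ECX bit 27 = OSXSAVE, bit 28 = AVX. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    const unsigned int need = (1u << 27) | (1u << 28);
    if ((ecx & need) != need)
        return 0;

    /* XCR0 bits 1 and 2: the OS saves XMM and YMM registers. */
    unsigned int xcr0_lo, xcr0_hi;
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    if ((xcr0_lo & 0x6u) != 0x6u)
        return 0;

    /* Leaf 7, sub-leaf 0: EBX bit 5 = AVX2. */
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (int)((ebx >> 5) & 1u);
}
```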

Something goes wrong when using SVML in inline ASM

I am trying to call __svml_sin2 from inline ASM the same way the compiler does. Here is a code snippet:

     "vmovupd (%1), %%ymm0\n\t"
     "call __svml_sin4\n\t"
     "vmovupd %%ymm0, (%0)\n\t"
     "sub $1, %%rax\n\t"
     "jnz 3b\n\t"

The program builds, but the output values at run time are wrong.
