Intel® Streaming SIMD Extensions

Intel SDE and VS2013


I am trying to use the Intel® Software Development Emulator (SDE) with Visual Studio 2013, but I am running into trouble.

Starting a debug session with the SDE debugger fails with a message that my program (the Visual Studio project) could not be launched because a component DLL is missing. CTRL + F5 starts the program, but it crashes soon afterwards.

Running the SDE tool with my executables from the command line works fine. But one of my shuffles seems to be wrong, so I would like to inspect the vector registers at a certain point. Any ideas? Or is VS2013 not supported?

Haswell and cross-lane


I wrote code for integral image computation with SSE, and it works quite well. But I have serious problems making use of AVX/AVX2. I run my code on an i5-4460.

The basis: for the integral image I need a row sum (a running sum along each row), which is not ideal for vector units but can be done with shuffles and adds. As a second step, I need to broadcast the last element to all elements. This can be done with a shuffle.
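For readers following along, the SSE step described above could look roughly like the sketch below. It is not the poster's code: the function names (scan4_ps, row_prefix_sum_sse) are made up for illustration, it uses byte shifts (_mm_slli_si128) in place of shuffles to build the shifted copies, and it assumes float data with a row length that is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Inclusive prefix sum of 4 floats: [a, a+b, a+b+c, a+b+c+d]. */
static __m128 scan4_ps(__m128 x)
{
    /* Add a copy shifted up by one element (zeros are shifted in). */
    x = _mm_add_ps(x, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 4)));
    /* Add a copy shifted up by two elements. */
    x = _mm_add_ps(x, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 8)));
    return x;
}

/* Hypothetical helper: running sum along one row of n floats. */
void row_prefix_sum_sse(const float *src, float *dst, size_t n)
{
    assert(n % 4 == 0);  /* assumption: no scalar tail handling in this sketch */
    __m128 carry = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4) {
        __m128 x = scan4_ps(_mm_loadu_ps(src + i));
        x = _mm_add_ps(x, carry);          /* add the total of all previous blocks */
        _mm_storeu_ps(dst + i, x);
        /* Second step: broadcast the last element to all four positions. */
        carry = _mm_shuffle_ps(x, x, _MM_SHUFFLE(3, 3, 3, 3));
    }
}
```

For a full integral image, each output row would additionally have the previous output row added to it; that part is left out here.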

Now with AVX there is no full shuffle for 32-bit elements across the whole 256-bit register, but I can do it with a normal (in-lane) shuffle and _mm256_permute2f128_ps.
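One possible way to combine an in-lane shuffle with _mm256_permute2f128_ps for this is sketched below. This is only an illustration under assumptions: the names scan8_ps and row_prefix_sum_avx are hypothetical, the row length is taken to be a multiple of 8, and the TARGET_AVX macro relies on the GCC/Clang target attribute (with MSVC one would drop it and enable AVX in the project settings instead).

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Assumption: GCC/Clang; lets the file build without -mavx. */
#define TARGET_AVX __attribute__((target("avx")))

/* Inclusive prefix sum of 8 floats held in one 256-bit register. */
TARGET_AVX static __m256 scan8_ps(__m256 x)
{
    const __m256 zero = _mm256_setzero_ps();
    __m256 t;
    /* In-lane shift by one element: [0, x0, x1, x2 | 0, x4, x5, x6]. */
    t = _mm256_permute_ps(x, _MM_SHUFFLE(2, 1, 0, 0));
    t = _mm256_blend_ps(t, zero, 0x11);
    x = _mm256_add_ps(x, t);
    /* In-lane shift by two elements. */
    t = _mm256_permute_ps(x, _MM_SHUFFLE(1, 0, 0, 0));
    t = _mm256_blend_ps(t, zero, 0x33);
    x = _mm256_add_ps(x, t);
    /* Each 128-bit lane now holds its own scan. Broadcast each lane's total
       in-lane, then move the low-lane total into the high lane (low lane zeroed). */
    t = _mm256_permute_ps(x, _MM_SHUFFLE(3, 3, 3, 3));
    t = _mm256_permute2f128_ps(t, t, 0x08);
    return _mm256_add_ps(x, t);
}

/* Hypothetical helper: running sum along one row of n floats. */
TARGET_AVX void row_prefix_sum_avx(const float *src, float *dst, size_t n)
{
    assert(n % 8 == 0);  /* assumption: no scalar tail handling in this sketch */
    __m256 carry = _mm256_setzero_ps();
    for (size_t i = 0; i < n; i += 8) {
        __m256 x = scan8_ps(_mm256_loadu_ps(src + i));
        x = _mm256_add_ps(x, carry);
        _mm256_storeu_ps(dst + i, x);
        /* Broadcast element 7 to all eight positions: in-lane shuffle first,
           then copy the high lane into both lanes. */
        __m256 t = _mm256_permute_ps(x, _MM_SHUFFLE(3, 3, 3, 3));
        carry = _mm256_permute2f128_ps(t, t, 0x11);
    }
}
```

The cross-lane fix-up and the broadcast each cost an extra permute compared with SSE, so the AVX version does not automatically come out twice as fast.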

Q on memory comparison optimization

Hi All,

I am using AVX/SSE instructions to replace memcmp. Our workload includes comparing 64 bytes and occasionally 64 and 128 bytes. I am using the following function, cmp32, for 32-byte comparisons and extend it 2 times for 64 bytes or 4 times for 128 bytes, but I am getting hardly a 1% performance improvement. Testing was done on an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu 14.04 x86_64.

I tried replacing the following lines
vcmp = _mm256_cmpeq_epi64(xmm0, xmm1);
vmask = _mm256_movemask_epi8(vcmp);
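Since the cmp32 function itself is not shown in this excerpt, here is a minimal sketch of what a 32-byte equality test built on the compare-plus-movemask pattern might look like. The names equal32 and equal_blocks are hypothetical, the code only answers "equal or not" (it does not return an ordering like memcmp), and the TARGET_AVX2 macro assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <immintrin.h>

/* Assumption: GCC/Clang; lets the file build without -mavx2. */
#define TARGET_AVX2 __attribute__((target("avx2")))

/* Returns 1 if the 32 bytes at a and b are identical, 0 otherwise. */
TARGET_AVX2 static int equal32(const void *a, const void *b)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i eq = _mm256_cmpeq_epi8(va, vb);   /* 0xFF in every matching byte */
    /* All 32 mask bits set means every byte matched. */
    return (uint32_t)_mm256_movemask_epi8(eq) == 0xFFFFFFFFu;
}

/* Hypothetical wrapper: equality test for 64 or 128 bytes (any multiple of 32). */
TARGET_AVX2 int equal_blocks(const void *a, const void *b, size_t len)
{
    assert(len % 32 == 0);
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    for (size_t i = 0; i < len; i += 32) {
        if (!equal32(pa + i, pb + i))
            return 0;  /* early exit on the first differing block */
    }
    return 1;
}
```

For an equality-only test, _mm256_cmpeq_epi64 followed by _mm256_movemask_epi8 gives the same all-ones mask as the byte-wise compare, so swapping one for the other would not be expected to change the result.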

Can AVX instructions be executed in parallel


Can two AVX instructions be executed in parallel?

For example,


            a1 = _mm256_load_ps(Rin + offset);
            a2 = _mm256_load_ps(Gin + offset);
            a3 = _mm256_load_ps(Bin + offset);

            ac0 = _mm256_mul_ps(a1, in2outAvx_11);
            ac1 = _mm256_mul_ps(a2, in2outAvx_12);
            ac2 = _mm256_mul_ps(a3, in2outAvx_13);
            z0 = _mm256_add_ps(ac0, ac1);
            z1 = _mm256_add_ps(z0, ac2);
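To make the dependencies in the snippet above easier to see, here is a self-contained sketch of the same pattern as a loop. The function name mix_row, the output pointer, and the scalar coefficient parameters are assumptions for illustration; unaligned loads are used so the sketch does not depend on 32-byte alignment, and the TARGET_AVX macro assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Assumption: GCC/Clang; lets the file build without -mavx. */
#define TARGET_AVX __attribute__((target("avx")))

/* Hypothetical helper: out[i] = c11*Rin[i] + c12*Gin[i] + c13*Bin[i]. */
TARGET_AVX void mix_row(const float *Rin, const float *Gin, const float *Bin,
                        float *out, size_t n,
                        float c11, float c12, float c13)
{
    assert(n % 8 == 0);  /* assumption: no scalar tail handling in this sketch */
    const __m256 k11 = _mm256_set1_ps(c11);
    const __m256 k12 = _mm256_set1_ps(c12);
    const __m256 k13 = _mm256_set1_ps(c13);
    for (size_t offset = 0; offset < n; offset += 8) {
        /* The three loads do not depend on each other. */
        __m256 a1 = _mm256_loadu_ps(Rin + offset);
        __m256 a2 = _mm256_loadu_ps(Gin + offset);
        __m256 a3 = _mm256_loadu_ps(Bin + offset);
        /* The three multiplies are also mutually independent. */
        __m256 ac0 = _mm256_mul_ps(a1, k11);
        __m256 ac1 = _mm256_mul_ps(a2, k12);
        __m256 ac2 = _mm256_mul_ps(a3, k13);
        /* The adds form a chain: z1 needs z0, which needs ac0 and ac1. */
        __m256 z0 = _mm256_add_ps(ac0, ac1);
        __m256 z1 = _mm256_add_ps(z0, ac2);
        _mm256_storeu_ps(out + offset, z1);
    }
}
```

Because the multiplies have no data dependencies on one another, an out-of-order core is free to overlap them when execution ports are available; how many can actually issue per cycle depends on the microarchitecture. The second add has to wait for the first, but work from different loop iterations can still overlap.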
