Intel® Advanced Vector Extensions

Intel SDE and VS2013


I am trying to use the Intel® Software Development Emulator (SDE) with Visual Studio 2013, but I am running into trouble.

Starting a debug session with the SDE debugger fails with a message that my program (the executable of the Visual Studio project) could not be launched because a component DLL is missing. CTRL + F5 starts the program, but it crashes soon after.

Running the SDE tool on my executables from the command line works fine. But one of my shuffles seems to be wrong, so I would like to inspect the vector registers at a certain point. Any ideas? Or is VS2013 not supported?

Haswell and crosslan


I wrote code for integral image computation with SSE, and it works quite well. But I have serious problems making use of AVX/AVX2. I run my code on an i5-4460.

The basis: for an integral image I need a row sum (a running sum along each row), which is not optimal for vector units but can be done with shuffles and adds. As a second step, I need to broadcast the last element to all elements. This can be done with a shuffle.

Now with AVX, there is no full-width (cross-lane) shuffle for 32-bit elements, but I can do it with a normal in-lane shuffle and _mm256_permute2f128_ps.
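The approach described above could look roughly like the following. This is only a minimal sketch under assumptions, not the poster's actual code: the names `scan8_ps`, `broadcast_last_ps`, `row_prefix_sum_avx` and the `AVX_TARGET` macro are made up for illustration, the row length is assumed to be a multiple of 8 floats, and only AVX (not AVX2) intrinsics are used. The in-lane steps use `_mm256_permute_ps`; the two cross-lane steps use `_mm256_permute2f128_ps`.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper macro: lets GCC/Clang build this without -mavx. */
#if defined(__GNUC__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

/* Inclusive prefix sum of the 8 floats in one register (hypothetical name). */
AVX_TARGET static __m256 scan8_ps(__m256 x)
{
    const __m256 zero = _mm256_setzero_ps();
    __m256 t;

    /* In each 128-bit lane: add the element one slot to the left. */
    t = _mm256_permute_ps(x, _MM_SHUFFLE(2, 1, 0, 3));
    t = _mm256_blend_ps(t, zero, 0x11);   /* clear slots 0 and 4 */
    x = _mm256_add_ps(x, t);

    /* In each lane: add the element two slots to the left. */
    t = _mm256_permute_ps(x, _MM_SHUFFLE(1, 0, 3, 2));
    t = _mm256_blend_ps(t, zero, 0x33);   /* clear slots 0, 1, 4, 5 */
    x = _mm256_add_ps(x, t);

    /* Cross-lane step: add the low lane's total to every high-lane element. */
    t = _mm256_permute_ps(x, _MM_SHUFFLE(3, 3, 3, 3));
    t = _mm256_permute2f128_ps(t, t, 0x08); /* low = 0, high = old low lane */
    return _mm256_add_ps(x, t);
}

/* Broadcast element 7 to all 8 slots (hypothetical name). */
AVX_TARGET static __m256 broadcast_last_ps(__m256 x)
{
    __m256 t = _mm256_permute_ps(x, _MM_SHUFFLE(3, 3, 3, 3));
    return _mm256_permute2f128_ps(t, t, 0x11); /* both lanes = high lane */
}

/* Running sum along one row; n is assumed to be a multiple of 8. */
AVX_TARGET void row_prefix_sum_avx(const float *src, float *dst, size_t n)
{
    __m256 carry = _mm256_setzero_ps();
    size_t i;

    assert(n % 8 == 0);
    for (i = 0; i < n; i += 8) {
        __m256 s = _mm256_add_ps(scan8_ps(_mm256_loadu_ps(src + i)), carry);
        _mm256_storeu_ps(dst + i, s);
        carry = broadcast_last_ps(s);  /* running total for the next block */
    }
}
```

For an input row of 1, 2, ..., 8 this yields 1, 3, 6, 10, 15, 21, 28, 36. Whether this beats the SSE version on Haswell would need to be measured, since the extra cross-lane permutes lengthen the dependency chain.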

Q on memory comparison optimization

Hi All,

I am using AVX/SSE instructions to replace memcmp. Our workload includes comparing 64 bytes and occasionally 64 and 128 bytes. I use the following function, cmp32, for 32-byte comparisons and extend it 2 times for 64 bytes or 4 times for 128 bytes, and I am getting hardly 1% performance improvement. Testing was done on an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz running Ubuntu 14.04 x86_64.

I tried replacing the following lines:
vcmp = _mm256_cmpeq_epi64(xmm0, xmm1);
vmask = _mm256_movemask_epi8(vcmp);
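
For context, a 32-byte equality check built from a compare and a movemask might look like the sketch below. It is not the poster's cmp32, which is not shown here: `equal32`, `equal_blocks` and `AVX2_TARGET` are hypothetical names, the sketch assumes only equality is needed (unlike memcmp, it does not report ordering), and the length is assumed to be a multiple of 32 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper macro: lets GCC/Clang build this without -mavx2. */
#if defined(__GNUC__)
#define AVX2_TARGET __attribute__((target("avx2")))
#else
#define AVX2_TARGET
#endif

/* Returns 1 if the 32 bytes at a and b are identical, 0 otherwise. */
AVX2_TARGET static int equal32(const void *a, const void *b)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i vcmp = _mm256_cmpeq_epi8(va, vb);  /* 0xFF in every equal byte */

    /* One mask bit per byte; all 32 bits set means all bytes matched. */
    return _mm256_movemask_epi8(vcmp) == -1;
}

/* Equality over n bytes, 32 at a time; n is assumed to be a multiple of 32. */
AVX2_TARGET int equal_blocks(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    size_t i;

    assert(n % 32 == 0);
    for (i = 0; i < n; i += 32) {
        if (!equal32(pa + i, pb + i))
            return 0;  /* stop at the first differing block */
    }
    return 1;
}
```

With this shape, a 64-byte comparison is `equal_blocks(a, b, 64)` and a 128-byte comparison is `equal_blocks(a, b, 128)`. For buffers this small, the library memcmp may already be using vector instructions, which would be one possible reason for a small measured gain.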

Which hardware PMU events to use to calculate FLOPS on Intel(R) Xeon Phi(TM) coprocessor?

FLOPS stands for floating-point operations per second, a metric used in High Performance Computing. In general, Intel(R) VTune(TM) Amplifier XE only provides a metric named Cycles Per Instruction (average CPI), which measures performance for general programs.

