Intel® Advanced Vector Extensions

Interpreting Intel SDE avx/sse transition tracker

Hello, I am running Intel SDE in 'ast' mode (the AVX/SSE transition tracker) on Mac OS X. I am struggling to interpret the results.

First off, in the output, it shows function addresses, not function names. Should it not show the symbols? I built my app with -g.

Next, this is the output I see: are these numbers indicative of excessive transitions? Or are they in a normal range?

IIR Gaussian Blur Filter Implementation using Intel® Advanced Vector Extensions

This white paper proposes an implementation for the Infinite Impulse Response (IIR) Gaussian blur filter using Intel® Advanced Vector Extensions (Intel® AVX) instructions. For a 2048x2048 image size, the AVX implementation is ~2X faster than the SSE code.
  • Developers
  • C/C++
  • Intel® Advanced Vector Extensions
  • Intel® Streaming SIMD Extensions
  • visual computing
  • Gaussian blur filter
  • Graphics
  • Media Processing
  • Vectorization

    _mm256_blend_epi16 doesn't work as documented

    The documentation for _mm256_blend_epi16 doesn't indicate that it operates on individual 128-bit channels, but this is the behavior I am seeing.  Is this the correct behavior?  Here is a reproducer code below showing the behavior for _mm256_blend_epi16 and _mm256_blend_epi32 where I attempt to insert a value into the first position of a vector using the blend instruction.
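To make the question concrete, here is a minimal scalar sketch of the two blends in plain C, so it runs without AVX2 hardware. It is not the poster's reproducer. The function names `blend_epi16_model` and `blend_epi32_model` are hypothetical, and the loops follow the operation pseudocode in the Intel Intrinsics Guide as I understand it: the 16-bit blend reuses its 8-bit mask for each 128-bit lane, while the 32-bit blend has one mask bit per element across the full 256 bits.

```c
#include <stdint.h>

/* Scalar model of _mm256_blend_epi16 (hypothetical helper name).
 * The immediate has only 8 bits, so bit (j % 8) selects element j:
 * the same mask is applied to the low and the high 128-bit lane. */
static void blend_epi16_model(const int16_t a[16], const int16_t b[16],
                              unsigned imm8, int16_t dst[16])
{
    for (int j = 0; j < 16; ++j)
        dst[j] = ((imm8 >> (j % 8)) & 1u) ? b[j] : a[j];
}

/* Scalar model of _mm256_blend_epi32 (hypothetical helper name).
 * There are 8 elements and 8 mask bits, so bit j selects element j
 * and nothing is repeated between lanes. */
static void blend_epi32_model(const int32_t a[8], const int32_t b[8],
                              unsigned imm8, int32_t dst[8])
{
    for (int j = 0; j < 8; ++j)
        dst[j] = ((imm8 >> j) & 1u) ? b[j] : a[j];
}
```

With a mask of 0x01, the 16-bit model takes elements 0 and 8 from `b`, whereas the 32-bit model takes only element 0. That matches the per-lane behavior described in the post.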

    Speedup with bulk/burst/coupled streaming write?

      Hello together,

    I have a very simple question. At least, I hope it is simple.

    From what I have read and already tried, bulk (coupled) streaming reads/writes should give anything from a modest to a significant speedup.

    After some more profiling, I found one very small, older method in our software that takes too much time in my opinion. Most of the time is spent on the last instruction, which writes the data. To pre-empt the obvious follow-up question: by design there is no guarantee that the destination memory fits in any cache, or that the cached lines have not already been overwritten, so there really are some access penalties.
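For readers unfamiliar with the term, a streaming (non-temporal) write can be sketched as follows. This is only an illustration and not the poster's method: the helper name `stream_copy` is hypothetical, it assumes the destination is 16-byte aligned, and it falls back to `memcpy` when SSE2 is not available at compile time. Whether it is faster than ordinary stores depends on the access pattern and the hardware, so it needs to be measured.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy n bytes from src to dst using non-temporal
 * stores where possible. Assumption: dst is 16-byte aligned. */
static void stream_copy(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t blocks = n / 16;

    for (size_t i = 0; i < blocks; ++i) {
        __m128i v = _mm_loadu_si128(s + i);   /* ordinary unaligned load */
        _mm_stream_si128(d + i, v);           /* store that bypasses the caches */
    }
    _mm_sfence();  /* order the streaming stores before later stores */

    /* Copy any remaining tail bytes with ordinary stores. */
    memcpy((char *)dst + blocks * 16, (const char *)src + blocks * 16, n % 16);
#else
    memcpy(dst, src, n);  /* portable fallback without streaming stores */
#endif
}
```

Streaming stores are usually considered when the written data will not be read again soon, which seems to be the situation described above.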

    PCIe Root Complex and the PCH

    Hello All,

    First of all, sorry this is not in the appropriate forum but I was directed to post this here.

    I have a question that's been bugging me regarding the PCIe Root Complex and the PCH and I'm hoping someone will be able to help clear things up a bit.

    I've always presumed that the PCIe Root Complex was a combination of the CPU and the PCH as they both contain PCIe Root Ports, thereby connecting PCIe devices to CPU/memory. 

    Early indicators of AVX512 performance on Skylake?

    Hi all,

    Looking ahead, what can we expect from the first generation of AVX512 on the desktop - or when should we expect an announcement?

    In the past:

    - The first generations of SSE CPUs didn't have a full-width engine; they broke 128-bit SSE operations into two 64-bit uops

    - The first AVX CPUs (Sandy Bridge / Ivy Bridge) needed two cycles for an AVX store - the L1 cache didn't have the bandwidth to perform a store in one cycle

    So what I'd like to know is:

    - Will the AVX512 desktop CPUs be able to handle a full-width L1 load and store per cycle?

    pmovzxbd using memory operands

    Is there a way to use pmovzxbd with a memory operand from intrinsics? Currently I have either

    _mm_cvtepu8_epi32(_mm_cvtsi32_si128(ptr[offset])); //(movd)

    _mm_cvtepu8_epi32(_mm_insert_epi32(_mm_setzero_si128(),ptr[offset],0));  //(pinsrd)

    The movd or pinsrd should not be needed; in assembly I can write something like


    pmovzxbd xmm0,[rax+rdx*4]


    Is there a way I can make this call using intrinsics instead of assembly?
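One pattern that is sometimes suggested for this is sketched below. The helper name `zero_extend_4_bytes` is hypothetical. The sketch assumes that the compiler may fold the 4-byte load into the memory operand of pmovzxbd; that depends on the compiler and version, so the disassembly should be checked rather than taken for granted. A scalar fallback is included so the code also builds without SSE4.1.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

/* Hypothetical helper: zero-extend the 4 bytes at p to four 32-bit values. */
static void zero_extend_4_bytes(const uint8_t *p, uint32_t out[4])
{
#if defined(__SSE4_1__)
    int32_t packed;
    memcpy(&packed, p, sizeof packed);          /* alignment-safe 4-byte load */
    __m128i bytes = _mm_cvtsi32_si128(packed);  /* movd, unless folded away */
    __m128i wide  = _mm_cvtepu8_epi32(bytes);   /* pmovzxbd */
    _mm_storeu_si128((__m128i *)out, wide);
#else
    for (int i = 0; i < 4; ++i)                 /* scalar equivalent */
        out[i] = p[i];
#endif
}
```

Calling it on the bytes 1, 200, 255, 0 yields the 32-bit values 1, 200, 255, 0 in that order.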

    Benefits of SSE/AVX processing when an integrated GPU is missing?

    Some Intel processors have an on-chip GPU (e.g. the Intel Core i7-4770K with its HD Graphics 4600 GPU) whilst others don't (e.g. the Intel Core i7-3930K). I'm wondering what implications a missing integrated GPU has for SSE/AVX SIMD processing. Many processors without an embedded GPU still support SSE/AVX, but is the benefit of using SSE/AVX on them significantly smaller than on CPUs with an embedded GPU?
