Intel® Streaming SIMD Extensions

Small typo in the Intel® 64 and IA-32 Architectures Software Developer’s Manual


There seems to be a small typo in the Intel® 64 and IA-32 Architectures Software Developer’s Manual (Order Number: 253665-054US April 2015), page 3-149 (CMPSS instruction):

128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 64-bit memory location.

It should be a 32-bit memory location, since CMPSS compares scalar single-precision (32-bit) values.
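To illustrate why the operand width is 32 bits, here is a minimal sketch using the SSE intrinsics that map to the scalar compare. The helper name `low_lane_less_than` is made up for this example, and whether the compiler actually emits CMPSS with a memory operand (rather than a separate scalar load) depends on the compiler and its optimization settings.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: compares only the low single-precision lane of `a`
   against the 32-bit float stored at `b`. Returns 1 if a[0] < *b, else 0. */
static inline int low_lane_less_than(__m128 a, const float *b)
{
    __m128 rhs  = _mm_load_ss(b);        /* reads exactly 4 bytes from memory */
    __m128 mask = _mm_cmplt_ss(a, rhs);  /* scalar compare, "less than" predicate */
    return _mm_movemask_ps(mask) & 1;    /* bit 0 carries the low-lane result */
}
```

Only the low lane takes part in the comparison; the upper three lanes of `a` are passed through unchanged, which is why a 4-byte source is enough.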




Guaranteed atomic operation clarification


I'm trying to understand a line in the Intel Architecture manual. It's a description of a memory operation that is guaranteed to be atomic.

The line is at Chapter 8, Section 8.1.1 "Guaranteed Atomic Operations", second bullet list, second item:
"16-bit accesses to uncached memory locations that fit within a 32-bit data bus"

The way I interpret this (which must be wrong) is: Accesses to 16-bit regions of memory that are not currently cached and that fit within a data bus that transfers 32-bit values.

APIC dropping MSI-X interrupts

Hello, I have a difficult problem. The scenario is as follows:

The hardware environment is an Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz with an Altera FPGA board.

The OS is Linux debian-rss 3.16.7-ckt7.

The FPGA creates 32 DMA transfers to the CPU and generates one interrupt per transfer.

These 32 interrupts are distributed across 8 different MSI-X IRQs.

According to the APIC spec, for each interrupt one may be held in the ISR and one in the IRR; a third may be dropped.

But now I distribute 2 interrupts to each IRQ, so why might interrupts still be dropped?

Easy SIMD through Wrappers

SIMD operations are widely used for 3D graphics applications. This tutorial provides new insights into SIMD by comparing SIMD lanes and CPU threads, and steps you through the process of creating a simple, straightforward SIMD implementation in your own code.
  • Developers
  • Linux*
  • Microsoft Windows* 8.x
  • Windows*
  • C/C++
  • Advanced
  • Intermediate
  • Intel® Streaming SIMD Extensions
  • AVX
  • simd
  • ray tracing
  • Crytek
  • Game Development
  • Intel® Atom™ Processors
  • Intel® Core™ Processors
  • Parallel Computing

    Dynamic Shift


    I am trying to achieve a dynamic shift. Let me explain the task. I process data with SSE and AVX. Data is loaded, processed, and the results are stored later. To support arbitrary lengths, I need some kind of masked load, but for SSE as well.

    Suppose my length is 9 elements and I work with int32 and SSE. The first and second loads are fine. The third load is fine with respect to memory bounds, so that is not a problem. But only element 0 in the vector register is valid; the others need to be zero. What is the best way to achieve this?
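One common way to get this effect on SSE2 is to load the full vector and AND it with a mask taken from a small sliding-window table. The sketch below is only an illustration of that idea, not the poster's code: the names `tail_mask_table`, `tail_mask_epi32`, and `load_tail_epi32` are invented here, and it relies on the assumption stated in the post that reading all 16 bytes at the tail address is safe.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Sliding-window table: four all-ones lanes followed by four zero lanes. */
static const int32_t tail_mask_table[8] = { -1, -1, -1, -1, 0, 0, 0, 0 };

/* Returns a mask whose first `valid` lanes (0..4) are all ones, the rest zero. */
static inline __m128i tail_mask_epi32(int valid)
{
    return _mm_loadu_si128((const __m128i *)(tail_mask_table + 4 - valid));
}

/* Loads four int32 values from p and zeroes every lane with index >= valid.
   Assumption: all 16 bytes starting at p are readable. */
static inline __m128i load_tail_epi32(const int32_t *p, int valid)
{
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    return _mm_and_si128(v, tail_mask_epi32(valid));
}
```

For the 9-element case, the third load would be `load_tail_epi32(data + 8, 9 % 4)`, which keeps element 0 and clears the other three lanes. The same table trick extends to AVX by doubling the table to sixteen entries, although AVX also offers dedicated masked-load instructions.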

    Why is my AVX slower than SSE?

    According to the article "IIR Gaussian Blur Filter Implementation using Intel® Advanced Vector Extensions", the AVX code should be faster than the SSE code. But my performance measurements are as follows:

     The computer supports AVX
    number CPU in the system = 4

     IIR Gaussian Filter Coefficients are:
    a0 = 0.021175, a1 = -0.017807, a2 = 0.021103, a3 = -0.017875, b1 = -1.837578, b2 = 0.844174, cprev = 0.510583, cnext = 0.489409

    image width = 1024, height = 1024

    Running multi threaded SSE code

    Running multi threaded AVX code

    Using the Emscripten* Compiler with the Intel® XDK


    Emscripten compiles C and C++ to JavaScript, which allows C and C++ programs to run in HTML5. Intel(R) XDK is an HTML5 cross-platform development tool that provides an easy and fast way to get your apps to market. Together, the Emscripten compiler and Intel XDK give you another option: publishing apps that use C and C++ as part of the application.


  • Developers
  • Android*
  • HTML5
  • C/C++
  • HTML5
  • Intermediate
  • html5 Intel XDK
  • Intel® Streaming SIMD Extensions
  • Development Tools