Documentation of SSE versions

Christian M.

Hello,

I am writing something about SSE and AVX. For AVX you can find quite good documents here on the site. But what about SSE? It would be nice to have something that shows which kinds of instructions each SSE version contributed. Not every instruction needs to be described; a theoretical overview that names the groups of related instructions would be enough.

// EDIT: For SSE4 I found: http://software.intel.com/sites/default/files/m/9/4/2/d/5/17971-intel_20...

Thanks in advance!

c0d1f1ed

It all started with MMX, really. It offers 64-bit integer vector operations. It reuses the x87 floating-point register stack, so you can't freely mix MMX code with x87 floating-point instructions: an EMMS instruction is needed before switching back to x87 code.
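A minimal sketch of the EMMS rule, assuming GCC or Clang on x86-64 (where MMX support is enabled by default); `add4_pi16` is a hypothetical name. It adds four 16-bit integers held in one 64-bit MMX value and then calls `_mm_empty()`, the intrinsic that emits EMMS, so that x87 code can safely run afterwards:

```c
#include <mmintrin.h>  /* MMX intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: adds four pairs of 16-bit integers (wrapping on
   overflow) with a single PADDW on a 64-bit MMX value. */
void add4_pi16(const int16_t *a, const int16_t *b, int16_t *out)
{
    __m64 va, vb, sum;
    memcpy(&va, a, sizeof va);   /* pack four int16_t into one __m64 */
    memcpy(&vb, b, sizeof vb);
    sum = _mm_add_pi16(va, vb);  /* PADDW */
    memcpy(out, &sum, sizeof sum);
    _mm_empty();                 /* EMMS: release the registers shared with x87 */
}
```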

SSE then added 128-bit single-precision floating-point vector operations (and a couple more MMX instructions). It uses a separate set of 128-bit XMM registers.
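A minimal sketch, assuming an x86-64 compiler (SSE is part of the x86-64 baseline, so no extra flags are needed); `add4_sse` is a hypothetical name. One ADDPS adds four single-precision floats held in an XMM register:

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3, using one ADDPS. */
void add4_sse(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);             /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* 4 single-precision adds at once */
}
```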

SSE2 added 128-bit versions of the MMX integer operations, making MMX largely obsolete, as well as double-precision floating-point vector operations.
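A minimal sketch of SSE2, assuming an x86-64 compiler (SSE2 is also part of the x86-64 baseline); `add4_epi32` and `add2_pd` are hypothetical names. The integer add works on a full 128-bit XMM register and needs no EMMS, and SSE2 also covers double-precision vectors (ADDPD):

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: four 32-bit integer adds in one XMM register (PADDD). */
void add4_epi32(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* Hypothetical helper: two double-precision adds at once (ADDPD). */
void add2_pd(const double *a, const double *b, double *out)
{
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```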

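SSE3 mainly added 'horizontal' floating-point vector operations (HADDPS, HSUBPS and their double-precision counterparts) and ADDSUBPS, which is useful for complex-number arithmetic. Intel's switch from 64-bit execution units (which needed 2 cycles to process a 128-bit operation) to full-width 128-bit execution units came shortly afterwards, with the Core 2 generation that also introduced SSSE3.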
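A sketch of a horizontal add, assuming GCC or Clang on x86-64: the `target` attribute and `__builtin_cpu_supports` are compiler extensions, and `hsum4` is a hypothetical name. Two HADDPS instructions sum the four lanes of a vector, with a scalar fallback for CPUs without SSE3:

```c
#include <pmmintrin.h>  /* SSE3 intrinsics */

/* Compiled for SSE3 only; callers must check the CPU first. */
__attribute__((target("sse3")))
static float hsum4_sse3(const float *v)
{
    __m128 x = _mm_loadu_ps(v);  /* [a, b, c, d] */
    x = _mm_hadd_ps(x, x);       /* [a+b, c+d, a+b, c+d] */
    x = _mm_hadd_ps(x, x);       /* [a+b+c+d, ...] in every lane */
    return _mm_cvtss_f32(x);     /* extract lane 0 */
}

/* Hypothetical helper: sum of four floats, using SSE3 when available. */
float hsum4(const float *v)
{
    if (__builtin_cpu_supports("sse3"))
        return hsum4_sse3(v);
    return v[0] + v[1] + v[2] + v[3];  /* scalar fallback */
}
```

Compiling only the SIMD helper for SSE3 and checking the CPU at run time keeps the rest of the program usable on older processors; building everything with `-msse3` would be simpler but would drop that fallback.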

SSSE3 mainly added horizontal integer vector operations and a generic byte shuffling instruction (PSHUFB).
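A sketch of the byte shuffle (PSHUFB, exposed as `_mm_shuffle_epi8`) reversing 16 bytes, assuming GCC or Clang on x86-64 with the `target` attribute and `__builtin_cpu_supports` extensions; `reverse16` is a hypothetical name. Each byte of the control mask selects which input byte lands in that position:

```c
#include <tmmintrin.h>  /* SSSE3 intrinsics */
#include <stdint.h>

__attribute__((target("ssse3")))
static void reverse16_ssse3(const uint8_t *in, uint8_t *out)
{
    /* Control mask: output byte i takes input byte 15 - i. */
    const __m128i mask = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                       7, 6, 5, 4, 3, 2, 1, 0);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(v, mask));
}

/* Hypothetical helper: reverses a 16-byte block, using SSSE3 when available. */
void reverse16(const uint8_t *in, uint8_t *out)
{
    if (__builtin_cpu_supports("ssse3")) {
        reverse16_ssse3(in, out);
        return;
    }
    for (int i = 0; i < 16; i++)  /* scalar fallback */
        out[i] = in[15 - i];
}
```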

SSE4.1 added blend, additional integer min/max, rounding and sign/zero extension instructions, a few instructions that filled some long-standing 'gaps', and a couple of instructions aimed at video processing (such as MPSADBW).
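One likely example of such a gap is the signed 32-bit minimum: SSE2 only offered PMINSW (signed 16-bit) and PMINUB (unsigned 8-bit). A minimal sketch of PMINSD via `_mm_min_epi32`, assuming GCC or Clang on x86-64 with the `target` attribute and `__builtin_cpu_supports` extensions; `min4_i32` is a hypothetical name:

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics */
#include <stdint.h>

__attribute__((target("sse4.1")))
static void min4_sse41(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_min_epi32(va, vb));  /* PMINSD */
}

/* Hypothetical helper: element-wise signed minimum of four int32_t values. */
void min4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
    if (__builtin_cpu_supports("sse4.1")) {
        min4_sse41(a, b, out);
        return;
    }
    for (int i = 0; i < 4; i++)  /* scalar fallback */
        out[i] = a[i] < b[i] ? a[i] : b[i];
}
```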

SSE4a is a small set of AMD-specific instructions (EXTRQ, INSERTQ, MOVNTSD and MOVNTSS); AMD did not support SSE4.1 at that time.

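SSE4.2 added string processing instructions (PCMPESTRI, PCMPESTRM, PCMPISTRI and PCMPISTRM), along with CRC32 and a 64-bit integer compare (PCMPGTQ).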
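A sketch of a PCMPISTRI-based, `strpbrk`-style search, assuming GCC or Clang on x86-64 with the `target` attribute and `__builtin_cpu_supports` extensions; `find_any16` is a hypothetical name. It returns the position of the first byte of a 16-byte block (stopping at a NUL) that appears in a character set, or 16 if none does. Both buffers must have 16 readable bytes, because the instruction always loads a full block:

```c
#include <nmmintrin.h>  /* SSE4.2 intrinsics */

#define FIND_ANY_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT)

__attribute__((target("sse4.2")))
static int find_any16_sse42(const char *set, const char *str)
{
    __m128i vset = _mm_loadu_si128((const __m128i *)set);
    __m128i vstr = _mm_loadu_si128((const __m128i *)str);
    /* For each byte of vstr, test whether any byte of vset equals it;
       return the lowest matching index, or 16 when nothing matches. */
    return _mm_cmpistri(vset, vstr, FIND_ANY_MODE);
}

/* Hypothetical helper: index of the first byte of str found in set, else 16. */
int find_any16(const char *set, const char *str)
{
    if (__builtin_cpu_supports("sse4.2"))
        return find_any16_sse42(set, str);
    /* Scalar fallback with the same NUL-termination rules. */
    for (int i = 0; i < 16 && str[i] != '\0'; i++)
        for (int j = 0; j < 16 && set[j] != '\0'; j++)
            if (str[i] == set[j])
                return i;
    return 16;
}
```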

AVX extends the SSE registers to 256 bits and offers 256-bit floating-point operations. Integer AVX instructions are still limited to 128 bits. Also, Sandy Bridge and Ivy Bridge lack the cache bandwidth to double the throughput in practice. AVX also introduced a new, highly extensible instruction encoding format called VEX.
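A minimal sketch of a 256-bit floating-point add (VADDPS on YMM registers), assuming GCC or Clang on x86-64 with the `target` attribute and `__builtin_cpu_supports` extensions; `add8` is a hypothetical name:

```c
#include <immintrin.h>  /* AVX intrinsics */

__attribute__((target("avx")))
static void add8_avx(const float *a, const float *b, float *out)
{
    __m256 va = _mm256_loadu_ps(a);  /* 8 floats in one YMM register */
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_add_ps(va, vb));
}

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..7, using AVX when available. */
void add8(const float *a, const float *b, float *out)
{
    if (__builtin_cpu_supports("avx")) {
        add8_avx(a, b, out);
        return;
    }
    for (int i = 0; i < 8; i++)  /* scalar fallback */
        out[i] = a[i] + b[i];
}
```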

Last but not least, AVX2 is a massive leap forward. It offers 256-bit integer operations, fused multiply-add (strictly speaking a separate FMA extension, introduced alongside AVX2), and, to top it off, gather support! The Haswell architecture, which will introduce it, doubles the cache bandwidth and even adds more execution ports to free up the vector execution ports and improve Hyper-Threading performance.
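A sketch of an AVX2 gather, assuming GCC or Clang on x86-64 with the `target` attribute and `__builtin_cpu_supports` extensions; `gather8` is a hypothetical name. `_mm256_i32gather_epi32` loads eight 32-bit values from `table` at the positions held in an index vector, in place of eight scalar loads:

```c
#include <immintrin.h>  /* AVX2 intrinsics */

__attribute__((target("avx2")))
static void gather8_avx2(const int *table, const int *idx, int *out)
{
    __m256i vidx = _mm256_loadu_si256((const __m256i *)idx);
    /* Scale 4 = sizeof(int): element i comes from table[idx[i]]. */
    __m256i v = _mm256_i32gather_epi32(table, vidx, 4);
    _mm256_storeu_si256((__m256i *)out, v);
}

/* Hypothetical helper: out[i] = table[idx[i]] for i = 0..7. */
void gather8(const int *table, const int *idx, int *out)
{
    if (__builtin_cpu_supports("avx2")) {
        gather8_avx2(table, idx, out);
        return;
    }
    for (int i = 0; i < 8; i++)  /* scalar fallback */
        out[i] = table[idx[i]];
}
```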

There is no information on what comes after AVX2, but the Xeon Phi MIC uses 512-bit vector instructions which use an encoding format called MVEX, which might be a clue. I also have some suggestions of my own.

Christian M.

I found nearly what I wanted in the Intel 64 and IA-32 Architectures Optimization Reference Manual, section 2.10 "SIMD Technology".

Sergey Kostrov

Please also take a look at:

Forum topic: The dawn of the Intel SSE technology
Web-link: software.intel.com/en-us/forums/topic/279703

andysem

If you need to drill down to which particular instructions are available in each version, the Intel Intrinsics Guide offers a great reference organized by SSE/AVX version.

http://software.intel.com/en-us/articles/intel-intrinsics-guide

Sergey Kostrov

>>...But what about SSE? It would be nice to have something that shows what different SSE versions contributed which kind of instructions...

Intel header files are one of the best sources for the information you're looking for:
...
#include "intrin.h" // Definitions and Declarations for platform specific intrinsics
#include "mmintrin.h" // MMX
#include "xmmintrin.h" // SSE
#include "emmintrin.h" // SSE2
#include "pmmintrin.h" // SSE3
#include "tmmintrin.h" // SSSE3
#include "smmintrin.h" // SSE4.1
#include "nmmintrin.h" // SSE4.2
#include "wmmintrin.h" // AES and PCLMULQDQ
#include "immintrin.h" // AVX (and, in newer compilers, AVX2 and FMA)
#include "zmmintrin.h" // 512-bit intrinsics / Note: Corrected / See Intel Parallel Studio XE 2013 for more details
...
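To pair each header with what the running CPU actually supports, here is a minimal sketch using the GCC/Clang extension `__builtin_cpu_supports` on x86 (it is not part of the headers listed above). `detect_simd_level` and the `simd_level` enum are hypothetical, and the sketch assumes each level implies the earlier ones, as on mainstream Intel and AMD CPUs:

```c
/* Hypothetical enum: SSE/AVX levels in the order they were introduced. */
enum simd_level {
    SIMD_NONE, SIMD_SSE, SIMD_SSE2, SIMD_SSE3, SIMD_SSSE3,
    SIMD_SSE41, SIMD_SSE42, SIMD_AVX, SIMD_AVX2
};

/* Returns the newest level reported by the CPU at run time. */
enum simd_level detect_simd_level(void)
{
    if (__builtin_cpu_supports("avx2"))   return SIMD_AVX2;   /* immintrin.h */
    if (__builtin_cpu_supports("avx"))    return SIMD_AVX;    /* immintrin.h */
    if (__builtin_cpu_supports("sse4.2")) return SIMD_SSE42;  /* nmmintrin.h */
    if (__builtin_cpu_supports("sse4.1")) return SIMD_SSE41;  /* smmintrin.h */
    if (__builtin_cpu_supports("ssse3"))  return SIMD_SSSE3;  /* tmmintrin.h */
    if (__builtin_cpu_supports("sse3"))   return SIMD_SSE3;   /* pmmintrin.h */
    if (__builtin_cpu_supports("sse2"))   return SIMD_SSE2;   /* emmintrin.h */
    if (__builtin_cpu_supports("sse"))    return SIMD_SSE;    /* xmmintrin.h */
    return SIMD_NONE;
}
```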

Sergey Kostrov

Correction:
...
#include "zmmintrin.h" // 512-bit intrinsics / The header does not mention a code name for the instruction set
...

Christian M.

Thanks, the list of the different headers is quite helpful!
