Documentation of SSE versions


I am writing something about SSE and AVX. For AVX there are quite good documents here on the site. But what about SSE? It would be nice to have something that shows which kinds of instructions each SSE version contributed. Not every instruction needs to be described; a high-level overview that names the groups of related instructions would be enough.

// EDIT: For SSE4 I found:

Thanks in advance!


It all started with MMX really. It offers 64-bit integer vector operations. It reuses the x87 register stack so you can't mix MMX code with floating-point instructions.

SSE then added 128-bit floating-point vector operations (and a couple more MMX instructions). It uses a separate register set.

SSE2 added 128-bit versions of the MMX operations, making MMX largely obsolete.
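To make the "128-bit versions of the MMX operations" point concrete, here is a minimal sketch of a saturating byte add. PADDUSB started out as an MMX instruction working on 8 bytes; `_mm_adds_epu8` from `emmintrin.h` is its SSE2 form working on 16. The function name `add_sat_u8` is made up for illustration, and the `__SSE2__` check assumes a GCC/Clang-style compiler; without that macro the code simply falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Saturating add of n unsigned bytes: out[i] = min(a[i] + b[i], 255).
 * add_sat_u8 is a hypothetical helper name, not part of any library. */
void add_sat_u8(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* 16 bytes per iteration in one 128-bit XMM register (PADDUSB). */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is not enabled. */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

The unaligned load/store intrinsics are used so the caller does not have to worry about 16-byte alignment.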

SSE3 mainly added 'horizontal' floating-point vector operations, useful for complex numbers.

SSSE3 mainly added horizontal integer vector operations, and a generic byte shuffling instruction. It arrived with the Core microarchitecture, which is also when Intel switched from 64-bit execution units (which needed 2 cycles to process the 128-bit operations) to full-width 128-bit execution units.
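As an illustration of the generic byte shuffle (PSHUFB), here is a small sketch that reverses a 16-byte block with `_mm_shuffle_epi8` from `tmmintrin.h`. Each byte of the index vector selects which source byte ends up in that position. The name `reverse16` is hypothetical, and the `__SSSE3__` check assumes a GCC/Clang-style compiler (for example built with `-mssse3`); otherwise the scalar branch is compiled instead.

```c
#include <stdint.h>

#if defined(__SSSE3__)
#include <tmmintrin.h> /* SSSE3 intrinsics */
#endif

/* Writes the 16 bytes of `in` to `out` in reverse order.
 * `in` and `out` must not overlap. reverse16 is a hypothetical name. */
void reverse16(const uint8_t *in, uint8_t *out)
{
#if defined(__SSSE3__)
    /* Index vector: output byte i is taken from input byte idx[i]. */
    const __m128i idx = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                      7, 6, 5, 4, 3, 2, 1, 0);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(v, idx));
#else
    /* Scalar fallback when SSSE3 is not enabled at compile time. */
    for (int i = 0; i < 16; ++i)
        out[i] = in[15 - i];
#endif
}
```

Changing only the index vector turns the same instruction into a byte swap, a broadcast, or a small table lookup, which is why it is so widely used.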

SSE4.1 added blend, min/max, rounding, sign/zero extension, a few instructions that filled some long-standing 'gaps', and a couple instructions aimed at video processing.
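One example of such a 'gap': SSE2 only had packed min/max for signed 16-bit and unsigned 8-bit integers, and SSE4.1 added the missing widths. A rough sketch of clamping 32-bit integers with `_mm_max_epi32` and `_mm_min_epi32` from `smmintrin.h` follows. The name `clamp_i32` is invented for this post, it assumes `lo <= hi`, and the `__SSE4_1__` check assumes a GCC/Clang-style compiler (for example `-msse4.1`).

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE4_1__)
#include <smmintrin.h> /* SSE4.1 intrinsics */
#endif

/* Clamps every element of v[0..n) into [lo, hi] in place.
 * Assumes lo <= hi. clamp_i32 is a hypothetical helper name. */
void clamp_i32(int32_t *v, size_t n, int32_t lo, int32_t hi)
{
    size_t i = 0;
#if defined(__SSE4_1__)
    const __m128i vlo = _mm_set1_epi32(lo);
    const __m128i vhi = _mm_set1_epi32(hi);
    /* Four 32-bit lanes per iteration: PMAXSD followed by PMINSD. */
    for (; i + 4 <= n; i += 4) {
        __m128i x = _mm_loadu_si128((const __m128i *)(v + i));
        x = _mm_min_epi32(_mm_max_epi32(x, vlo), vhi);
        _mm_storeu_si128((__m128i *)(v + i), x);
    }
#endif
    /* Scalar tail, and the whole loop when SSE4.1 is not enabled. */
    for (; i < n; ++i) {
        if (v[i] < lo)
            v[i] = lo;
        else if (v[i] > hi)
            v[i] = hi;
    }
}
```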

SSE4a is a set of only four AMD-specific instructions (AMD did not support SSE4.1 at that time).

SSE4.2 added string processing instructions.

AVX extends the SSE registers to 256-bit, and offers 256-bit floating-point operations. The integer AVX instructions are still limited to 128-bit. Also, Sandy/Ivy Bridge lack the cache bandwidth to double the throughput in practice. AVX also introduced a new, highly extensible instruction encoding format called VEX.

Last but not least, AVX2 is a massive leap forward. It offers 256-bit integer operations and, to top it off, gather support! Fused multiply-add arrives at the same time (strictly speaking as the separate FMA extension). The Haswell architecture that will introduce it doubles the cache bandwidth, and it even adds more execution ports to free up the vector execution ports and improve Hyper-Threading performance.

There is no information on what comes after AVX2, but the Xeon Phi MIC uses 512-bit vector instructions which use an encoding format called MVEX, which might be a clue. I also have some suggestions of my own.

I nearly found what I wanted: Intel 64 and IA-32 Architectures Optimization Reference Manual

section: 2.10 SIMD Technology

Please also take a look at:

Forum topic: The dawn of the Intel SSE technology

If you need to go down to the availability of particular instructions, the Intel Intrinsics Guide offers a great reference divided by SSE/AVX versions.

>>...But what about SSE? It would be nice to have something that shows what different SSE versions contributed which
>>kind of instructions...

Intel header files are one of the best sources for the information you're looking for:
#include "intrin.h" // Definitions and declarations for platform-specific intrinsics
#include "mmintrin.h" // MMX
#include "xmmintrin.h" // SSE
#include "emmintrin.h" // SSE2
#include "pmmintrin.h" // SSE3
#include "tmmintrin.h" // SSSE3
#include "smmintrin.h" // SSE4.1
#include "nmmintrin.h" // SSE4.2
#include "wmmintrin.h" // AES and PCLMULQDQ
#include "immintrin.h" // AVX
#include "zmmintrin.h" // 512-bit intrinsics / The header does not give a code name for the instruction set / See Intel Parallel Studio XE 2013 for more details
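If you want to see which of these sets your own build actually targets, the mapping can be written down as a small table plus a check of the compiler's feature macros. This is only a sketch: `simd_headers` and `simd_enabled_mask` are names I made up, `tmmintrin.h` is listed as the SSSE3 header, and the `__MMX__` ... `__AVX__` macros are the ones GCC and Clang predefine (Microsoft's compiler defines only some of them, so there most bits would stay 0).

```c
#include <stddef.h>

/* Header / instruction set pairs, roughly in order of introduction.
 * The table name is hypothetical; the header names are the real ones. */
const char *const simd_headers[9][2] = {
    { "mmintrin.h",  "MMX"    },
    { "xmmintrin.h", "SSE"    },
    { "emmintrin.h", "SSE2"   },
    { "pmmintrin.h", "SSE3"   },
    { "tmmintrin.h", "SSSE3"  },
    { "smmintrin.h", "SSE4.1" },
    { "nmmintrin.h", "SSE4.2" },
    { "wmmintrin.h", "AES"    },
    { "immintrin.h", "AVX"    },
};

enum { SIMD_HEADER_COUNT = sizeof simd_headers / sizeof simd_headers[0] };

/* Bit i is set when the compiler was told to target simd_headers[i],
 * judged by the GCC/Clang predefined feature macros. This reflects the
 * build flags, not what the CPU running the program supports. */
unsigned simd_enabled_mask(void)
{
    unsigned mask = 0;
#ifdef __MMX__
    mask |= 1u << 0;
#endif
#ifdef __SSE__
    mask |= 1u << 1;
#endif
#ifdef __SSE2__
    mask |= 1u << 2;
#endif
#ifdef __SSE3__
    mask |= 1u << 3;
#endif
#ifdef __SSSE3__
    mask |= 1u << 4;
#endif
#ifdef __SSE4_1__
    mask |= 1u << 5;
#endif
#ifdef __SSE4_2__
    mask |= 1u << 6;
#endif
#ifdef __AES__
    mask |= 1u << 7;
#endif
#ifdef __AVX__
    mask |= 1u << 8;
#endif
    return mask;
}
```

Looping over the table and printing `simd_headers[i][1]` for every set bit gives a quick overview of what a given set of compiler flags enables.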

Thanks, the list of the different headers is quite helpful!
