Intel® Advanced Vector Extensions

Instruction set extensions programming reference, revision 18

In early February, revision 18 of the instruction set extensions programming reference was posted here.

It includes information about:

  • Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions
  • Intel® Secure Hash Algorithm (Intel® SHA) extensions 
  • Intel® Memory Protection Extensions (Intel® MPX) 

For more information about the technologies: http://www.intel.com/software/isa

Updated Intel® Software Development Emulator

Hello, we just released version 6.20 of the Intel® Software Development Emulator. It is available here: http://www.intel.com/software/sde

It includes:

  • Added support for XSAVEC and CLFLUSHOPT.
  • Disabled TSX CPUID bits when TSX emulation is not requested.
  • Improved disassembly for MPX instructions.
  • Added an option for running chip-check only on the main executable.
  • Added support for -quark (Pentium ISA).
  • Added application debugging for Mac OSX with the lldb debugger.

Problem when using RTM

Hello,

My name is George Mappouras and I am trying to write a simple program to evaluate TSX on the new Haswell processors. However, I came across a very strange problem whose cause I can't find, and I was wondering if you could help me with it.

asm blocks

Hello,

I am writing AVX code inside asm blocks (I don't want to use AVX intrinsics).

The blocks use a lot of general-purpose registers, which collide with the ones the compiler allocates, so the code starts misbehaving very quickly.

Is there an automatic or manual way to avoid these register overlaps?

Any link to documentation would be great.

I would also like to use asm blocks in Fortran with ifort, but I haven't found a way yet.

Thanks

Vincent
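
One common way to avoid such collisions, assuming a compiler that accepts GCC-style extended asm (GCC, Clang, and the Intel compiler on Linux do), is to let the compiler choose the registers for the operands and to name every hand-picked register in the clobber list. The sketch below is illustrative only: sum_u64 is a hypothetical helper, it uses general-purpose registers rather than AVX to stay short, and it falls back to plain C on other targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums n 64-bit integers with a hand-written loop. */
uint64_t sum_u64(const uint64_t *data, size_t n)
{
    uint64_t total = 0;
    if (n == 0)
        return 0;

#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile(
        "1:\n\t"
        "movq (%[p]), %%rax\n\t"   /* rax is picked by hand, so it is clobbered */
        "addq %%rax, %[acc]\n\t"   /* acc, p, cnt are allocated by the compiler */
        "addq $8, %[p]\n\t"
        "decq %[cnt]\n\t"
        "jnz 1b\n\t"
        : [acc] "+r"(total), [p] "+r"(data), [cnt] "+r"(n)  /* read-write operands */
        :                                                   /* no input-only operands */
        : "rax", "cc", "memory");  /* scratch register, flags, memory is read */
#else
    /* Portable fallback for targets without GCC-style x86-64 inline asm. */
    for (size_t i = 0; i < n; ++i)
        total += data[i];
#endif
    return total;
}
```

Because rax appears in the clobber list, the compiler will not place any operand in it, and the named operands ([acc], [p], [cnt]) remove the need to hard-code the remaining registers. The same pattern should carry over to vector registers by listing them (for example "xmm0") as clobbers.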

SDE produces unstable behavior

Hi,

I have some SSE/AVX code that I'm trying to test with Intel Software Development Emulator (SDE) on CPUs without the native support for some of the instruction set extensions. In particular, I tried the following setups:

1. Sandy Bridge CPU, SDE is running with -hsw switch.

2. Sandy Bridge CPU, SDE is running with -hsw -sse-sde switches.

3. A KVM guest virtual machine with SSE4 instructions (host CPU is Nehalem), SDE is running with -hsw switch.

All this is on Linux x86_64, SDE 6.22 and 6.12.

Disabling AVX

Hi all,

Is there a way (under Windows 7) to disable AVX support?
I want to make sure that I don't get an "Illegal Instruction" exception on a pre-SB machine.
Currently I have to use another machine and it's a bit annoying.

I don't use the /QaXXX flags because the code already takes very long to compile (so dispatch is taken care of manually) and I want it to work with the Microsoft compiler too (even if performance would be degraded, of course ;) ).
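
For the manual dispatch mentioned above, a runtime check along the following lines is a common pattern. This is only a sketch, not the poster's code: avx_usable is a hypothetical name, and the GCC/Clang branch assumes <cpuid.h> while the Microsoft branch assumes <intrin.h> and <immintrin.h>. It tests both the CPUID feature bits and whether the operating system has enabled the YMM state in XCR0, since AVX instructions fault when either is missing.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>     /* __cpuid */
#include <immintrin.h>  /* _xgetbv */
#else
#include <cpuid.h>      /* __get_cpuid (GCC/Clang) */
#endif

/* Hypothetical helper: returns 1 when CPU and OS both support AVX, else 0. */
int avx_usable(void)
{
    unsigned int ecx = 0;
    uint64_t xcr0 = 0;

#if defined(_MSC_VER)
    int regs[4];
    __cpuid(regs, 1);
    ecx = (unsigned int)regs[2];
#else
    unsigned int eax = 0, ebx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
#endif

    /* CPUID.1:ECX bit 27 = OSXSAVE, bit 28 = AVX. */
    if (!(ecx & (1u << 27)) || !(ecx & (1u << 28)))
        return 0;

    /* XGETBV is only legal once OSXSAVE is known to be set. */
#if defined(_MSC_VER)
    xcr0 = _xgetbv(0);
#else
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    xcr0 = ((uint64_t)hi << 32) | lo;
#endif

    /* XCR0 bit 1 = SSE state, bit 2 = AVX (YMM) state. */
    return (xcr0 & 0x6) == 0x6;
}
```

The AVX code path would then be taken only when avx_usable() returns 1, with the SSE path as the default.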

The Chronicles of Phi - part 5 - Plesiochronous phasing barrier – tiled_HT3

For the next optimization, I knew what I wanted to do; I just didn’t know what to call it. In looking for a word that describes “loosely synchronous”, I came across plesiochronous:

In telecommunications, a plesiochronous system is one where different parts of the system are almost, but not quite, perfectly synchronized.

BKMs on the use of the SIMD directive

We received a request at one of the “Birds of a Feather” meetings Intel® holds at venues such as the Super Computing* (SC) and International Super Computing* (ISC) conferences. The customer wanted to know the BKMs (Best Known Methods) for proper usage of the new OpenMP* 4.0 / Intel® Cilk™ Plus SIMD directive. I volunteered to create such a list. Investigating the topic more thoroughly, I discovered that there is already a vast number of resources on vectorization and the use of the SIMD directive.
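
For readers who have not seen the directive, a minimal sketch of the OpenMP 4.0 form follows. The function name saxpy and its parameters are illustrative and not taken from the article; with GCC or Clang the pragma is honored when compiling with -fopenmp-simd and is otherwise ignored, so the loop still computes the same result.

```c
#include <stddef.h>

/* Illustrative example: y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Asks the compiler to vectorize the loop; the programmer asserts
       that iterations are independent. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The restrict qualifiers tell the compiler that x and y do not overlap, which is the kind of guarantee the directive relies on.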

The Chronicles of Phi - part 4 - Hyper-Thread Phalanx – tiled_HT2

The prior part (3) of this blog showed the effects of the first-level implementation of the Hyper-Thread Phalanx. The change in programming yielded 9.7% improvement in performance for the small model, and little to no improvement in the large model. This left part 3 of this blog with the questions:

What is non-optimal about this strategy?
And: What can be improved?

There are two things: one is obvious, and the other is not so obvious.

Data alignment

FMA Support

Hello guys, sorry for a basic question. I've been looking for architectures that support FMA. I know Sandy Bridge doesn't support it, and Haswell does. But what about Ivy Bridge? Does Ivy Bridge support FMA?

Best regards.
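
As far as I know, Ivy Bridge does not implement FMA; Haswell is the first Intel core that does. Rather than relying on the microarchitecture name, the feature bit can be queried at run time. The sketch below assumes GCC or Clang on x86 (it uses <cpuid.h>), and cpu_reports_fma is a hypothetical name.

```c
#include <cpuid.h>  /* __get_cpuid (GCC/Clang, x86 only) */

/* Hypothetical helper: returns 1 if CPUID reports FMA, else 0. */
int cpu_reports_fma(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 1 holds the basic feature flags; 0 means the leaf is unavailable. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* CPUID.1:ECX bit 12 is the FMA feature flag. */
    return (int)((ecx >> 12) & 1u);
}
```

Note that this reports only what the CPU advertises. FMA instructions operate on the AVX registers, so the operating system must also have enabled the AVX state before they can be executed.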
