Haswell New Instructions posted

Mark Buxton (Intel):

A full specification for the Haswell (2013) new instructions was just posted to the programmer's reference manual at
http://software.intel.com/file/36945. A blog will be coming shortly.

-Mark Buxton

sirrida:

Congratulations!
I'm very happy to see that most of the integer instructions have been promoted to YMM. This is essential for the graphics programming we do. AVX2 will surely be a big boost for us.

The instructions from the BMI groups will certainly come in handy when used by a compiler, especially a JIT.

The new PDEP and PEXT will certainly cost some silicon. I'd like to see them act on XMM and YMM registers too, preferably with adjustable granularity; it would not matter if there were only one such unit per die.
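
For readers who have not met these two instructions yet, here is a minimal Python sketch of what PEXT (parallel bit extract) and PDEP (parallel bit deposit) compute. It is only a reference model: the function names and the `width` parameter are illustrative assumptions, and the hardware instructions work on general-purpose registers rather than Python integers.

```python
def pext(src: int, mask: int, width: int = 64) -> int:
    """Collect the bits of src selected by mask and pack them into the low bits."""
    result = 0
    k = 0  # next free bit position in the result
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((src >> i) & 1) << k
            k += 1
    return result


def pdep(src: int, mask: int, width: int = 64) -> int:
    """Spread the low bits of src out to the positions where mask has a 1."""
    result = 0
    k = 0  # next source bit to deposit
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((src >> k) & 1) << i
            k += 1
    return result


# Pull the low nibble of each byte out of 0xABCD, then put the bits back.
packed = pext(0xABCD, 0x0F0F, 16)     # 0xBD
spread = pdep(packed, 0x0F0F, 16)     # 0x0B0D
```

The two operations are inverses on the masked bits: depositing what was extracted with the same mask gives back `src & mask`.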

Unfortunately, I'm not happy with the promoted vpshufb and palignr because they cannot operate across lanes. It will become difficult to, e.g., convert an array of RGB pixels (AoS) to SoA. I also sorely miss a gather instruction for bytes and words; see my example (Lab color correction) in this forum.
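
To make the lane restriction concrete, here is a small Python model of the 256-bit byte shuffle as I read the specification: each 128-bit lane is shuffled on its own, using only the low four bits of each control byte. The function name and the test pattern are illustrative assumptions, not anything from the manual.

```python
def vpshufb256(src: bytes, ctrl: bytes) -> bytes:
    """Model of a 256-bit in-lane byte shuffle (two independent 16-byte lanes)."""
    assert len(src) == 32 and len(ctrl) == 32
    out = bytearray(32)
    for lane in (0, 16):
        for i in range(16):
            c = ctrl[lane + i]
            # Bit 7 set zeroes the byte; otherwise the low 4 bits pick a byte
            # from the SAME lane, so the other lane is unreachable.
            out[lane + i] = 0 if c & 0x80 else src[lane + (c & 0x0F)]
    return bytes(out)


src = bytes(range(32))
reverse_ctrl = bytes(31 - i for i in range(32))   # "reverse all 32 bytes"
result = vpshufb256(src, reverse_ctrl)
# Each lane is reversed in place: 15..0 followed by 31..16,
# not the full 31..0 reversal the control vector asks for.
```

A true 32-byte reversal therefore needs an extra step to swap the two lanes, which is exactly the kind of overhead the AoS/SoA conversion runs into.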

c0d1f1ed:
That's incredible! Introducing support for gather, 256-bit integer operations, and shifts with independent count per element all at once exceeds my expectations. Congratulations to the people involved in deciding to take on the major task of providing a vector equivalent of every scalar instruction, and the engineers that make it happen. I believe this will mark a major turning point in the history of computing.
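
As a rough illustration of two of the features mentioned above, here is a simplified Python model of a per-element variable shift and of a gather. The function names are illustrative assumptions, and the gather model leaves out the mask, scale and base address that the real instruction takes.

```python
def vpsllvd(values, counts):
    """Logical left shift of 32-bit elements, each by its own count.

    Counts above 31 produce 0, instead of one shared count for the whole vector.
    """
    return [(v << c) & 0xFFFFFFFF if c < 32 else 0
            for v, c in zip(values, counts)]


def gather_dwords(table, indices):
    """One independent table lookup per element: the core idea of a gather."""
    return [table[i] for i in indices]


shifted = vpsllvd([1] * 8, [0, 1, 2, 3, 4, 31, 32, 40])
looked_up = gather_dwords([10, 20, 30, 40], [3, 0, 0, 2, 1, 1, 3, 2])
```

Both operations are what a scalar loop does naturally (`x << n[i]`, `table[idx[i]]`), which is why they matter for vectorizing ordinary code.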

Just to be clear, will Haswell support both FMA and AVX2?

c0d1f1ed:
Quoting sirrida: "Unfortunately I'm not happy with the promoted vpshufb and palignr because they cannot operate cross-lanes."

Those things just aren't feasible. A 256-bit shuffle unit takes four times more area than a 128-bit one. It simply doesn't scale well to wider vectors. Note that AVX can widen to 512 and 1024 bits in the future, so it was necessary to keep things divided into manageable chunks. I think 128-bit lanes are a great compromise. Also note that 256-bit integer operations might well be executed as two 128-bit parts, in which case cross-lane operations aren't easily possible either. Frankly, though, I'm quite thrilled to get such a complete instruction set with AVX2.

I'm more curious about what will happen to the IGP. A mainstream 8-core Haswell with FMA could deliver 1 TFLOPS of computing power. Compare that to Ivy Bridge's IGP (also at 22 nm), which may not achieve more than 200 GFLOPS. It doesn't make sense to waste a lot of die area on a more powerful IGP. Instead, they could just use Larrabee's software rendering technology on the CPU cores. The only major issue I can see is high power consumption from the out-of-order execution. That can be solved by executing 1024-bit operations on 256-bit or 128-bit execution units, but support for wider registers hasn't been announced yet. Perhaps Haswell will be a sort of hybrid, with a simple IGP assisted by the CPU...
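
For anyone wondering where a figure of that order comes from, here is the back-of-the-envelope arithmetic as a short Python sketch. Only the core count comes from the post; the number of FMA units per core and the clock speed are assumptions chosen for illustration.

```python
cores = 8              # from the post: a mainstream 8-core part
fma_units = 2          # assumption: two 256-bit FMA units per core
lanes = 256 // 32      # single-precision elements per 256-bit register
flops_per_fma = 2      # one multiply plus one add per element
clock_ghz = 4.0        # assumption: illustrative clock speed

peak_gflops = cores * fma_units * lanes * flops_per_fma * clock_ghz
print(peak_gflops)     # 1024.0
```

This is a theoretical single-precision peak; sustained throughput depends on memory bandwidth and on how well the code keeps the units busy.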

ange4771:

Small typo, appearing twice: "aesecnlast" (presumably meant to be "aesenclast").

Edit: this post wasn't intended to be a reply to #3, but I can't delete it. The typo is in the official PDF document.

randombit:

Shouldn't the _pdep_u64 and _pext_u64 intrinsics take a 64-bit mask? (Pages 7-19 and 7-21)

gligoroski:

Are the new Haswell instructions supported by the Intel Software Development Emulator?

TimP (Intel):
Quoting gligoroski: "Are the new Haswell instructions supported by Intel Software Development Emulator?"

Not yet; Mark is the expert on this.

Mark Charney (Intel):

Actually, the Intel SDE supporting the Haswell new instructions has not been released yet. Hopefully soon. No promises on when though.

bronxzv:

And what about the FMA instructions disclosed previously (and with a distinct feature flag)? Are they already supported in SDE or not?

Thanks for letting us know.

Mark Charney (Intel):

The FMA instructions are present & supported in the currently downloadable version of the emulator.

bronxzv:

Quoting Mark Charney (Intel): "The FMA instructions are present & supported in the currently downloadable version of the emulator."

Neat! Thanks for your quick feedback. It will let us validate the FMA path well ahead of the final hardware; still waiting for a supporting compiler, though.

sirrida:

Unfortunately, many crucial AVX2 instructions are not extended in the natural way from MM => XMM => YMM.
They act as if there were two XMM registers in one YMM (slices) instead of acting across lanes.
This will make porting difficult.
Here are some examples:

  1. pack/punpck
  2. pshufb
  3. palignr
  4. Horizontal ops, e.g. phadd

I sincerely hope that this will be changed soon, before the emulator for AVX2 implements the current specs.
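
To show what the "slices" behaviour means for porting, here is a Python model of the low byte unpack, first in its 128-bit form and then in the 256-bit form as the current spec describes it. The function names are illustrative assumptions.

```python
def punpcklbw128(a: bytes, b: bytes) -> bytes:
    """Interleave the low 8 bytes of two 16-byte vectors: a0 b0 a1 b1 ..."""
    out = bytearray()
    for i in range(8):
        out += bytes([a[i], b[i]])
    return bytes(out)


def vpunpcklbw256(a: bytes, b: bytes) -> bytes:
    """256-bit form: the 128-bit operation applied to each lane separately."""
    return punpcklbw128(a[:16], b[:16]) + punpcklbw128(a[16:], b[16:])


a = bytes(range(32))          # 0..31
b = bytes(range(100, 132))    # 100..131
result = vpunpcklbw256(a, b)
# Low lane:  0,100, 1,101, ..., 7,107
# High lane: 16,116, 17,117, ..., 23,123
# Bytes 8..15 of each source never appear in the result.
```

A "natural" widening would interleave bytes 0..15 of both sources instead, so code ported from the 128-bit form has to add a lane permute to get the same element order.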

gilgil:

What about the half-precision float vector instructions (fp16)?
Will they be implemented in the upcoming Ivy Bridge, or should we wait longer?
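
For context, half precision here means the 16-bit IEEE 754 storage format, and the instructions in question convert between it and single precision. Below is a small Python sketch of that conversion using the standard `struct` module's `e` format; the helper names are illustrative assumptions, and this models only the data format, not the vector instructions themselves.

```python
import struct


def float_to_half_bits(x: float) -> int:
    """Round x to IEEE 754 half precision and return the 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def half_bits_to_float(bits: int) -> float:
    """Expand a 16-bit half-precision pattern back to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


one = float_to_half_bits(1.0)             # 0x3C00
largest = half_bits_to_float(0x7BFF)      # 65504.0, the largest finite half
approx = half_bits_to_float(float_to_half_bits(0.1))  # close to, but not exactly, 0.1
```

The format keeps only 10 fraction bits, so it is meant for compact storage and bandwidth savings, with the arithmetic still done in single precision.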

c0d1f1ed:
Quoting gilgil: "What about the half-precision float vector instructions (fp16)? Will they be implemented in the upcoming Ivy Bridge, or should we wait longer?"

Yes, Ivy Bridge will support them.

http://software.intel.com/en-us/blogs/2011/06/13/haswell-new-instruction-descriptions-now-available/

"These build upon the instructions coming in Intel microarchitecture code name Ivy Bridge, including the digital random number generator, half-float (float16) accelerators, and extend the Intel Advanced Vector extensions (Intel AVX) that launched in 2011."
