AVX-512 expectations

Announcement: http://software.intel.com/en-us/blogs/2013/avx-512-instructions

What isn't clear from this announcement is whether the future Xeon processor with AVX-512 support will be a socketed MIC or a CPU (more precisely, Skylake). Is it coming to consumer CPUs in a similar timeframe? Developers would want to know, to decide whether to adopt AVX2 and its successors or heterogeneous computing. The latter would benefit the competition more than it would benefit Intel.

Please look for information online about the Intel Developer Forum (IDF) 2013 event in San Francisco. I expect many technical details to be provided during and after the event.

It seems that a combination of a Xeon CPU and a Xeon Phi could also be called heterogeneous computing.

>>... The latter would benefit the competition more than it would benefit Intel.

Take a look at the latest Intel press releases (and statements from the CEO) and you will see that Intel is concerned about declining PC sales (which get worse every 4 months) and the overheated competition among tablet manufacturers.

From the published roadmap (e.g. [1]) it looks like AVX-512 is another name for AVX3.1, which will be present in Knights Landing boards. By the way, if that's correct, Intel should be more consistent about naming; these different flavours of AVX only confuse developers. The technology will be refined into AVX3.2 and then released in Skylake CPUs.

Disclaimer: I'm not an Intel representative. The information I have is taken from the public sources and is as reliable as these sources are. Use it at your own discretion.

[1] http://www.tomshardware.com/news/Skylake-Intel-DDR4-PCIe-SATAe,23349.html

Quote:

iliyapolak wrote:
It seems that a combination of a Xeon CPU and a Xeon Phi could also be called heterogeneous computing.

Indeed, but with a Xeon CPU that supports AVX-512, at least the ISA would be practically homogeneous. With a socketed MIC, it would also be homogeneous architecturally. That could work well for the HPC market.

Personally I'm most interested in consumer CPUs though. Consumer applications are typically real-time, which makes heterogeneous computing very hard due to the overhead of moving data around and scheduling work on different cores through various software layers. With a consumer CPU with AVX-512 support, we could instead instantly switch between scalar and vector code, while the data remains local, thus preserving bandwidth and power.

Since the announcement mentions Sandy/Ivy Bridge and Haswell as part of an evolution to AVX-512 that offers 8x FLOP/sec over (between?) 4 generations, it hints at AVX-512 coming to Skylake. It's not 100% explicit about that though. That said, as far as I know Skylake isn't even officially announced as the successor of Broadwell/Haswell refresh yet, so they might simply want to save that for IDF and be a little vague for now.
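One plausible way to arrive at "8x peak FLOP/sec" is sketched below in C. It is a simplified per-core model, not Intel's published methodology, and every number in it is an assumption: a pre-AVX SSE baseline with one 128-bit add and one 128-bit multiply per cycle, Sandy/Ivy Bridge with one 256-bit add and one 256-bit multiply, Haswell with two 256-bit FMA units, and a hypothetical AVX-512 core with two 512-bit FMA units. The struct and function names are illustrative only.

```c
#include <stdio.h>

/* Simplified core model: each FP pipe retires one vector op per cycle.
   Hypothetical names; the table below is an assumption, not an official spec. */
typedef struct {
    const char *name;
    int vector_bits; /* SIMD register width in bits */
    int fp_pipes;    /* vector FP ops issued per cycle */
    int flop_per_op; /* 1 for a separate add or mul, 2 for an FMA */
} core_model;

/* Peak single-precision FLOP per cycle for one core. */
static int peak_sp_flop_per_cycle(const core_model *c) {
    int lanes = c->vector_bits / 32; /* 32-bit floats per register */
    return lanes * c->fp_pipes * c->flop_per_op;
}

/* Assumed generations; the AVX-512 entry is hypothetical. */
static const core_model generations[] = {
    {"SSE (pre-AVX)", 128, 2, 1},           /* 4 lanes * 2 pipes       =  8 */
    {"AVX (Sandy/Ivy Bridge)", 256, 2, 1},  /* 8 lanes * 2 pipes       = 16 */
    {"AVX2 + FMA (Haswell)", 256, 2, 2},    /* 8 lanes * 2 pipes * 2   = 32 */
    {"AVX-512 (assumed 2 FMA)", 512, 2, 2}, /* 16 lanes * 2 pipes * 2  = 64 */
};

/* Prints each generation's peak and its ratio to the first entry. */
void print_generations(void) {
    int base = peak_sp_flop_per_cycle(&generations[0]);
    for (size_t i = 0; i < sizeof generations / sizeof generations[0]; i++) {
        int f = peak_sp_flop_per_cycle(&generations[i]);
        printf("%-26s %3d SP FLOP/cycle (%dx)\n", generations[i].name, f, f / base);
    }
}
```

Under these assumptions each step doubles the per-core peak, giving 64 / 8 = 8x. Counting in double precision halves every absolute number but leaves the ratio unchanged; a different baseline generation would change the ratio.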

Quote:

Sergey Kostrov wrote:
Take a look at the latest Intel press releases (and statements from the CEO) and you will see that Intel is concerned about declining PC sales (which get worse every 4 months) and the overheated competition among tablet manufacturers.

What are you implying by that regarding AVX-512? I can see two trains of thought. On the one hand, AVX-512 could be considered too expensive for an ultra-mobile chip. Haswell's high-power performance is nearly stagnant (for scalar workloads), while massive progress has been made in the low-power department. Intel will have to continue that trend to dominate the market for tablets capable of running a full-fledged desktop OS (which is what NVIDIA appears to be aiming for as well with their recently presented mobile Kepler). Silvermont didn't get AVX(2) support, so AVX-512 could be a bridge too far for the architectural successor of Haswell as well.

On the other hand, AVX2 did not seem to significantly impede Haswell from entering the ultra-mobile space from above. For high DLP workloads, wide vectors definitely offer the best performance/Watt. So even for the mobile market it makes sense to support AVX-512, or they'll lose that to various flavors of GPGPU (mobile Kepler, HSA architectures, Mali, etc.). The latter all favor ARM, which is Intel's biggest competition in the mobile market. AVX-512 has to become ubiquitous for x86 and Intel to thrive.

Quote:

andysem wrote:
From the published roadmap (e.g. [1]) it looks like AVX-512 is another name for AVX3.1, which will be present in Knights Landing boards.

As far as I know that's not a published roadmap but rather either a leaked slide or someone's speculation. Interestingly the original presentation (http://www.icsr.agh.edu.pl/~kito/Arch/arch1-1-4B-x86.pdf) mysteriously disappeared...

Quote:

By the way, if that's correct, Intel should be more consistent about naming; these different flavours of AVX only confuse developers. The technology will be refined into AVX3.2 and then released in Skylake CPUs.

I think naming it AVX-512 is an attempt at being more consistent. The lack of packed vector instructions for data types smaller than 32 bits shows that it's aimed exclusively at the SPMD programming model. AVX2's closely packed small data types remain somewhat orthogonal to that and are aimed at multimedia, where such small data types are common (AVX2 is suitable for SPMD as well, but lacks dedicated mask registers and such). AVX-512 can have various 'additional' extensions, such as exponential and reciprocal instructions. So naming things AVX 3.1 and 3.2 could be more confusing when each supports different additional instructions.
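To illustrate what dedicated mask registers give the SPMD model, here is a portable C sketch that emulates a 16-lane single-precision vector and a 16-bit mask with merge-masking (lanes whose mask bit is clear keep their old value). The type and function names are hypothetical. In real AVX-512 code the rough counterparts would be the `_mm512_cmp_ps_mask` and `_mm512_mask_mul_ps` intrinsics, but this sketch deliberately uses plain C so it runs anywhere.

```c
#include <stdint.h>

#define LANES 16 /* 512 bits / 32-bit float */

typedef struct { float v[LANES]; } vec16f;
typedef uint16_t mask16; /* one bit per lane, like an AVX-512 k register */

/* Compare: set bit i when a[i] > b[i]. */
static mask16 cmp_gt(vec16f a, vec16f b) {
    mask16 m = 0;
    for (int i = 0; i < LANES; i++)
        if (a.v[i] > b.v[i]) m |= (mask16)(1u << i);
    return m;
}

/* Merge-masked multiply: lanes with a clear mask bit keep src. */
static vec16f mask_mul(vec16f src, mask16 k, vec16f a, vec16f b) {
    vec16f r;
    for (int i = 0; i < LANES; i++)
        r.v[i] = ((k >> i) & 1u) ? a.v[i] * b.v[i] : src.v[i];
    return r;
}

/* SPMD form of the scalar branch "if (x > 0) y = x * 2;",
   executed for all 16 lanes without any branch on the data. */
static vec16f spmd_example(vec16f x, vec16f y) {
    vec16f zero = {{0}}, two;
    for (int i = 0; i < LANES; i++) two.v[i] = 2.0f;
    mask16 k = cmp_gt(x, zero); /* predicate becomes a mask */
    return mask_mul(y, k, x, two);
}
```

The per-lane `if` turns into a compare that produces a mask plus a masked operation, which is the pattern the dedicated mask registers support directly.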

Whether or not the AVX-512 name really makes things clearer is debatable. The AVX2 'packed' instructions could be extended to 512-bit in the future, but would no longer be part of the 'AVX-512 foundation'. The SPMD instructions could eventually be extended to 1024-bit and thus be called AVX-1024. Some level of confusion is unavoidable I guess. In any case, low-level programmers (such as compiler developers) shouldn't have a big problem with that, and marketing just wants to stress the greatest width.

>>I think naming it AVX-512 is an attempt at being more consistent.

Well, we have too little information about the extensions to see that. To my mind, it is quite possible that AVX3.2, or whatever it is called in Skylake, includes a new set of mask registers and operations that involve them. At the same time, it is also possible that some future instruction extensions will not widen registers and will only add new operations (remember SSE through SSE4; also, how would you characterize AVX, which operates on 256-bit vectors only for FP operations, not integer ones?). If it comes down to AVX-512, AVX2-512, The New AVX-512, etc., then I would prefer AVX, AVX2, AVX3, AVX4 and so on, regardless of the register width. That, of course, applies to a single product line (general-purpose CPUs, in particular). Whether the names of the Xeon Phi extensions match or differ from the GPCPU extensions does not really matter, since they don't intersect in any way: they are implemented by different classes of products (and physical devices). But, surely, different names wouldn't hurt in this case, and using any form of "AVX" in Xeon Phi extensions certainly doesn't help.

>>>but with a Xeon CPU that supports AVX-512, at least the ISA would be practically homogeneous. With a socketed MIC, it would also be homogeneous architecturally. That could work well for the HPC market.>>>

Yep, that's true.

With the introduction of a 512-bit wide AVX extension, the newest CPUs could compete with even medium- to high-end GPUs in raw single-precision FP throughput.

[ Andysem wrote ]

>>...
>>Disclaimer: I'm not an Intel representative. The information I have is taken from the public sources and is as reliable
>>as these sources are. Use it at your own discretion.
>>
>>[1] http://www.tomshardware.com/news/Skylake-Intel-DDR4-PCIe-SATAe,23349.html
>>...

I wouldn't consider it an actual Intel roadmap, because the title of the article is "Intel Roadmap Post-Haswell Rumour". Also, there is a strange gap between Haswell AVX2 (2013) and Skylake AVX3.2 (Future). So, where have AVX3.0 and AVX3.1 gone? There is just one architecture, Broadwell, between Haswell and Skylake.

>>...[1] http://www.tomshardware.com/news/Skylake-Intel-DDR4-PCIe-SATAe,23349.html

And one more comment on the correctness of the pictured Intel roadmap: according to the picture (see the link above), Haswell will be released in 2014...

Quote:

Sergey Kostrov wrote:
I wouldn't consider it an actual Intel roadmap, because the title of the article is "Intel Roadmap Post-Haswell Rumour". Also, there is a strange gap between Haswell AVX2 (2013) and Skylake AVX3.2 (Future). So, where have AVX3.0 and AVX3.1 gone? There is just one architecture, Broadwell, between Haswell and Skylake.

And one more comment on the correctness of the pictured Intel roadmap: according to the picture (see the link above), Haswell will be released in 2014...

Both have perfectly good explanations. The slide shows that Knights Landing supports AVX3.1. The official AVX-512 documentation clarifies that there's a "foundation" of 512-bit instructions, and the possibility of several additional extensions. So the foundation could probably be designated as AVX3.0, and different sets of additional extensions make it AVX3.1 and AVX3.2. If no device supports just the foundation, then AVX3.0 appears to be skipped. A somewhat similar thing happened to SSE4.1 / SSE4.2 / SSE4a.

This is a Xeon roadmap, which explains why Haswell won't be launched until 2014. Note that it is also claimed to support DDR4, which isn't true for currently released desktop/mobile Haswell designs, but matches previous rumours about DDR4 support in the server space.

So in my personal opinion these things don't make this leaked slide less credible.

I will follow up on that subject after the IDF 2013 event in San Francisco.

>>...That said, as far as I know Skylake isn't even officially announced as the successor of Broadwell/Haswell refresh yet, so they
>>might simply want to save that for IDF and be a little vague for now...

This is simply to let you know that the keynotes for IDF 2013 have already been released and (I was not surprised) the word "Mobility" was used in the message subject.

Intel® AVX-512 will debut with the many core product code-named "Knights Landing".

Intel AVX-512 will be supported in future Intel® Xeon® processors.

Quote:

Sergey Kostrov wrote:
I will follow up on that subject after the IDF 2013 event in San Francisco.

Any details on AVX-512 products being presented there?

As Sergey said, this year's IDF was mainly about mobile platforms.

I attended the few vector-related classes, and nothing was said about AVX-512.

Hi Gregg

Thanks for the explanation, but Xeon Phi already supports 512-bit wide vectors in its VPU. So will Knights Landing have another set of 512-bit wide vector instructions?

Quote:

iliyapolak wrote:
As Sergey said, this year's IDF was mainly about mobile platforms.

I attended the few vector-related classes, and nothing was said about AVX-512.

That's too bad. I understand Intel's current focus on mobile platforms, but the battle between heterogeneous and homogeneous computing is an important one too, with huge long-term implications. In my opinion AVX-512 is even a power-efficiency feature for high-DLP workloads, which will be relevant to the mobile market as well. HSA is a serious contender and could increase the competition from ARMv8 unless developers and system designers have a reason to stick with x86 as their architecture of choice for all computing needs. The best way to achieve that is to have a unified x86 architecture that can extract high levels of ILP, DLP and TLP in a compiler-friendly way (thus homogeneous).

Quote:

iliyapolak wrote:
Thanks for the explanation, but Xeon Phi already supports 512-bit wide vectors in its VPU. So will Knights Landing have another set of 512-bit wide vector instructions?

Unlike the MVEX encoding of current Phi architectures, the EVEX encoding used by AVX-512 allows for widths other than 512 bits. This makes it compatible with 256-bit AVX and 128-bit SSE operation. This strongly hints at AVX-512 coming to a successor of Haswell. The announcement even mentioned Sandy Bridge and Haswell as part of an evolution toward 8x peak FLOP/sec. Still, it only truly confirms that AVX-512 will be featured in Knights Landing and some future Xeon processors, which leaves some room for uncertainty, especially in the consumer market. I hope that becomes clearer soon.
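The width flexibility comes from the EVEX.L'L field. A minimal C sketch of decoding it from the four EVEX prefix bytes (0x62 followed by three payload bytes), assuming the standard layout where L'L sits in bits 6:5 of the last payload byte (00 = 128, 01 = 256, 10 = 512, 11 reserved). The function name is illustrative, and the sketch deliberately ignores the case where EVEX.b repurposes L'L as a rounding-control field for register-to-register operations.

```c
#include <stdint.h>

/* Returns the vector length in bits selected by an EVEX prefix, or -1 if
   the bytes are not an EVEX prefix or L'L holds the reserved value.
   p points at the 4 prefix bytes: 0x62, P0, P1, P2. Simplification:
   ignores embedded rounding (EVEX.b on reg-reg forms), where L'L means RC. */
static int evex_vector_length(const uint8_t p[4]) {
    if (p[0] != 0x62) return -1;          /* not an EVEX escape byte */
    unsigned ll = (p[3] >> 5) & 0x3u;     /* EVEX.L'L: bits 6:5 of P2 */
    switch (ll) {
    case 0: return 128;                   /* xmm: same width as SSE */
    case 1: return 256;                   /* ymm: same width as AVX/AVX2 */
    case 2: return 512;                   /* zmm */
    default: return -1;                   /* reserved */
    }
}
```

Because the same opcode map can be issued at 128, 256 or 512 bits, a core could in principle implement the new instructions while running existing SSE/AVX-width code paths unchanged.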

I confused the MIC VPU instructions with Xeon AVX; they probably differ at the machine-code (encoding) level. The new, wider 512-bit extension will probably narrow the raw FLOP/sec gap with today's GPUs. It will be quite interesting to vectorize a custom math function library to work on vectors of 8 doubles.

>>>In my opinion AVX-512 is even a power efficiency feature for high DLP workloads>>>

But at the cost of more transistors, and thus more gate logic, needed to implement the new instructions.

Quote:

iliyapolak wrote:
>>>In my opinion AVX-512 is even a power efficiency feature for high DLP workloads>>>

But at the cost of more transistors, and thus more gate logic, needed to implement the new instructions.

Not really. You only need half the number of cores for the same (peak) throughput, so it saves lots of transistors for perfectly vectorizable workloads. Even typical code contains many loops with independent iterations that can be vectorized, and in recent years many new parallel algorithms have been developed. So a balance is required between scalar and vector processing for the optimal average performance/transistor and performance/Watt. AVX-512 would definitely help improve that. The cost is fully compensated by the average gains.

Also note that 14 nm technology and beyond greatly increases the transistor budget, so no compromise has to be made. Spending those transistors any other way is unlikely to offer the same net benefit.

Regarding needing only half the cores to do the same job, that is true. Bear in mind, though, that some of the newest instructions, when implemented in hardware as micro-ops, could use more CPU resources such as adders, multipliers or other logic units, and thus consume more power. Also, when working on wider vectors, more energy could be dissipated over a short interval (presumably a single clock cycle), because more raw data has to be operated on.

I forgot to add that a greater number of transistors generates more heat. So it's all about finding the proper balance between higher computational power and heat dissipation, while trying to minimize the heat being generated.

I agree with c0d1f1ed. The transistor budget is there, and you have more options than just adding cores. If you go wider with vectors, you amortize the power of handling a uop: you have 1 cache, 1 fetch unit, 1 uop cache, 1 LS, which has 1 LDQ, etc. Going wider makes sense so long as you can use it, which is up to the app, the ISA, and the compiler used to leverage it. That's preferable to having many cores with duplicated power burned for no return. Lastly, the performance of scalar code is critical: since most "highly vectorized" apps average 30-45% vector code, you need good scalar performance, because algorithms aren't all just sitting in some highly vectorizable loop. There are transitions, conditional control flow, etc., which necessitate a well-rounded performance scheme. Just my 2 cents.
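The point about scalar performance can be quantified with Amdahl's law. A minimal C sketch, assuming the quoted 30-45% "vector code" means the fraction of runtime spent in vectorized loops (the post does not say whether it is runtime or instruction count), and treating a doubling of vector width as a speedup factor of 2 on that fraction only:

```c
/* Amdahl's law: overall speedup when a fraction f (0..1) of the runtime
   is accelerated by a factor s and the remaining 1 - f is unchanged. */
static double amdahl_speedup(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}

/* Upper bound as s grows without limit: only the scalar part remains. */
static double amdahl_limit(double f) {
    return 1.0 / (1.0 - f);
}
```

With f = 0.45 and s = 2 (say, 256-bit to 512-bit), the overall gain is 1 / (0.55 + 0.225), about 1.29x; with f = 0.30 it is about 1.18x. Even unlimited vector width caps the f = 0.45 case at 1 / 0.55, about 1.82x, which is why scalar throughput still matters.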

Perfwise

It could be interesting to compare the heat dissipation, or the energy needed, to perform for example a trigonometric calculation with a hardware-implemented algorithm (a vectorized version of the scalar fsin machine code instruction) versus the same algorithm implemented with AVX-512 instructions. My guess is that at the front-end stage, the cost in cycles to fetch and decode the 6 or 7 terms of a Horner scheme could be greater than that of decoding one complex instruction.
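To make the instruction count concrete, here is a C sketch of an 8-lane (8 doubles, one 512-bit vector's worth) sine using a 7-coefficient Taylor polynomial in x² evaluated with Horner's scheme. It is an illustration under stated assumptions, not a production math library: it assumes |x| <= pi/4 and does no range reduction, and `sin8` and `VLEN` are hypothetical names. The loops are written lane-innermost so that each Horner step maps onto one vector FMA.

```c
#define VLEN 8 /* 8 doubles = one 512-bit vector */

/* sin(x) for |x| <= pi/4 via a truncated Taylor series in x^2,
   evaluated with Horner's scheme. No range reduction (simplification). */
static void sin8(const double x[VLEN], double out[VLEN]) {
    /* 1, -1/3!, 1/5!, -1/7!, 1/9!, -1/11!, 1/13! */
    static const double c[7] = {
        1.0, -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0,
        1.0 / 362880.0, -1.0 / 39916800.0, 1.0 / 6227020800.0
    };
    double x2[VLEN], p[VLEN];
    for (int i = 0; i < VLEN; i++) {  /* 1 vector mul */
        x2[i] = x[i] * x[i];
        p[i] = c[6];                   /* broadcast highest coefficient */
    }
    for (int k = 5; k >= 0; k--)       /* 6 Horner steps ...            */
        for (int i = 0; i < VLEN; i++) /* ... each one vector FMA        */
            p[i] = p[i] * x2[i] + c[k];
    for (int i = 0; i < VLEN; i++)     /* 1 vector mul */
        out[i] = x[i] * p[i];
}
```

Under this model, 8 sines cost about 8 vector arithmetic instructions (1 mul, 6 FMAs, 1 mul), so the decode cost is amortized over 8 lanes rather than paid per element.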

>>Any details on AVX-512 products being presented there?

No.

Even though it is not related to the subject of this thread, here is a very short list of expressions I remember:

- 7 nm technology by 2017
- Haswell goes mobile
- SoC (System on Chip)
- Quark
- Two-in-One solutions (Desktop / Laptop / Tablet / Mobile). Note: this is a compromise between PC and tablet systems
- Processing very large data sets (petabytes and so on)

I didn't hear anything about MKL and IPP.

How quickly lithography has progressed. I still remember 130 nm chips.