State of AVX 512 on Skylake-X

As several review sites have reported, AVX-512 performance on the 6- and 8-core Skylake-X parts is compromised.
Only on the 10-core part is the hardware that is present fully enabled.
Would Intel be so kind as to provide in-depth detail on what this performance difference means?
From the vague information available, it seems that one of the two (or three?) AVX-512 ports, port 5, is disabled.
Can we get more detailed information on which ports are used for AVX-512?
Which AVX-512 instructions can each port execute, and do the ports have 512-bit data paths to the registers and cache?
How is AVX-512 gather affected on the 6/8-core versus the 10-core parts?
A drawing for AVX-512 similar to the one below for AVX2 would be appreciated.

 

 

 


I'm not an Intel representative, but this is how I understand the article. The 6- and 8-core models have one of the two FMA units disabled (the one connected to port 5), so 512-bit FMA instructions have only half the throughput of the 10-core model. One 512-bit register contains 8 DP FP elements and an FMA counts as two floating-point operations per element, so from the article it follows that 512-bit FMA instructions have a reciprocal throughput of 1 cycle on the 6- and 8-core models and 0.5 cycles on the 10-core model.
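The arithmetic for peak FMA throughput can be sketched as below. This is a back-of-the-envelope model, not a measurement: it assumes each enabled FMA unit can start one 512-bit FMA per cycle and counts an FMA as two floating-point operations per element. The function name `fma_peak` is made up for illustration.

```python
def fma_peak(fma_units, vector_bits=512, element_bits=64):
    """Idealized peak for vector FMA, assuming one FMA per unit per cycle."""
    elements = vector_bits // element_bits      # 8 doubles in a 512-bit register
    flops_per_instruction = 2 * elements        # one multiply and one add per element
    reciprocal_throughput = 1 / fma_units       # cycles per FMA instruction
    flops_per_cycle = flops_per_instruction * fma_units
    return reciprocal_throughput, flops_per_cycle


# One FMA unit (as reported for the 6/8-core parts) versus two (10-core part).
for units in (1, 2):
    print(units, fma_peak(units))
```

Under these assumptions the model gives 1 cycle per instruction and 16 DP FLOPs per cycle with one unit, and 0.5 cycles per instruction and 32 DP FLOPs per cycle with two.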

Ports 0, 1 and 5 are enabled on all Skylake-X CPU models. Ports 0 and 1 are used for most 256-bit vector instructions and can fuse together to issue a 512-bit vector instruction (i.e. to execute the same 256-bit operation on the two 256-bit lanes). Port 5 is 512 bits wide and can also issue 512-bit vector instructions. It is additionally used for cross-lane operations, such as shuffles. On the 10-core CPU it is also used for the second FMA unit.

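Apparently, it follows that most 512-bit instructions should have at most 2/3 of the instruction throughput of their 256-bit counterparts: a 512-bit instruction has two places to issue (the fused ports 0+1, or port 5), whereas a 256-bit instruction that can use all three ports has three. But I have not seen any numbers yet to confirm that.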
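To make the 2/3 estimate concrete, here is a small sketch of the port-counting argument. It assumes, purely for illustration, that a 256-bit instruction can issue on any of ports 0, 1 and 5, while a 512-bit instruction has two issue slots (fused 0+1, or port 5). The helper `relative_rates` is hypothetical, and real instructions may be restricted to fewer ports.

```python
from fractions import Fraction


def relative_rates(ports_256=3, slots_512=2):
    """Compare idealized per-cycle rates of 512-bit vs. 256-bit vector ops."""
    # Ratio of instructions issued per cycle (512-bit relative to 256-bit).
    instruction_ratio = Fraction(slots_512, ports_256)
    # Ratio of data processed per cycle, since each 512-bit op covers twice the bits.
    data_ratio = Fraction(slots_512 * 512, ports_256 * 256)
    return instruction_ratio, data_ratio


print(relative_rates())
```

Under this model the instruction rate drops to 2/3, but because each 512-bit instruction processes twice as much data, the data rate per cycle is still 4/3 of the 256-bit case.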

 

Some people who have bought the 7800X now claim, based on benchmarks, that both 512-bit FMA units are enabled on the 6-core part.
Can somebody from Intel please confirm this?
 

 
