State of AVX 512 on Skylake-X

As a number of review sites have reported, AVX 512 performance on the 6- and 8-core Skylake-X is compromised.
Only on the 10-core part is the hardware that is present fully enabled.
Would Intel be so kind as to explain in detail what the performance difference means?
From the vague information available, it seems that one of the 2 (3?) AVX 512 ports, port 5, is disabled.
Can we get more detailed information on which ports are used for AVX 512?
Which AVX 512 instructions can each port execute, and do the ports have 512-bit data paths to the registers and cache?
How does AVX 512 gather performance differ between the 6/8-core and the 10-core versions?
A drawing similar to the one below for AVX2 would be appreciated.





I'm not an Intel representative, but this is how I understand the article. The 6- and 8-core models have one of the two FMA units disabled (the one connected to port 5), so FMA instructions have only half the throughput of the 10-core model. One 512-bit register holds 8 DP FP elements, so from the article it follows that FMA instructions have a reciprocal throughput of 0.5 on the 6- and 8-core models and 0.25 on the 10-core model.
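To put the one-unit versus two-unit difference in numbers, here is a small back-of-the-envelope sketch. It assumes the model described in this post (each 512-bit FMA unit retires one FMA instruction per cycle on 8 doubles, and an FMA counts as 2 floating-point operations). The function names are made up for illustration, and real AVX-512 clocks are usually lower than the nominal frequency, so treat the result as an upper bound rather than a measurement.

```python
# Peak double-precision FLOPs under the port model described in this post.
# Assumption: each 512-bit FMA unit retires one FMA instruction per cycle.

DP_LANES_512 = 8    # 512 bits / 64 bits per double
FLOPS_PER_FMA = 2   # one multiply plus one add per lane


def peak_dp_flops_per_cycle(fma_units: int) -> int:
    """Peak DP FLOPs per cycle per core for a given number of 512-bit FMA units."""
    return fma_units * DP_LANES_512 * FLOPS_PER_FMA


def peak_dp_gflops(fma_units: int, cores: int, ghz: float) -> float:
    """Theoretical peak for the whole chip, ignoring AVX-512 clock reductions."""
    return peak_dp_flops_per_cycle(fma_units) * cores * ghz


if __name__ == "__main__":
    for units in (1, 2):
        print(units, "FMA unit(s):", peak_dp_flops_per_cycle(units), "DP FLOPs/cycle/core")
```

With one unit this gives 16 DP FLOPs per cycle per core, with two units 32, which is the factor of two discussed above.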

Ports 0, 1 and 5 are enabled on all Skylake-X CPU models. Ports 0 and 1 handle most 256-bit vector instructions and can fuse to issue a single 512-bit vector instruction (i.e. execute the same 256-bit operation on the two 256-bit lanes). Port 5 is 512 bits wide and can also issue 512-bit vector instructions. It is additionally used for cross-lane operations, such as shuffles. On the 10-core CPU it is also used for the second FMA unit.

Apparently, it follows that most 512-bit instructions should have at most 2/3 of the throughput of their 256-bit counterparts. But I have not seen any numbers yet to confirm that.
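The 2/3 figure can be spelled out as a quick calculation. This is only a sketch of the reasoning, under the assumption (which the 2/3 estimate seems to imply, but which is not confirmed here) that a simple 256-bit vector instruction can issue on any of ports 0, 1 and 5, while a 512-bit instruction can issue either on the fused port 0+1 pair or on port 5.

```python
from fractions import Fraction

# Assumption (inferred from the 2/3 estimate, not confirmed by Intel):
# a simple 256-bit vector op can issue on any of ports 0, 1 and 5.
ISSUE_SLOTS_256 = 3
# For 512-bit ops, ports 0 and 1 fuse into one issue slot; port 5 is the other.
ISSUE_SLOTS_512 = 2


def instruction_ratio() -> Fraction:
    """512-bit instructions per cycle relative to 256-bit instructions per cycle."""
    return Fraction(ISSUE_SLOTS_512, ISSUE_SLOTS_256)


def data_ratio() -> Fraction:
    """Bits processed per cycle with 512-bit ops relative to 256-bit ops."""
    return Fraction(ISSUE_SLOTS_512 * 512, ISSUE_SLOTS_256 * 256)


if __name__ == "__main__":
    print(instruction_ratio())  # 2/3
    print(data_ratio())         # 4/3
```

Note that the 2/3 is counted in instructions per cycle. Since each 512-bit instruction processes twice as much data, the same model still gives 4/3 of the data throughput per cycle.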


Some people who have bought the 7800X now claim, based on benchmarks, that both 512-bit FMA units are enabled on the 6-core.
Can somebody from Intel please confirm this?


Got myself a 7820X.
I can confirm it has both FMAs enabled in AVX 512.
Thanks for the clear communication, Intel!


Fortunately, this information is included in the Intel ARK entries for the server parts. For example, the ARK description of the Xeon Platinum 8160 includes

# of AVX-512 FMA Units                2

This is the correct answer for this processor.  

In general, the Platinum series processors and the Gold 6000 series processors all have 2 FMA units, and the other processors have 1 FMA unit.  I know of at least one exception -- the Gold 5122 has 2 FMA units.   I don't know if there are other exceptions -- there are 58 processor models and the number of FMA units is not a field that can be used with the advanced search function.

"Dr. Bandwidth"

Thanks for the update, Jan. I wish Intel would respond; more information would be nice.

In case you have one of those Skylake-X processors and want to find out whether it has 2 AVX 512 FMAs:
Here is a real-time AVX2 / AVX512 / GPU Julia/Mandelbrot zoomer:
All computations are done in double precision. It is heavily optimized with FMA computations and multi-threading.
You can switch from AVX512 to AVX2. If you notice a big difference in frames per second, you can assume you have 2 AVX512 FMAs.
Computation speed is up to 60 FPS at 4K resolution on an 8-core running at 4 GHz using AVX512.
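Why the FPS comparison works as a test can be sketched as follows. AVX2 code can use two 256-bit FMA units (ports 0 and 1, 4 doubles each), so a chip with a single 512-bit FMA unit has the same peak FMA rate in both modes, while a chip with two has twice the rate in AVX512 mode. The helper names and the 1.5 cut-off below are purely illustrative assumptions, not part of the zoomer, and clock reductions under AVX-512 or memory-bound scenes will shrink the measured ratio.

```python
def expected_avx512_over_avx2(fma_units_512: int) -> float:
    """Ideal FMA-bound speedup of AVX512 mode over AVX2 mode at equal clocks."""
    avx2_dp_fma_per_cycle = 2 * 4                  # two 256-bit units, 4 doubles each
    avx512_dp_fma_per_cycle = fma_units_512 * 8    # 8 doubles per 512-bit unit
    return avx512_dp_fma_per_cycle / avx2_dp_fma_per_cycle


def guess_fma_units(fps_avx512: float, fps_avx2: float, threshold: float = 1.5) -> int:
    """Guess the 512-bit FMA unit count from measured frame rates.

    The threshold is a hypothetical midpoint between the ideal ratios
    of 1.0 (one unit) and 2.0 (two units); tune it for your own runs.
    """
    return 2 if fps_avx512 / fps_avx2 >= threshold else 1


if __name__ == "__main__":
    print(expected_avx512_over_avx2(1))  # 1.0
    print(expected_avx512_over_avx2(2))  # 2.0
```

So a ratio close to 1 points to one FMA unit, and a ratio well above 1 points to two.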

As John already indicated, the AVX-512 unit count is provided for all of the parts enumerated on

Information about Xeon is totally useless if the question is about information for Skylake-X.

No information about the number of AVX 512 units for Skylake-X, as you can see.

So it took Intel about 1 year to add the correct information of 2 AVX-512 FMA units for Skylake-X.
Congratulations!
