AFAIK Intel has not published any documentation on the peak ALU throughput of each EU. My understanding is that each EU has 2 FPUs, each capable of 8 flops/cycle (4 MACs per cycle per FPU pipe, counting each MAC as 2 flops). I am therefore assuming 16 flops/cycle per EU. Any clarification or confirmation would be great.
I hope this is not confidential information. AMD and Nvidia, for example, both publish these figures.
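
To make the arithmetic behind my assumption explicit, here is a minimal Python sketch. The 2 FPUs per EU and 4 MACs per cycle per FPU are my unconfirmed assumptions from above, not Intel-documented figures, and the function names, EU count, and clock speed in the usage line are purely hypothetical placeholders.

```python
def flops_per_cycle_per_eu(fpus_per_eu=2, macs_per_fpu_per_cycle=4):
    """Peak flops/cycle for one EU, counting each MAC as 2 flops (multiply + add).

    The defaults are assumptions taken from the question, not confirmed by Intel.
    """
    flops_per_mac = 2
    return fpus_per_eu * macs_per_fpu_per_cycle * flops_per_mac


def peak_gflops(num_eus, clock_ghz):
    """Theoretical peak GFLOPS = EUs * flops/cycle/EU * clock in GHz."""
    return num_eus * flops_per_cycle_per_eu() * clock_ghz


# Hypothetical part: 20 EUs at 1.0 GHz (placeholder values, not a real SKU).
print(flops_per_cycle_per_eu())   # per-EU figure under the assumptions above
print(peak_gflops(20, 1.0))       # whole-GPU theoretical peak in GFLOPS
```

If the per-EU figure turns out to be different, only the two default arguments need to change.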



