Has fused multiply-add (FMA) already been implemented on Nehalem? We have deployed a cluster of 2592 Nehalem cores (two-socket X5560 nodes).
On another note, are there separate floating-point execution units for the various SSE or MMX instructions, or do those instructions share the execution units used for regular FP instructions? As far as I can tell, a Nehalem core has 5 execution units, 2 of which can carry out double-precision FP calculations, plus a 3rd unit that does FP shuffles. Is that correct?
What is the theoretical maximum FLOPS performance of a single Nehalem core running at 2.8 GHz (no turbo)?
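
For context, here is a rough sketch of how such a peak figure is usually estimated: clock rate times FLOPs per cycle times core count. The per-cycle value below is an assumption, not something confirmed in this post: it supposes one 128-bit SSE add and one 128-bit SSE multiply can issue each cycle (2 doubles each, so 4 double-precision FLOPs per cycle) and that there is no FMA. The function and constant names are made up for illustration.

```python
def peak_gflops(clock_ghz, flops_per_cycle, cores=1):
    """Theoretical peak in GFLOPS: clock (GHz) x FLOPs per cycle x cores."""
    return clock_ghz * flops_per_cycle * cores


# ASSUMPTION: 4 double-precision FLOPs per cycle per core
# (one 2-wide SSE add plus one 2-wide SSE multiply, no FMA).
ASSUMED_DP_FLOPS_PER_CYCLE = 4

CLOCK_GHZ = 2.8        # from the post, turbo off
CLUSTER_CORES = 2592   # from the post

per_core = peak_gflops(CLOCK_GHZ, ASSUMED_DP_FLOPS_PER_CYCLE)
cluster = peak_gflops(CLOCK_GHZ, ASSUMED_DP_FLOPS_PER_CYCLE, cores=CLUSTER_CORES)

print(f"per core: {per_core:.1f} GFLOPS")
print(f"cluster:  {cluster / 1000:.2f} TFLOPS")
```

If the real per-cycle throughput differs from the assumed 4, only `ASSUMED_DP_FLOPS_PER_CYCLE` needs to change; the estimate scales linearly with it.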
Thanks for any info on these.