Two questions for the compiler team:
1) As many people have found, with ICC5 and ICC6 the -G7 option gives slightly worse performance (a few percent) than -G6 on many pieces of optimized code. This has been tested on a wide range of code and on P4s, both Willamette and Northwood; -G7 is never measurably better than -G6.
Can anyone elaborate on what the -G7 option really does? I have read the manual, so please, I'd like to know more than "it optimizes your application to use as many of the features as possible of the processor you specify without making it incompatible with earlier processors."
Specifically, what makes (or can make) code built with -G7 run slower than code built with -G6 on P4s?
2) a) Can you explain why __m128, __m128i and __m128d are not compatible? What's more, they cannot even be type-cast into one another. Since the intrinsics are not ambiguous, and all three types share the same xmm register set, what was the reason for that?
b) Is it for future compatibility with a hypothetical xmm register set that would be split into 8 ps, 8 pd and 8 epi registers?
c) Is this... feature... present in ICC 7, too?
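
To make 2a concrete, here is a minimal sketch of the kind of code I mean. The helper names (`bits_ps_to_epi`, `low_lane_bits`) are made up for illustration, and going through `memcpy` is just one workaround I am assuming is acceptable; it is not something taken from the manual.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128, __m128i, __m128d */
#include <string.h>     /* memcpy */

/* Hypothetical helper: reinterpret the 128 bits of a float vector
   as an integer vector without changing any bits. */
static __m128i bits_ps_to_epi(__m128 v)
{
    __m128i out;
    /* What I would like to write, and what the compiler rejects:
           out = (__m128i)v;
       So the bits are copied through memory instead. */
    memcpy(&out, &v, sizeof out);
    return out;
}

/* Hypothetical helper: return the raw bit pattern of the lowest
   32-bit lane of a float vector. */
static unsigned int low_lane_bits(__m128 v)
{
    return (unsigned int)_mm_cvtsi128_si32(bits_ps_to_epi(v));
}
```

For example, `low_lane_bits(_mm_set_ss(1.0f))` gives the IEEE-754 bit pattern of 1.0f. Newer headers provide cast intrinsics such as `_mm_castps_si128` for exactly this purpose; I have not verified whether the ICC versions discussed here ship them.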
BTW, thanks for ICC, best-of-the-best-of-the-best.