After reading so much about how Intel C++ is better than other compilers, I decided to test it (I have some real code to optimize).
I tried many combinations of options (O1, O2, Ox, SSE2, SSE3, SSE4.1, SSE4.2, data alignment, IPO, auto-parallelization with the loop level both set and not set). I have C0x enabled, and I am using the restrict keyword for Intel (Visual does not recognize it). I have the optimization diagnostic level set to 3. I am compiling for x64.
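To show how I use restrict, here is a minimal sketch rather than my real code. The macro name MY_RESTRICT and the function scale_add are made up for illustration. As far as I know, the __restrict spelling is a vendor extension accepted by Visual C++, Intel C++ and GCC, so a macro like this should let the same source build on all of them.

```cpp
#include <cstddef>

// Hypothetical portability macro (not from my actual code).
// Assumption: __restrict is accepted as an extension by these compilers.
#if defined(_MSC_VER) || defined(__INTEL_COMPILER) || defined(__GNUC__)
#  define MY_RESTRICT __restrict
#else
#  define MY_RESTRICT
#endif

// Hypothetical kernel: out[i] = a[i] * k + b[i].
// The qualifier promises the compiler that the three arrays do not overlap,
// which can make the loop easier to vectorize.
void scale_add(float* MY_RESTRICT out,
               const float* MY_RESTRICT a,
               const float* MY_RESTRICT b,
               float k,
               std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * k + b[i];
}
```

The promise is only valid if the caller really passes non-overlapping arrays; otherwise the behavior is undefined.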
And, after a dozen or so checks I can say that Intel is running 0.3 fps (which is about 4.8%) slower than Visual.
The auto-parallelizer actually makes things slower than the serial build (half slower, to be exact). I think this is because my functions are small but called very often, so the threading overhead outweighs any gain.
As expected, OpenMP gave similar performance to the auto-parallelizer.
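To illustrate the overhead problem, here is a minimal sketch (not my real code) of one thing that might help: the OpenMP if clause, which keeps a loop serial when the trip count is small. The function sum_squares and the threshold value are hypothetical, and the threshold would have to be tuned by measurement.

```cpp
// Hypothetical cut-off below which starting threads is assumed not to pay off.
const long kParallelThreshold = 100000;

// Hypothetical small kernel that is called very often.
// The if clause makes OpenMP run the loop serially for small n, so short
// calls avoid the cost of forking and joining a thread team.
double sum_squares(const double* data, long n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total) if(n >= kParallelThreshold)
    for (long i = 0; i < n; ++i)
        total += data[i] * data[i];
    return total;
}
```

If the compiler is run without OpenMP enabled, the pragma is ignored and the loop simply runs serially.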
gcc 3.4.6 is about 30% slower, but I will also run the tests on gcc 4.x.x and Open64.
Do you have any ideas what else could improve performance? Or why it is still slower than Visual?
I am using Intel v.11 and Visual Studio 2005.
Thanks for help.