I'm currently evaluating Intel C++ 13.0 for our use, but I'm coming across some strange performance issues. Some of our unit tests show a 50% speed drop when compiled with it, compared to MSVC++.
Using VTune, I found some machine clear issues emanating from our hot loops. Looking at the assembler output in more detail revealed that the std::vector array accesses weren't being inlined, but were instead going through a function call!
Here's the assembler output. I am using /O2 /Qip /Oi, and I've also tried /Ox.
000000013FBDE189 mov rcx,qword ptr [rbp+120h]
000000013FBDE190 mov edx,dword ptr [yx]
000000013FBDE193 add edx,dword ptr [x]
000000013FBDE196 movsxd rdx,edx
000000013FBDE199 call std::vector<float, std::allocator<float> >::operator[] (013FB76880h)
000000013FBDE19E mov r12,rax
000000013FBDE1A1 mov rcx,qword ptr [rbp+120h]
000000013FBDE1A8 mov edx,dword ptr [yx]
000000013FBDE1AB add edx,dword ptr [x]
000000013FBDE1AE movsxd rdx,edx
000000013FBDE1B1 call std::vector<float, std::allocator<float> >::operator[] (013FB76880h)
Any idea why the compiler is using a call instead of a direct array access here? It's just a std::vector<float> object.
I've been trying loads of different compiler options to try to get it inlined, but to no avail. Can anyone help?
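For reference, here is a minimal sketch of the kind of loop involved. It is not our actual code: the function names, the parameters w and h, and the row-major layout are assumptions for illustration, and only yx and x are taken from the symbols visible in the disassembly above. The second function does the same work through a raw pointer, which might be a useful baseline for checking whether the operator[] call accounts for the slowdown.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reduction of the hot loop: sums a w-by-h grid stored
// row-major in a std::vector<float>. yx is the row offset, x the column.
float sum_indexed(const std::vector<float>& data, int w, int h)
{
    float total = 0.0f;
    for (int y = 0; y < h; ++y) {
        const int yx = y * w;
        for (int x = 0; x < w; ++x) {
            // The access in question: expected to inline to a plain load.
            total += data[yx + x];
        }
    }
    return total;
}

// Same loop through a raw pointer, so no operator[] call can be emitted.
float sum_pointer(const std::vector<float>& data, int w, int h)
{
    if (data.empty()) {
        return 0.0f;  // &data[0] is not valid on an empty vector
    }
    const float* p = &data[0];
    float total = 0.0f;
    for (int y = 0; y < h; ++y) {
        const int yx = y * w;
        for (int x = 0; x < w; ++x) {
            total += p[yx + x];
        }
    }
    return total;
}
```

Both functions assume data holds at least w * h elements. If the two versions time about the same, the missing inlining is probably not the main cost.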