floating-point performance?!

Hi all,

I am running some tests to compare the performance of float vs. integer arithmetic. All the test routines are very simple, i.e. they combine two matrices cell by cell with a single arithmetic operation (add, mul or div). Despite the simplicity of the test, the results (given below) puzzle me.

First and foremost, there is no difference between adding, multiplying and dividing two float matrices. It seems obvious to me that division should be far more expensive than addition, but just look at the results: the time for the float operations is constant at about 0.20, no matter which arithmetic operation is used. What is going on? The code is attached if you want to test it yourself. I am testing on a P4 and a Core2Duo, with the same results on both. The compiler settings are given below.
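For readers without the attachment, the kind of routine being timed can be sketched roughly as follows. This is only an illustration and not the code from FloatVsFixed3.cpp: the names `add_cells`, `mul_cells`, `div_cells` and `time_kernel` are hypothetical, the matrices are stored as flat `std::vector`s, and a C++11 compiler with `<chrono>` is assumed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical cell-by-cell kernels (illustrative names, not from the attachment).
// T is float, int ("fixed") or short ("fixed16") in the spirit of the tests above.
template <typename T>
void add_cells(const std::vector<T>& a, const std::vector<T>& b, std::vector<T>& out) {
    assert(a.size() == out.size() && b.size() == out.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<T>(a[i] + b[i]);
}

template <typename T>
void mul_cells(const std::vector<T>& a, const std::vector<T>& b, std::vector<T>& out) {
    assert(a.size() == out.size() && b.size() == out.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<T>(a[i] * b[i]);
}

template <typename T>
void div_cells(const std::vector<T>& a, const std::vector<T>& b, std::vector<T>& out) {
    assert(a.size() == out.size() && b.size() == out.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<T>(a[i] / b[i]);  // b must not contain zeros for integer T
}

// Runs one kernel `repeats` times over matrices of `cells` elements
// and returns the elapsed wall-clock time in seconds.
template <typename T, typename Kernel>
double time_kernel(Kernel kernel, std::size_t cells, int repeats) {
    std::vector<T> a(cells, T(6)), b(cells, T(3)), out(cells);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        kernel(a, b, out);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A call such as `time_kernel<float>(div_cells<float>, 1024 * 1024, 50)` would then give one line of a table like the one below; the matrix size and repeat count here are arbitrary choices, not the values used in the original test.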

Secondly, division of short variables does not vectorize. Is that normal?

$ ./Optimisation.exe
add:float : 0.203053
add:fixed : 0.209739
add:fixed16: 0.104484
mul:float : 0.203633
mul:fixed : 0.214015
mul:fixed16: 0.107370
div:float : 0.203779
div:fixed : 0.506532
div:fixed16: 1.062533

Thanks in advance for any advice/suggestions/comments.



compiler (through VC8): /GL /c /O3 /Og /Ob2 /Oi /Ot /Oy /GA /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /FD /MD /GS /GR /Fo"Release/" /W3 /nologo /Wp64 /Zi /Gd /Qansi-alias /Qvec-report2 /Qfp-speculationfast /QaxP /QxP

linker: /OUT:"D:\Code\Optimisation\Release/Optimisation.exe" /INCREMENTAL:NO /nologo kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /MANIFEST /MANIFESTFILE:"Release\Optimisation.exe.intermediate.manifest" /DEBUG /PDB:"D:\Code\Optimisation\Release\Optimisation.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /qipo_fa /TLBID:1 /IMPLIB:"D:\Code\Optimisation\Release\Optimisation.lib" /MACHINE:X86 /LTCG

Attachment: FloatVsFixed3.cpp (text/x-c++src, 8.93 KB)