I am running some tests to compare the performance of float vs. integer arithmetic. All the test routines are very simple, i.e. they combine two matrices with a simple cell-by-cell arithmetic operation (add, mul or div). Despite the simplicity of the test, the results (given below) puzzle me. First and foremost, there is no difference between adding, multiplying and dividing two float matrices. I would expect division to be far more expensive than addition, but the time for the float operations is constant: about 0.20, whatever the arithmetic operation is. What is going on? The code is attached if you want to test it yourself. I am testing on a P4 and a Core2Duo, with the same results on both. The compiler settings are given below.
Secondly, division of short variables does not vectorize. Is that normal?
add:float : 0.203053
add:fixed : 0.209739
mul:float : 0.203633
mul:fixed : 0.214015
div:float : 0.203779
div:fixed : 0.506532
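For readers without the attachment, the routines being timed have roughly the following shape. This is only a minimal sketch, not the attached code: the names `combine`, `time_combine`, `Add`, `Mul` and `Div` are hypothetical, the matrices are assumed to be stored as flat arrays, and "fixed" is assumed to mean `short`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical element-wise operations (names are assumptions, not from the attachment).
// The cast brings the promoted int result back to T when T is short.
struct Add { template <typename T> T operator()(T x, T y) const { return static_cast<T>(x + y); } };
struct Mul { template <typename T> T operator()(T x, T y) const { return static_cast<T>(x * y); } };
struct Div { template <typename T> T operator()(T x, T y) const { return static_cast<T>(x / y); } };

// Cell-by-cell combination of two matrices stored as flat arrays (assumed layout).
template <typename T, typename Op>
void combine(const std::vector<T>& a, const std::vector<T>& b,
             std::vector<T>& out, Op op) {
    assert(a.size() == b.size() && a.size() == out.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = op(a[i], b[i]);
}

// Runs the kernel `repeats` times and returns the elapsed wall-clock time in seconds.
template <typename T, typename Op>
double time_combine(const std::vector<T>& a, const std::vector<T>& b,
                    std::vector<T>& out, int repeats, Op op) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        combine(a, b, out, op);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A call such as `time_combine(a, b, out, 100, Div{})` with `std::vector<float>` or `std::vector<short>` arguments would then correspond to one row of the table above; the matrix size and repeat count here are placeholders.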
Thanks in advance for any advice/suggestions/comments.
compiler (through VC8): /GL /c /O3 /Og /Ob2 /Oi /Ot /Oy /GA /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /FD /MD /GS /GR /Fo"Release/" /W3 /nologo /Wp64 /Zi /Gd /Qansi-alias /Qvec-report2 /Qfp-speculationfast /QaxP /QxP
linker: /OUT:"D:CodeOptimisationRelease/Optimisation.exe" /INCREMENTAL:NO /nologo kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /MANIFEST /MANIFESTFILE:"ReleaseOptimisation.exe.intermediate.manifest" /DEBUG /PDB:"D:CodeOptimisationReleaseOptimisation.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /qipo_fa /TLBID:1 /IMPLIB:"D:CodeOptimisationReleaseOptimisation.lib" /MACHINE:X86 /LTCG