I am just a hobbyist programmer. My platform is an HP xw8400 workstation with one quad-core Xeon E5335 2.00GHz processor. I use the Intel C++ Compiler Pro version 11.0.066 through the VS2008 IDE. I build my programs for native 64-bit execution, hoping that this is better for handling large data volumes quickly. I use a fairly basic subset of the C++ language: mostly FOR loops, arrays and sorting.
Unfortunately or not, my favorite self-developed application (a sort of mathematical analysis tool) has lately grown into very heavy processing. A single analysis takes close to a week of nonstop processing to complete.
My feeling is that performance can be improved. I read Intel's paper of January 14, 2010, about Automatic Parallelization, and followed the instructions therein. Now the work-intensive part of my program contains no WHILE loops, no BREAKs and no GOTOs. True, some function calls remain, namely the calls to the C qsort() function. I wonder whether there are too many of them: there are three sorts in the innermost loop. I have specified the /Qparallel option.
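To make the shape of that innermost loop concrete, here is a minimal sketch. It is not my actual code: `inner_step`, `run_serial`, the array size and the fill formulas are hypothetical stand-ins. It only shows the pattern of three qsort() calls per iteration. As far as I understand, each qsort() call goes through a comparator function pointer that the compiler may not be able to see through, which could be one reason the enclosing loop is not parallelized in practice.

```cpp
#include <cstdlib>

// Comparator for qsort(): ascending order of doubles.
static int cmp_double(const void* a, const void* b) {
    const double x = *static_cast<const double*>(a);
    const double y = *static_cast<const double*>(b);
    return (x > y) - (x < y);
}

// Hypothetical stand-in for one pass of the innermost loop:
// fill three small arrays, sort each one, combine the medians.
static double inner_step(int i) {
    const int N = 5;
    double a[N], b[N], c[N];
    for (int k = 0; k < N; ++k) {
        a[k] = (i + 3 * k) % 7;
        b[k] = (2 * i + 5 * k) % 11;
        c[k] = (i * k) % 13;
    }
    // Three library calls per iteration, each through a function pointer.
    std::qsort(a, N, sizeof(double), cmp_double);
    std::qsort(b, N, sizeof(double), cmp_double);
    std::qsort(c, N, sizeof(double), cmp_double);
    return a[N / 2] + b[N / 2] + c[N / 2];
}

// Hypothetical outer loop: the only cross-iteration dependence
// is the running sum.
double run_serial(int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += inner_step(i);
    }
    return total;
}
```

In this sketch the iterations are independent apart from the running sum. Whether that also holds for my real loop is exactly what I am unsure about.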
Parallelization and vectorization appear to go smoothly, as the compiler always reports success: several pages of success messages. Still, the execution time is only slightly (5% at most) shorter than that of the .exe produced by the plain MS compiler. As a relatively long-time user of the Intel C++ compilers, I am quite disappointed, because in the earlier days (for my earlier programs) I used to get a 50% reduction in run time, or even more.
Besides the slowness, peak CPU usage doesn't exceed 25%, and even that is concentrated mostly in one core of the quad-core Xeon. (One core is loaded to nearly 100%, while the remaining three sit almost idle, just watching the first.) It doesn't look much like a workload distributed evenly.
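For comparison, the even distribution I was hoping for would look roughly like the sketch below if it were written out explicitly with OpenMP rather than left to /Qparallel. This is an assumption-laden illustration, not my code: `work_item` and `run_parallel` are hypothetical names, the iterations are assumed to be fully independent, and the pragma takes effect only when OpenMP is enabled at compile time (/Qopenmp with the Intel compiler). Without that switch the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for one independent unit of analysis work.
static double work_item(int i) {
    std::vector<double> v(8);
    for (int k = 0; k < 8; ++k) {
        v[k] = (i * 5 + k * 3) % 8;
    }
    std::sort(v.begin(), v.end());
    return v[0] + v[7] + i;
}

// Each iteration writes only to its own slot of `partial`,
// so the threads share no mutable state inside the loop.
double run_parallel(int n) {
    std::vector<double> partial(n, 0.0);

    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        partial[i] = work_item(i);
    }

    // Serial reduction after the parallel region.
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += partial[i];
    }
    return total;
}
```

If the real outer iterations are independent in this way, I would expect all four cores to show load, not just one.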
I would greatly appreciate some advice from a colleague with deeper knowledge in this area. I am enclosing the lengthy command lines that the IDE generated as a result of my experimentation and guesswork on the Property pages.
C/C++, Command line:
/c /O3 /Og /Ob2 /Oi /Ot /GT /Qip /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /MD /GS /arch:SSE3 /fp:fast /Yu"StdAfx.h" /Fp"x64\\Release/F-PRINT_mp_AutoTest_5.pch" /Fo"x64\\Release/" /W3 /nologo /Zi /QaxSSE3 /QxSSE3 /Qparallel
Linker, Command Line:
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"C:\\Cpp_OwnCode\\Megoldasok\\F_Print\\F-PRINT_mp_AutoTest_5\\x64\\Release/F-PRINT_mp_AutoTest_5.exe" /INCREMENTAL:NO /nologo /MANIFEST /MANIFESTFILE:"x64\\Release\\F-PRINT_mp_AutoTest_5.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /TLBID:1 /DEBUG /PDB:"C:\\Cpp_OwnCode\\Megoldasok\\F_Print\\F-PRINT_mp_AutoTest_5\\x64\\Release\\F-PRINT_mp_AutoTest_5.pdb" /SUBSYSTEM:CONSOLE /STACK:100000000,100000000 /OPT:REF /OPT:ICF /DYNAMICBASE /NXCOMPAT /IMPLIB:"C:\\Cpp_OwnCode\\Megoldasok\\F_Print\\F-PRINT_mp_AutoTest_5\\x64\\Release\\F-PRINT_mp_AutoTest_5.lib" /MACHINE:X64