I have written a program which relies heavily on array assignments and array operations, among other things, so many of its DO loops are parallelizable; in fact, I have written some of them as DO CONCURRENT for that reason. However, the program compiled with auto-parallelization runs considerably slower than the one compiled without it.
The strangest thing is that when the auto-parallelized program runs, my system monitor shows all cores at full load and their temperatures rising quickly, which suggests that work is really being done. But if that is so, the program not only takes longer, it also does at least 8 times more work (more or less) in total.
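To put numbers on that observation rather than relying on the system monitor, wall-clock time and CPU time could be measured around one loop, as in the sketch below. This is a minimal illustration, not my actual code: the program name, array size, and loop body are placeholders. It also assumes that cpu_time reports CPU time summed over all threads, which is processor-dependent and worth checking for the compiler in use.

```fortran
program timing_sketch
  use, intrinsic :: iso_fortran_env, only: dp => real64, int64
  implicit none

  integer, parameter :: n = 5000000      ! placeholder problem size
  real(dp), allocatable :: a(:), b(:)
  real(dp) :: cpu0, cpu1, wall
  integer(int64) :: c0, c1, rate
  integer :: i

  allocate(a(n), b(n))
  call random_number(b)

  ! Start both timers: CPU time and wall-clock time
  call cpu_time(cpu0)
  call system_clock(c0, rate)

  ! Placeholder loop with independent iterations
  do concurrent (i = 1:n)
     a(i) = 2.0_dp*b(i) + 1.0_dp
  end do

  ! Stop both timers
  call system_clock(c1)
  call cpu_time(cpu1)

  wall = real(c1 - c0, dp) / real(rate, dp)

  print '(a,f10.4)',  'wall-clock seconds: ', wall
  print '(a,f10.4)',  'CPU seconds:        ', cpu1 - cpu0
  ! Use the result so the loop cannot be optimized away
  print '(a,es12.4)', 'checksum:           ', sum(a)
end program timing_sketch
```

If the CPU seconds come out several times larger than the wall-clock seconds, the threads really are busy in parallel. If the wall-clock time is also longer than in the build without auto-parallelization, the extra CPU time is presumably overhead rather than useful work.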
I speculate this might be an artifact of code that is a poor candidate for parallelization, but I had assumed the compiler could resolve all problems related to parallelizing. It still baffles me, however, and because I suspect this comes down to a lack of knowledge on my part, I am reluctant to post any details yet. Am I missing something here? Please advise as to what details might be needed if the matter is not straightforward.
Thanks in advance!
CPU: Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz
System: Kubuntu 13.04