I am working on parallelising QCD codes with OpenMP so that they use both processors of a node effectively. With a few codes, compiling with the -g option or -O0 (suppress all optimization) gave a very good gain from parallelisation (85%-100% improvement in execution time), but as soon as I compile with the -O2 option and run, the improvement in execution time is only 15%-20%. This happens only with a few programs; I have other programs that parallelise well at both -O0 and -O2.
1. In some programs, I also observed that going from -O0 to -O2 reduces the execution time of the serial code by a larger factor than that of the OpenMP code, e.g.
Serial: -g option 5 units of time, -O2 option 2.6 units of time
OpenMP: -g option 2.5 units of time, -O2 option 2 units of time
So OpenMP is still faster than serial in both cases, but the improvement over serial is much smaller with -O2 than with -g.
2. I also have a few programs that perform worse with the -O2 option.
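To make the timings in point 1 concrete, here is a small sketch of the arithmetic (Python, used only as a calculator; the names `serial`, `openmp` and `speedup` are made up for illustration and are not from the QCD code). It takes the four timings quoted above and computes the gain from OpenMP at each optimization level, and the gain from -O2 for each version of the code.

```python
# Timings quoted in point 1 (arbitrary units of time).
serial = {"-g": 5.0, "-O2": 2.6}
openmp = {"-g": 2.5, "-O2": 2.0}


def speedup(t_before, t_after):
    """Ratio of old time to new time; 2.0 means twice as fast."""
    return t_before / t_after


# Gain from OpenMP over serial at each optimization level.
omp_gain = {flag: speedup(serial[flag], openmp[flag]) for flag in serial}

# Gain from -O2 over -g for each version of the code.
opt_gain = {
    "serial": speedup(serial["-g"], serial["-O2"]),
    "openmp": speedup(openmp["-g"], openmp["-O2"]),
}

for flag, s in omp_gain.items():
    print(f"OpenMP speedup at {flag}: {s:.2f}x")
for name, s in opt_gain.items():
    print(f"-O2 speedup for {name} code: {s:.2f}x")
```

With these numbers the OpenMP speedup is 2.0x with -g but only about 1.3x with -O2, while -O2 speeds up the serial code by about 1.9x and the OpenMP code by only 1.25x.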
Can somebody please explain the cause of this behaviour and suggest a possible solution? Does the problem arise from the code structure, or from architectural features such as cache, bus bandwidth, etc.?
Does compiler optimization affect OpenMP performance?