Parallel on Xeon


dp_phy

Hi,

I have ifc 7.0, and I compiled some code with the -openmp and -parallel options to make use of my dual-processor Xeon machine. I set OMP_NUM_THREADS to 4 because "top" shows four CPUs. The result: during execution, total user CPU is around 65% (I assume this is in the parallel regions), distributed more or less equally among all four CPUs, so each one reads about 15%. Occasionally the usage goes to 100%, which is the usual case without any parallelization directives. The problem is this: rather than having each CPU read 15%, I want them to read 100% (or even 50%) so that my program's performance improves.

Does anyone know what to do? Should I switch to version 8 of the compiler?

Thanks!

Tim Prince

There are so many possibilities that you may want to go to the Threading forum, after you have gathered more data. Among the first things to look at:


Does your application become starved for RAM as you increase the number of threads?


Did your compilation report successful parallelization of all parts of your program where much time is spent? Use the -openmp_report and -par_report switches to find out.


Does your application balance load evenly among threads? If not, can the OpenMP scheduling options help? Seeing all 4 logical CPUs about equally loaded doesn't necessarily prove anything; a single thread could be hopping around excessively.
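To illustrate the scheduling question, here is a minimal sketch in C (the same clause exists in Fortran as `!$OMP DO SCHEDULE(RUNTIME)`). The function names `uneven_work` and `uneven_total` are hypothetical stand-ins, not anything from the original program. The loop is deliberately unbalanced, and `schedule(runtime)` lets you try different schedules through the OMP_SCHEDULE environment variable without recompiling.

```c
#include <assert.h>

/* Hypothetical per-iteration work whose cost grows with i, so an even
   split of the iteration range gives later threads far more to do. */
static long long uneven_work(int i)
{
    long long acc = 0;
    for (int k = 0; k < i; ++k)
        acc += k % 7;
    return acc;
}

/* Sums uneven_work(0) .. uneven_work(n-1).
   schedule(runtime) defers the choice of schedule to OMP_SCHEDULE.
   If OpenMP is not enabled at compile time, the pragma is ignored and
   the loop simply runs serially with the same result. */
long long uneven_total(int n)
{
    assert(n >= 0);
    long long total = 0;
    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += uneven_work(i);
    return total;
}
```

With a build like this you could compare, for example, OMP_SCHEDULE="static" against OMP_SCHEDULE="dynamic,16" and see whether the run time changes. If it does, load imbalance is probably part of the problem.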


Diagnosing opportunities to gain more from OpenMP is not often simple. The Intel Threading Toolkit is one of the tools intended to help.


On an HT (Hyper-Threading) system, getting the performance meter up to 100% doesn't necessarily show effective parallelization. You may need to shut HT off in the BIOS and check how effectively your application parallelizes on 2 CPUs. You must get it working well that way before you can hope for additional gain from turning on HT.
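One way to check this, rather than watching the performance meter, is to time the same work with 1 and then 2 threads. Below is a rough sketch in C. `kernel` and `time_kernel` are hypothetical names, and the loop is only a placeholder for whatever part of your program dominates the run time.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds when built with OpenMP. Without OpenMP there is
   only one thread, so CPU time from clock() is used as a stand-in. */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Placeholder kernel: a simple parallel reduction. */
static double kernel(int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += 1.0 / (1.0 + (double)i);
    return sum;
}

/* Requests nthreads threads (only when OpenMP is enabled), runs the
   kernel, stores its value in *result, and returns elapsed seconds. */
double time_kernel(int nthreads, int n, double *result)
{
    assert(nthreads >= 1 && n >= 0 && result != NULL);
#ifdef _OPENMP
    omp_set_num_threads(nthreads);
#endif
    double t0 = wall_seconds();
    *result = kernel(n);
    return wall_seconds() - t0;
}
```

Dividing the time from `time_kernel(1, n, &r)` by the time from `time_kernel(2, n, &r)` gives a measured speedup. If that ratio stays close to 1 with HT off, turning HT back on is unlikely to help.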


How much additional gain you could get from HT depends on many factors, such as:


which kernel


which Intel chip


how your program uses cache and Write Combine buffers

