Intel OpenMP Performance

I ran a test program (OpenMP applied to a triply nested loop) to measure the performance of Intel OpenMP on an Intel Xeon X5675 with 12 threads. The compiler version is 12.1. Here is the timing data:

           Running with  1 thread:   1.49 seconds
                         2 threads:  0.82 seconds  -  90.8% parallel efficiency vs. 1 thread
                         4 threads:  0.56 seconds  -  66.5% parallel efficiency vs. 1 thread
                         8 threads:  0.34 seconds  -  54.7% parallel efficiency vs. 1 thread
                        12 threads:  0.30 seconds  -  only 41.4% parallel efficiency vs. 1 thread

Is this normal OpenMP performance in this case?
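A minimal sketch of the kind of triply nested OpenMP loop described above (the actual test program is not posted, so the loop body below is only a stand-in, here a naive matrix product timed with omp_get_wtime):

#include <omp.h>
#include <stdio.h>

#define N 512                      /* problem size is arbitrary */

static double a[N][N], b[N][N];

int main(void)
{
    /* Serial initialization of the input. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (double)(i + j);

    double t0 = omp_get_wtime();

    /* Parallelize the outermost of the three nested loops. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < N; k++)
                s += a[i][k] * a[k][j];
            b[i][j] = s;
        }

    double t1 = omp_get_wtime();
    printf("threads=%d  time=%.2f s\n", omp_get_max_threads(), t1 - t0);
    return b[0][0] > 0.0 ? 0 : 1;  /* keep the result live */
}

Built with something like icc -O2 -openmp -std=c99 and run with OMP_NUM_THREADS set to 1, 2, 4, 8, 12.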


You are seeing quite normal scalability for an OpenMP program.  Generally, scalability depends strongly on the program structure: it can be close to the theoretical ideal, it can be poor, or it can even be negative (more threads cause longer run times).  All of these cases are normal for particular programs.  A typical example of a program with guaranteed negative scalability is one where a single parallel region consists mostly of critical sections with no actual parallel code.  Of course, the goal is usually to get the best scalability possible by reducing serialized code, synchronization, and data sharing.
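For example, a minimal sketch of that worst case: essentially the whole loop body is one critical section, so the threads execute it one at a time and adding threads only adds locking overhead (the iteration count here is arbitrary).

#include <omp.h>
#include <stdio.h>

int main(void)
{
    long sum = 0;
    double t0 = omp_get_wtime();

    #pragma omp parallel for
    for (long i = 0; i < 10000000L; i++) {
        /* All of the "work" is serialized by the critical section,
           so more threads only mean more lock contention. */
        #pragma omp critical
        sum += i;
    }

    printf("threads=%d  sum=%ld  time=%.2f s\n",
           omp_get_max_threads(), sum, omp_get_wtime() - t0);
    return 0;
}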

Regards,
Andrey

It looks like you either do not have enough work to keep all 12 hardware threads busy, or different threads work on the same memory at the same time, which causes the threads to stall.
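For example, one common form of threads working on the same memory is false sharing, sketched below (the iteration count is arbitrary and a 64-byte cache line is assumed; the array is sized for up to 16 threads).

#include <omp.h>
#include <stdio.h>

#define NITER 100000000L

int main(void)
{
    /* Each thread updates its own counter, but the counters sit on
       the same cache line, so every increment bounces the line
       between cores and the threads stall on memory traffic. */
    long counter[16] = {0};

    double t0 = omp_get_wtime();
    #pragma omp parallel
    {
        int id = omp_get_thread_num();   /* assumes at most 16 threads */
        for (long i = 0; i < NITER; i++)
            counter[id]++;               /* adjacent elements: false sharing */
    }
    printf("time=%.2f s\n", omp_get_wtime() - t0);
    return (int)(counter[0] & 1);        /* keep the counters live */
}

Padding each counter to its own 64-byte cache line, or accumulating into a thread-private local variable, removes the contention.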

--Vladimir

As you have a CPU with 6 cores, only 2 of which are fully independent, progressive scaling as you add threads will depend somewhat on which specific hardware contexts you employ. 

You may be able to get fairly good scaling up to 4 threads if you assign them to independent cores, using just 1 thread in each pair of cores which share cache access paths.  As Vladimir hinted, if you have sharing of memory between threads, you must assign adjacent threads to the same cache to improve scaling beyond 4 threads.  This also would delay cache capacity problems as you add multiple threads per core.

On these CPUs, a performance gain of 20% from 4 threads on independent cores to 6 threads properly affinitized to separate cores is typical.  Maximum performance with 8 threads would be expected with the 2 additional threads assigned to the 2 fully independent cores, taking account of data sharing. This CPU family is unique in the asymmetric arrangement where only a minority of cores have independent paths to cache.

If it is a floating-point benchmark, a performance gain from the 2nd thread on a core is unlikely unless you use the IEEE divide and sqrt instructions heavily.
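One way to experiment with the thread placement described above is the Intel OpenMP runtime's KMP_AFFINITY environment variable, for example (the binary name is only a placeholder):

# one thread per physical core
export OMP_NUM_THREADS=6
export KMP_AFFINITY=granularity=fine,scatter,verbose
./test_omp

# two threads per core; adjacent threads share a core and its cache
export OMP_NUM_THREADS=12
export KMP_AFFINITY=granularity=fine,compact,verbose
./test_omp

scatter spreads consecutive threads across cores, compact packs them onto sibling hardware threads of the same core, and verbose prints the resulting binding so you can verify it.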

Quote:

TimP (Intel) wrote:

As you have a CPU with 6 cores, only 2 of which are fully independent, progressive scaling as you add threads will depend somewhat on which specific hardware contexts you employ. [...]

Agreed.
