Multi-core example with OpenMP slower than single core?

tim18
October 11, 2008 11:42 PM PDT
#2 Reply to #1

In the last example posted in this thread, I can't imagine why parallel sections would be used rather than parallel do, nor why the inner loop would be designated for OpenMP parallelization. If threaded parallelism is required without any thought given to optimization, the compiler's auto-parallelization option /Qparallel would be preferable, even though it is still not often effective.
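As a rough illustration of the loop-placement point, here is a minimal sketch in C, where `parallel for` is the counterpart of Fortran's `parallel do`. The kernel, the function names, and the row-major array layout are all hypothetical; the code posted earlier in the thread is not reproduced here. The first function places the work-sharing directive on the outer loop, so a parallel region is entered once; the second, for contrast, places it on the inner loop, so a region is entered on every outer iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: a[i][j] = a[i][j] * factor + i, row-major storage. */

/* Directive on the outer loop: one parallel region, rows split across threads. */
void scale_outer_parallel(double *a, int rows, int cols, double factor)
{
    assert(a != NULL && rows >= 0 && cols >= 0);
    #pragma omp parallel for
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            a[(size_t)i * cols + j] = a[(size_t)i * cols + j] * factor + i;
        }
    }
}

/* For contrast: directive on the inner loop, so the fork/join cost
   is paid once per outer iteration instead of once in total. */
void scale_inner_parallel(double *a, int rows, int cols, double factor)
{
    assert(a != NULL && rows >= 0 && cols >= 0);
    for (int i = 0; i < rows; ++i) {
        #pragma omp parallel for
        for (int j = 0; j < cols; ++j) {
            a[(size_t)i * cols + j] = a[(size_t)i * cols + j] * factor + i;
        }
    }
}
```

Both functions compute the same result; only the threading overhead differs. Without an OpenMP compiler flag the pragmas are ignored and the code runs serially.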

As to the minimum problem size for effective OpenMP parallelism, I have an example which achieves excellent threaded scaling on Core 2 Duo when the non-threaded version takes only 1 millisecond. Of course, this is an ideal case: the cache sharing is effective, as is the reuse of persistent threads left over from a previous parallel region. The Intel OpenMP run-time does show reduced overhead compared with the Microsoft and GNU libraries.

The basic point, that OpenMP parallelism will not have an advantage for a simple inner loop of length 1000, does apply to the posted case.
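One way to act on this, sketched here in C under stated assumptions, is the OpenMP `if` clause, which keeps a loop serial unless the trip count is large enough to repay the threading overhead. The function name and the cutoff `PARALLEL_THRESHOLD` are hypothetical; a useful cutoff depends on the machine, the run-time library, and the work per iteration, so it should be measured rather than taken from this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cutoff for illustration only; tune by measurement. */
#define PARALLEL_THRESHOLD 100000

/* Sums x[i]^2; threads are used only when n exceeds the cutoff. */
double sum_of_squares(const double *x, int n)
{
    assert(x != NULL && n >= 0);
    double total = 0.0;
    #pragma omp parallel for reduction(+:total) if(n > PARALLEL_THRESHOLD)
    for (int i = 0; i < n; ++i) {
        total += x[i] * x[i];
    }
    return total;
}
```

With a loop of length 1000, as in the posted case, this sketch would run on a single thread.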


