Basic OpenMP Threading Overhead


You have existing code and you want to thread it so it runs best on a multicore system. What do you need to know to get started?

This article shows how to start threading your code with OpenMP, helps you decide where to thread, and shows how to measure the performance of the resulting code. I’ll also show some comparisons against typical Win32 threading.

An ever-increasing number of multicore processors are shipping today. As a result, developers need to add threads to their code to take advantage of multiple cores when they’re available, and split performance-sensitive code across those cores. However, your code must also scale well: the same code needs to run well on single-core machines, dual-core machines, quad-core machines, and beyond. This article looks at some common threading techniques using OpenMP and measures their performance. This will give you some performance baselines to use, and help you understand how you can thread and measure your code.

OpenMP is a threading API, built on compiler directives and a small runtime library, that is used to write straightforward threaded code. It is often used to add threading to existing single-threaded code. I assume that you are familiar with threading concepts but that you may not have used OpenMP or threaded much code for performance.

Let’s look at some simple code, add in threading with OpenMP, and see how it performs on a 2-core system. This should give you a better understanding of how OpenMP performs, and also give a sense of how you could modify and benchmark your own code. For comparison, Win32 threading is shown as well.
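As a rough illustration of what adding OpenMP to simple serial code looks like, here is a minimal sketch. It is not the code from the downloadable sample; the function names sum_squares_serial and sum_squares_omp are hypothetical, chosen only for this example. The only change between the two versions is the pragma and a signed loop index. If the compiler is not run with OpenMP enabled, the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial baseline: sum of squares over a vector.
// (Hypothetical example function, not taken from the article's sample.)
double sum_squares_serial(const std::vector<double>& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i] * v[i];
    }
    return total;
}

// The same loop threaded with OpenMP.
// reduction(+:total) gives each thread a private partial sum and
// combines the partial sums when the loop finishes.
double sum_squares_omp(const std::vector<double>& v) {
    double total = 0.0;
    // A signed loop index is used because older OpenMP versions
    // require one in a parallel for loop.
    const long n = static_cast<long>(v.size());
    assert(n >= 0);
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += v[i] * v[i];
    }
    return total;
}
```

Calling both functions on the same input and comparing the results is a simple sanity check before timing anything. With floating-point data the two sums can differ slightly, because the threaded version adds the terms in a different order.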

All code is written in C++, and was compiled with Intel C++ compiler 9.0. Measurements were taken on Windows XP SP2. See the Configuration appendix for details on the test hardware.

In the samples shown here, OpenMP code scales well and has minimal performance degradation when forced to run in one thread. OpenMP also has similar overhead to Win32 threads. However, loop startup overhead is high for both OpenMP and Win32 threads. This suggests that threading via these mechanisms is not appropriate for very small loops or highly performance-sensitive applications. Those cases need other threading mechanisms, such as thread pools.
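One way to keep very small loops from paying the startup cost is OpenMP's if clause, which runs the loop serially when its condition is false. The sketch below is an assumption about how this could be applied, not a technique taken from the article's sample. The function scale_in_place and the threshold kMinParallelTrip are hypothetical, and the threshold value is a placeholder that would need to be chosen by measurement on the target system.

```cpp
#include <cassert>
#include <vector>

// Hypothetical cutoff: loops shorter than this run serially.
// The value is a placeholder, not a measured result.
const long kMinParallelTrip = 10000;

// Multiply every element by `factor`.
// The if clause asks OpenMP to create a parallel region only when the
// trip count is large enough to be worth the startup overhead.
void scale_in_place(std::vector<float>& data, float factor) {
    const long n = static_cast<long>(data.size());
    assert(n >= 0);
    #pragma omp parallel for if(n >= kMinParallelTrip)
    for (long i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

The result is the same whichever path is taken; only the cost of entering the loop changes.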

OpenMP Experiments

by Paul Lindberg, Senior Software Engineer, Intel Global Developer Relations Division

Why Should I Care About This Code Sample?

This sample code shows how to measure OpenMP basic threading performance.

This code takes a simple piece of serial code and threads it several different ways. It uses a test harness to time each version, which helps us understand the differences between the various methods.
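A timing harness of this kind can be quite small. The following is a minimal sketch under stated assumptions, not the harness from the downloadable sample: best_time_seconds, set_thread_count, and fill_squares are hypothetical names, and the code assumes a C++11 compiler for std::chrono. omp_set_num_threads is a standard OpenMP runtime call; it is guarded so the code still builds when OpenMP is not enabled.

```cpp
#include <cassert>
#include <chrono>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: run `work` `repeats` times and return the
// fastest wall-clock time in seconds. Taking the best of several
// runs reduces noise from other activity on the machine.
// Returns -1.0 if `repeats` is zero or negative.
template <typename Work>
double best_time_seconds(Work work, int repeats) {
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(stop - start).count();
        if (best < 0.0 || s < best) {
            best = s;
        }
    }
    return best;
}

// Hypothetical helper: request a thread count for later parallel
// regions. Does nothing when OpenMP is not enabled.
void set_thread_count(int threads) {
#ifdef _OPENMP
    omp_set_num_threads(threads);
#else
    (void)threads;
#endif
}

// Example workload to time: out[i] = i * i.
void fill_squares(std::vector<double>& out) {
    const long n = static_cast<long>(out.size());
    assert(n >= 0);
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = static_cast<double>(i) * static_cast<double>(i);
    }
}
```

A typical use is to call set_thread_count(1), time the workload with best_time_seconds, then repeat with two threads and compare the two times. Forcing one thread shows how much the OpenMP version costs relative to the plain serial loop.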

Target Audience
C++ developers who are considering, or already using, OpenMP for threading and who want to understand how that code will perform

Sample Category: Full project

Implementation Language: C++

Target Hardware & Software Platforms

Hardware Systems: Systems running Intel multi-core processors

Operating Systems: Windows XP and beyond

Compilers: Microsoft Visual Studio .NET 2003, Intel C++ Compiler 9.0

Download code sample

Read complete article (PDF 374KB)


