You have existing code and you want to thread it so it runs best on a multicore system. What do you need to know to get started?
This article will show how you can start threading your code with OpenMP, help you decide where to thread, and show you how to measure the resulting code. I’ll also show some comparisons against typical Win32 threading.
An ever-increasing number of multicore processors are shipping today. As a result, developers need to add threads to their code to take advantage of multiple cores when they’re available, and split performance-sensitive code across those cores. However, your code must also scale well; the same code needs to run well on single-core machines, dual-core machines, quad-core machines and beyond. This article looks at some common threading techniques using OpenMP, and measures their performance. This will give you some performance baselines to use, and help you understand how you can thread and measure your code.
OpenMP is a directive-based threading API, implemented by the compiler and a runtime library, that lets you write straightforward threaded code. It is often used to add threading to existing single-threaded code. I assume that you are familiar with threading concepts but that you may not have used OpenMP or threaded much code for performance.
Let’s look at some simple code, add in threading with OpenMP, and see how it performs on a 2-core system. This should give you a better understanding of how OpenMP performs, and also give a sense of how you could modify and benchmark your own code. For comparison, Win32 threading is shown as well.
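As a minimal sketch of what adding OpenMP to a serial loop can look like, the pair of functions below scales an array first serially and then with a single OpenMP directive. The function names and the workload are illustrative assumptions, not the article's sample code. If OpenMP support is not enabled in the compiler, the pragma is ignored and the second function simply runs serially.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Serial baseline (hypothetical workload): scale every element in place.
void scale_serial(std::vector<double>& data, double factor) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] *= factor;
    }
}

// The same loop threaded with OpenMP. Each iteration touches only its own
// element, so the iterations can be divided among threads without locking.
void scale_openmp(std::vector<double>& data, double factor) {
    // A signed int index keeps the loop valid for older OpenMP versions.
    assert(data.size() <= static_cast<std::size_t>(INT_MAX));
    const int n = static_cast<int>(data.size());

    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

To get threaded execution, build with the compiler's OpenMP option enabled, for example /Qopenmp with the Intel C++ compiler on Windows or -fopenmp with GCC. Both functions should produce identical results, which makes the serial version a convenient correctness check for the threaded one.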
All code is written in C++ and was compiled with the Intel C++ Compiler 9.0. Measurements were taken on Windows XP SP2. See the Configuration appendix for details on the test hardware.
In the samples shown here, OpenMP code scales well and shows minimal performance degradation when forced to run in one thread. OpenMP also has overhead similar to that of Win32 threads. However, loop startup overhead is high for both OpenMP and Win32 threads. This suggests that threading via these mechanisms is not appropriate for very small loops or highly performance-sensitive applications. Those cases call for other mechanisms, such as thread pools.
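One way to check these effects on your own machine is with a small timing harness. The sketch below is an assumption about how such a harness could look, not the one used for the measurements in this article: sum_squares, seconds_per_call, and time_sum_squares are hypothetical names, and it uses std::chrono from modern C++ for timing. The num_threads clause forces a given thread count, so passing 1 approximates the "forced to run in one thread" case, and a very small n exposes loop startup cost relative to useful work.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical kernel: sum of squares, threaded with an OpenMP reduction.
// The num_threads clause pins the thread count for this one loop.
double sum_squares(const std::vector<double>& v, int threads) {
    const int n = static_cast<int>(v.size());
    double total = 0.0;

    #pragma omp parallel for reduction(+:total) num_threads(threads)
    for (int i = 0; i < n; ++i) {
        total += v[i] * v[i];
    }
    return total;
}

// Run work() reps times and return the average wall-clock seconds per call.
template <typename F>
double seconds_per_call(F&& work, int reps) {
    assert(reps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        work();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / reps;
}

// Time the kernel for a given loop size and thread count.
double time_sum_squares(int n, int threads, int reps) {
    std::vector<double> v(n, 1.0);
    volatile double sink = 0.0;  // keeps the compiler from discarding the work
    return seconds_per_call([&] { sink = sum_squares(v, threads); }, reps);
}
```

Comparing time_sum_squares(n, 1, reps) with time_sum_squares(n, 2, reps) for a range of n, from a few dozen elements up to millions, should show roughly where the loop becomes large enough for threading to pay for its startup cost. The exact crossover depends on the hardware and compiler, so treat any numbers you get as specific to your configuration.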
by Paul Lindberg, Senior Software Engineer, Intel Global Developer Relations Division
Why Should I Care About This Code Sample?
This sample code shows how to measure OpenMP basic threading performance.
This code takes a simple piece of serial code and threads it several different ways. It uses a test harness to measure this, and helps us understand the differences between the various methods.
C++ developers who are considering using OpenMP to thread their code, or are already using it, and who want to understand how that code will perform
Sample Category: Full project
Implementation Language: C++
Target Hardware & Software Platforms
Hardware Systems: Systems running Intel multi-core processors
Operating Systems: Windows XP and beyond
Compilers: Microsoft Visual Studio .NET 2003, Intel C++ Compiler 9.0