Threading Models for High-Performance Computing: Pthreads or OpenMP?

by Andrew Binstock


The UNIX operating system has supported threads for many years, which is one of the principal reasons UNIX has flourished on server systems. During the last few years, Linux* has been bolstering its presence on the server, due to improved kernel support for threads. For example, the recent release of the kernel, version 2.6, adds a new scheduler that should greatly speed up the switching of threads on a Linux system. The previous release of the kernel (v. 2.4; Linux kernels use even numbers for release versions and odd numbers for versions under development) was likewise distinguished by a substantial improvement in threading capability. These advances have helped place Linux on servers and in sites where it supports high-performance computing (HPC). Along the way, Linux abandoned its original threading API (called Linux threads) and adopted Pthreads as its native threading interface, joining most of the UNIX variants available today.

However, Linux developers, just like programmers working on UNIX and Windows*, can avail themselves of a second threading API called OpenMP*, which was designed by a consortium of server vendors. This article compares and contrasts Pthreads and OpenMP and tries to identify which one offers developers the most benefit.

What is Pthreads?

Pthreads is a set of threading interfaces developed by the IEEE (Institute of Electrical and Electronics Engineers) committees in charge of specifying a Portable Operating System Interface (POSIX). The P in Pthreads stands for POSIX and, in fact, Pthreads are occasionally referred to as POSIX threads. Essentially, the POSIX committee defined a basic set of functions and data structures that it hoped would be adopted by numerous vendors so that threaded code could be ported easily across operating systems. The committee’s dream was realized by UNIX vendors, who by and large all implemented Pthreads. (The notable exception is Sun, which continues to favor Solaris* threads as its primary threads API.) The portability of Pthreads has been expanded further by the Linux adoption and by a port to the Windows platform.

Pthreads specifies the API to handle most of the actions required by threads. These actions include creating and terminating threads, waiting for threads to complete, and managing the interaction between threads. The latter category includes various synchronization mechanisms that prevent two threads from trying to modify the same data values simultaneously: mutexes, condition variables, and semaphores. (Technically speaking, semaphores are not part of Pthreads, but they are conceptually closely aligned with threading and available on all systems on which Pthreads run.)
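A minimal sketch of the most common of these mechanisms, the mutex, assuming a POSIX system with the Pthreads library available. The names counter, counter_lock, increment, and run_two_incrementers are hypothetical. Two threads add to one shared counter, and the lock/unlock pair around each update keeps the two read-modify-write sequences from interleaving.

```c
#include <pthread.h>

/* Shared state: a counter and the mutex that guards it (hypothetical names). */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: adds 1 to the shared counter 'iterations' times,
   holding the mutex around each read-modify-write. */
static void *increment(void *arg)
{
    long iterations = *(const long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                          /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two incrementing threads and returns the final counter value,
   or -1 if a thread could not be created. */
long run_two_incrementers(long iterations)
{
    pthread_t a, b;
    counter = 0;

    if (pthread_create(&a, NULL, increment, &iterations) != 0)
        return -1;
    if (pthread_create(&b, NULL, increment, &iterations) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);                  /* wait for both threads */
    pthread_join(b, NULL);
    return counter;
}
```

Called from main, run_two_incrementers(100000) should return 200000. Without the mutex, concurrent increments could be lost and the total could come out lower.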

To make use of Pthreads, developers must write their code specifically for this API. This means they must include header files, declare Pthreads data structures, and call Pthreads-specific functions. In essence, the process is no different from using other libraries. And as with other libraries on UNIX and Linux, the Pthreads library is simply linked with application code (via the -lpthread parameter).
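As a small sketch of those three steps (include the header, declare a Pthreads data structure, call Pthreads functions), the following runs one function in a separate thread and waits for it. The struct square_job and the function names are hypothetical. With GCC or Clang on Linux such code is typically built with the -pthread option; older toolchains link the library with -lpthread as described above.

```c
#include <pthread.h>   /* step 1: the Pthreads header */

/* Hypothetical argument/result record handed to the thread. */
struct square_job {
    int input;
    int output;
};

/* Thread start routine; Pthreads requires the signature void *(void *). */
static void *square(void *arg)
{
    struct square_job *job = arg;
    job->output = job->input * job->input;
    return NULL;
}

/* Creates one thread, waits for it, and returns its result
   (or -1 if the thread could not be created). */
int square_in_thread(int n)
{
    pthread_t tid;                          /* step 2: Pthreads data structure */
    struct square_job job = { n, 0 };

    /* step 3: Pthreads-specific calls */
    if (pthread_create(&tid, NULL, square, &job) != 0)
        return -1;
    pthread_join(tid, NULL);                /* wait for the thread to finish */
    return job.output;
}
```

Because pthread_join returns only after the thread has finished, reading job.output afterward is safe even though job lives on the caller's stack.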

While the Pthreads library is fairly comprehensive (although not quite as extensive as some other native API sets) and distinctly portable, it suffers from a serious limitation common to all native threading APIs: it requires considerable threading-specific code. In other words, coding for Pthreads irrevocably casts the codebase into a threaded model. Moreover, certain decisions, such as the number of threads to use, can become hard-coded into the program. In exchange for these constraints, Pthreads provides extensive control over threading operations. It is an inherently low-level API that typically requires multiple steps to perform simple threading tasks. For example, using a threaded loop to step through a large data block requires that threading structures be declared, that the threads be created individually, that the loop bounds for each thread be computed and assigned to the thread, and ultimately that the thread termination be handled; all this must be coded by the developer. If the loop does more than simply iterate, the amount of thread-specific code can increase substantially. To be fair, the need for this much code is true of all native threading APIs, not just Pthreads.
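The steps listed above can be sketched for a simple case: summing an array with a hard-coded count of four threads. NUM_THREADS, struct slice, sum_slice, and threaded_sum are hypothetical names, and error handling is limited to running a slice inline if its thread cannot be created. Even this minimal version needs per-thread structures, explicit bound arithmetic, and a join loop.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4   /* hard-coded thread count */

/* Per-thread work description: the slice [begin, end) of the array. */
struct slice {
    const double *data;
    size_t begin, end;
    double partial;     /* this thread's partial sum */
};

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sums data[0..n) using NUM_THREADS threads. */
double threaded_sum(const double *data, size_t n)
{
    pthread_t tids[NUM_THREADS];
    struct slice slices[NUM_THREADS];
    int started[NUM_THREADS];
    size_t chunk = n / NUM_THREADS, extra = n % NUM_THREADS, start = 0;

    /* Compute each thread's loop bounds, then create the thread. */
    for (int t = 0; t < NUM_THREADS; t++) {
        size_t len = chunk + ((size_t)t < extra ? 1 : 0);  /* spread the remainder */
        slices[t] = (struct slice){ data, start, start + len, 0.0 };
        start += len;
        started[t] = (pthread_create(&tids[t], NULL, sum_slice, &slices[t]) == 0);
        if (!started[t])
            sum_slice(&slices[t]);          /* creation failed: do this slice inline */
    }

    /* Handle thread termination, then combine the partial sums. */
    double total = 0.0;
    for (int t = 0; t < NUM_THREADS; t++) {
        if (started[t])
            pthread_join(tids[t], NULL);
        total += slices[t].partial;
    }
    return total;
}
```

Changing the thread count here means editing NUM_THREADS and recompiling, which is the hard-coding problem the paragraph describes.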

Because of the amount of threading code needed to perform straightforward operations, developers have been increasingly looking for a simpler alternative to Pthreads.

What is OpenMP?

In 1997, a group of vendors came together under the aegis of hardware manufacturer Silicon Graphics to formulate a new threading interface. Their common problem was that the primary operating systems of the time all imposed drastically different ways of programming for threads. UNIX employed Pthreads, Sun used Solaris threads, Windows used its own API, and Linux used Linux threads (until its subsequent adoption of Pthreads). The committee wanted to design an API that would enable a codebase to run without changes equally well on Windows and UNIX/Linux. In 1998, it delivered the first API specification of what was called OpenMP. (In those days, the term ‘open’ was associated with the concept of support from multiple vendors, as in open systems, rather than with today’s implication of open source.)

The OpenMP specification consists of APIs, a set of pragmas, and several settings for OpenMP-specific environment variables. As further revisions have been made to the standard, it has become clear that one of OpenMP’s most useful features is its set of pragmas. By judicious use of these pragmas, a single-threaded program can be made multithreaded without recourse to APIs or environment variables. With the recent release of OpenMP 2.0, the OpenMP Architecture Review Board (ARB), which is the official name for the committee that formulates the OpenMP specification, made clear its preference that developers use the pragmas rather than the APIs. Let’s examine this approach in greater depth, starting with a recap of what pragmas are.

The following definition of pragmas, taken from Microsoft’s documentation, is one of the clearest explanations: “The #pragma directives offer a way for each compiler to offer machine- and operating system-specific features while retaining overall compatibility with the C and C++ languages. Pragmas are machine- or operating system-specific by definition, and are usually different for every compiler. In C and C++, where they are most commonly used, pragmas have the form: #pragma token-string”

A key aspect of pragmas is that if a compiler does not recognize a given pragma, it must ignore it (according to the ANSI C and C++ standards). Hence, it is safe to place library-specific pragmas in code without worrying that the code will be broken if it is compiled with a different toolset.
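As a sketch of that fallback behavior, assuming GCC or Clang (where the -fopenmp option enables OpenMP), the following function carries an OpenMP pragma yet remains ordinary C without it. The name scale is hypothetical. Built without OpenMP support, the compiler ignores the pragma (GCC reports it only as a warning under -Wall) and the loop runs serially.

```c
/* Multiplies every element of v by k. With OpenMP enabled, the loop's
   iterations are divided among threads; otherwise the pragma is ignored
   and the loop runs on one thread. Each iteration touches only v[i],
   so both builds produce the same result. */
void scale(double *v, long n, double k)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        v[i] *= k;
}
```

The loop is safe to parallelize precisely because no iteration reads or writes another iteration's element.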

Figure 1 shows a simple OpenMP pragma in action.

#pragma omp parallel for
for ( i = 0; i < x; i++ )
{
    printf ( "Loop number is %d\n", i );
}


Figure 1. Threading a loop with a simple OpenMP pragma

This pragma tells the compiler that the succeeding for-loop should be made multithreaded, and that the threads should execute in parallel. Through a combination of work done by the compiler and the OpenMP libraries, the for-loop will be executed using a number of threads, as explained shortly. OpenMP will take care of creating the threads, threading the for-loop by dividing the iterations among the threads, and handling the threads once the for-loop completes.
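For contrast with the manual Pthreads steps described earlier in this article (declare structures, compute bounds, create and join threads), here is a sketch of an OpenMP loop that sums an array. The function name parallel_sum is hypothetical; the reduction clause is standard OpenMP and asks the runtime to give each thread a private partial sum and to combine the partial sums when the loop ends.

```c
/* Sums data[0..n). OpenMP creates the threads, divides the iterations
   among them, keeps a private copy of 'total' per thread (reduction),
   and adds the copies together after the loop. */
double parallel_sum(const double *data, long n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

Because the partial sums may be combined in any order, floating-point results can differ in the last bits from a serial run; integer-valued data like the test below sums exactly either way.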

While OpenMP does not guarantee a priori how many threads will be created, it usually chooses a number equal to the number of available execution pipelines. On standard multiprocessor environments, this number is the number of processors. On systems with processors endowed with Hyper-Threading Technology, the number of pipelines is twice the number of processors. An API function or environment variable can be used to override the default number of threads.
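A short sketch of the API route, assuming an OpenMP-enabled build: omp_set_num_threads requests a thread count for later parallel regions, and omp_get_num_threads reports how many threads are actually running inside one. The function name threads_used is hypothetical. The #ifdef _OPENMP guard keeps the code valid when OpenMP is disabled, in which case it simply reports one thread.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Requests 'requested' threads and returns how many actually executed
   the parallel region (1 when OpenMP is disabled). */
int threads_used(int requested)
{
    int used = 1;
#ifdef _OPENMP
    omp_set_num_threads(requested);         /* API override of the default */
    #pragma omp parallel
    {
        #pragma omp single                  /* one thread records the team size */
        used = omp_get_num_threads();
    }
#else
    (void)requested;                        /* unused in a serial build */
#endif
    return used;
}
```

The environment-variable route needs no code at all: launching a program as, for example, OMP_NUM_THREADS=4 ./app (app being a placeholder name) changes the default without recompiling. The runtime may still supply fewer threads than requested, so the returned value is the one to trust.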

OpenMP offers numerous other pragmas that identify code blocks to thread, scope variables to be shared across threads or local to individual threads, specify where to synchronize threads, determine how to schedule tasks or loop iterations to threads, and so forth. So, ultimately, it provides medium-grained control over threading functionality. At this level of granularity, which is sufficient for many HPC applications, OpenMP delivers better than most other options on the promise of portability, optimal execution, and, especially, minimized disruption to the codebase.
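A sketch combining several of these pragmas and clauses, assuming nonnegative input values: default(none) with shared() makes variable scoping explicit, a variable declared inside the parallel region is private to each thread, schedule(dynamic, 64) hands out iterations in chunks of 64, and a critical section serializes the final merge. The names BUCKETS and histogram are hypothetical.

```c
#define BUCKETS 4   /* hypothetical number of histogram buckets */

/* Counts data[i] % BUCKETS for nonnegative values into hist[0..BUCKETS). */
void histogram(const int *data, long n, long hist[BUCKETS])
{
    for (int b = 0; b < BUCKETS; b++)
        hist[b] = 0;

    #pragma omp parallel default(none) shared(data, n, hist)
    {
        long local[BUCKETS] = {0};          /* private: declared inside the region */

        #pragma omp for schedule(dynamic, 64)
        for (long i = 0; i < n; i++)
            local[data[i] % BUCKETS]++;

        #pragma omp critical                /* one thread at a time merges */
        for (int b = 0; b < BUCKETS; b++)
            hist[b] += local[b];
    }
}
```

Counting into a private array and merging once per thread keeps the critical section out of the inner loop, which matters far more for speed than the choice of schedule.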

Which Threading Model is Right For You?

OpenMP is convenient because it does not lock the software into a preset number of threads. This kind of lock-in poses a big problem for threaded applications that use lower-level APIs such as Pthreads or Win32. How can software written with those APIs scale the number of threads when running on a platform where more processors are available? One approach has been to use thread pools, in which a fixed set of threads is created at program startup and the work is distributed among them. However, this approach requires considerable thread-specific code, and there is no guarantee that it will scale optimally with the number of available processors. With OpenMP, the number of threads need not be specified.
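To show what avoiding the lock-in costs with Pthreads, here is a sketch that sizes the thread count at run time and starts one thread per online processor. It assumes sysconf(_SC_NPROCESSORS_ONLN), a widely supported extension (Linux, the BSDs, macOS) rather than part of base POSIX. MAX_THREADS, choose_thread_count, record_run, and run_one_per_cpu are hypothetical names, and the cap exists only to keep the arrays fixed-size.

```c
#include <pthread.h>
#include <unistd.h>

#define MAX_THREADS 64   /* hypothetical cap for the fixed-size arrays */

/* Picks a thread count at run time instead of hard-coding it. */
int choose_thread_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);  /* extension, not base POSIX */
    if (n < 1)
        n = 1;                               /* query failed: fall back to 1 */
    if (n > MAX_THREADS)
        n = MAX_THREADS;
    return (int)n;
}

/* Thread body: marks its own slot to show that it ran. */
static void *record_run(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Starts one thread per online processor and returns how many ran. */
int run_one_per_cpu(void)
{
    int count = choose_thread_count();
    pthread_t tids[MAX_THREADS];
    int ran[MAX_THREADS] = {0};
    int started = 0, total = 0;

    for (int t = 0; t < count; t++) {
        if (pthread_create(&tids[started], NULL, record_run, &ran[started]) != 0)
            break;                           /* stop on failure; join what exists */
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += ran[t];
    }
    return total;
}
```

All of this bookkeeping is what the OpenMP runtime performs implicitly when no thread count is given.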

OpenMP’s pragmas have another key advantage: by disabling support for OpenMP, the code can be compiled as a single-threaded application. Compiling the code this way can be tremendously advantageous when debugging a program. Without this option, developers will frequently find it difficult to tell whether complex code is working incorrectly because of a threading problem or because of a design error unrelated to threading.
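A sketch of how the same source supports both builds, assuming GCC or Clang: the OpenMP specification defines the macro _OPENMP only when OpenMP support is enabled (for example with -fopenmp), so any direct use of omp.h can be wrapped in a guard. The function names build_mode and my_thread_id are hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Reports which kind of build is running. */
const char *build_mode(void)
{
#ifdef _OPENMP
    return "OpenMP";
#else
    return "serial";    /* compiled without OpenMP: debug single-threaded */
#endif
}

/* Returns the calling thread's number; always 0 in a serial build. */
int my_thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}
```

Compiling once with and once without the OpenMP option gives a threaded build and a serial build from identical source, so a bug that disappears in the serial build points toward a threading problem.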

Should developers need finer-grained control, they can avail themselves of OpenMP’s threading API. It includes a small set of functions that fall into three areas: querying the execution environment’s threading resources and setting the current number of threads; setting, managing, and releasing locks to resolve resource access between threads; and a small timing interface. Use of this API is discouraged because it takes away the benefits of the pragma-only approach: once code calls OpenMP functions directly, it can no longer be compiled as a single-threaded program without conditional-compilation guards or stub functions. At this level, the OpenMP API is a small subset of the functionality offered by Pthreads. Both APIs are portable, but Pthreads offers a much greater range of primitive functions that provide finer-grained control over threading operations. So, in applications in which threads have to be individually managed, Pthreads or the native threading API (such as Win32 on Windows) would be the more natural choice.
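A sketch of the lock and timing parts of that API: omp_init_lock, omp_set_lock, omp_unset_lock, and omp_destroy_lock manage an explicit lock, and omp_get_wtime returns wall-clock seconds. The names count_even_locked and time_count_even are hypothetical, and a reduction clause would be the idiomatic way to count here; the lock is used only to show the calls. The _OPENMP guards illustrate the extra code that direct API use brings with it.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Counts the even values in data[0..n) using an explicit OpenMP lock. */
long count_even_locked(const int *data, long n)
{
    long count = 0;
#ifdef _OPENMP
    omp_lock_t lock;
    omp_init_lock(&lock);
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            omp_set_lock(&lock);            /* acquire */
            count++;
            omp_unset_lock(&lock);          /* release */
        }
    }
    omp_destroy_lock(&lock);
#else
    for (long i = 0; i < n; i++)            /* serial build: no lock needed */
        if (data[i] % 2 == 0)
            count++;
#endif
    return count;
}

/* Stores the count in *result and returns the elapsed wall-clock seconds
   (0.0 in a serial build, which has no OpenMP timer). */
double time_count_even(const int *data, long n, long *result)
{
#ifdef _OPENMP
    double start = omp_get_wtime();
    *result = count_even_locked(data, n);
    return omp_get_wtime() - start;
#else
    *result = count_even_locked(data, n);
    return 0.0;
#endif
}
```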

To use OpenMP, a developer must have a compiler that supports the standard. On Linux and Windows, Intel® Compilers for C/C++ and Fortran support OpenMP. On the UNIX platform, SGI, Sun, HP, and IBM all provide OpenMP-compliant compilers. Open-source OpenMP compilers can be found at

So, if you’re writing UNIX or Linux applications for HPC, look at both Pthreads and OpenMP. You might well find OpenMP to be an elegant solution.



About the Author

Andrew Binstock is the principal analyst at Pacific Data Works LLC. He was previously a senior technology analyst at PricewaterhouseCoopers, and earlier was editor in chief of UNIX Review and C Gazette. He is the lead author of "Practical Algorithms for Programmers," from Addison-Wesley Longman, which is currently in its 12th printing and in use at more than 30 computer-science departments in the United States.




anonymous

Wonderful comparison!

anonymous

Hi sir,
I have one query. Suppose I have an application with independent tasks (i.e., some functions that are independent of each other), which I can execute in parallel using the Pthread library, and it also contains some for loops that take considerable time. Can I use the Pthread library for task-level parallelization and OpenMP constructs for loop parallelization? (In short, a hybrid Pthread+OpenMP version, just like OpenMP+MPI.)

anonymous

How would you compare threading and OpenMP implementations in terms of generated code, both in size and in speed? I would expect OpenMP to be faster, given that most of the thread management is taken care of by the compiler. What about the speed of locking/unlocking, for example?
Also, since OpenMP forces code to be 'backward compatible' with a single thread, I think it is a greater exercise mentally to map the parallel regions of processing to code blocks. This is more explicit in a threading API...
