Hot and Safe: a Beginner's Guide to Multithreaded Libraries

Last Modified On :   October 27, 2009 7:08 PM PDT

by Steve Lewin-Berlin

Most of the discussion of multithreading that emerges from Cilk Arts is focused on creating multithreaded applications. I want to take a different tack today, and survey a variety of strategies for creating multithreaded libraries. Note that these challenges and strategies are not specific to Cilk++, but are important considerations for any parallel programming model.

First, let's distinguish between two very different goals: writing multithreaded libraries, and writing libraries that can be called from multithreaded programs.

Thread Hot Multithreaded Libraries

You may want to create a library that is multithreaded internally. Presumably, this is a library that does processor-intensive calculations that can be sped up through parallel programming. I'll call this a "thread-hot" library.

Thread Safe Multithreaded Libraries

Alternatively, you may want a library that is safe to call from a multithreaded application. I'll call this a "thread-safe" library.

In this article, I'll focus on the second goal and enumerate strategies for building thread-safe libraries. Specifically, I will discuss the following five approaches:

  1. Use a functional programming style
  2. Keep all state in the calling thread
  3. Keep state in Thread Local Storage
  4. Allow (or require) shared state between callers
  5. Maintain completely separate instances of the library

Each of these choices involves some tradeoffs.

Use a functional programming style

The problems that arise when building a thread-safe library generally result from state that is stored between calls. One choice is simply to eliminate any stored state by using a functional programming style, in which each function computes its result only from its arguments.

Functional programming may be a good approach for brand-new code, but it may not be practical when converting existing libraries. For libraries that must maintain state across calls, you must use one of the other approaches.
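
As a minimal sketch of this style, assuming C++11 threads rather than Cilk++, the hypothetical library function below (scaled_to_unit_sum is a name I made up for illustration) has no static or global data. Because its result depends only on its argument, any number of threads can call it at the same time without locking.

```cpp
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical library function: no statics, no globals.
// The result depends only on the argument, so concurrent calls cannot interfere.
std::vector<double> scaled_to_unit_sum(const std::vector<double>& values) {
    const double total = std::accumulate(values.begin(), values.end(), 0.0);
    std::vector<double> result;
    result.reserve(values.size());
    for (double v : values) {
        result.push_back(total == 0.0 ? 0.0 : v / total);
    }
    return result;
}

// Hypothetical caller: invokes the library from several threads without locks.
// Each thread writes only to its own slot of `outputs`.
std::vector<std::vector<double>> call_from_threads(const std::vector<double>& input,
                                                   int thread_count) {
    std::vector<std::vector<double>> outputs(thread_count);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([&outputs, &input, i] {
            outputs[i] = scaled_to_unit_sum(input);
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return outputs;
}
```

The input is shared between the threads, but it is only read, so no synchronization is needed.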

Keep all state in the calling thread

In this model, all state is maintained by the caller. The API must provide a way for the library to access the state, typically using a pointer passed with each call into the library.

Other than writing purely functional libraries, this is often the best way to write a thread-safe library. Each caller maintains state that is completely isolated from other callers. This model also works well under threading models in which the flow of control may move from one thread to another. This can happen explicitly if the application uses thread pools, or implicitly with packages such as Cilk++.

The primary disadvantage of this approach is that a new API needs to be defined for legacy libraries, requiring application changes.
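
A sketch of what such an API might look like follows. The RunningStats type and the stats_add and stats_mean functions are hypothetical names chosen for illustration. The library keeps no data of its own; everything it needs arrives through the pointer the caller passes in.

```cpp
#include <cstddef>

// Hypothetical library state. The caller owns it; the library never stores it.
struct RunningStats {
    std::size_t count = 0;
    double sum = 0.0;
};

// Every entry point takes the caller's state as its first argument.
void stats_add(RunningStats* ctx, double sample) {
    ctx->count += 1;
    ctx->sum += sample;
}

double stats_mean(const RunningStats* ctx) {
    return ctx->count == 0 ? 0.0 : ctx->sum / static_cast<double>(ctx->count);
}
```

Two callers that each create their own RunningStats never touch the same data, no matter which threads they run on. A single RunningStats object is still not safe to update from two threads at once; that remains the caller's responsibility.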

Keep state in Thread Local Storage (TLS)

Modern operating systems offer thread-specific storage. State can be maintained in TLS, providing isolation between threads.

Using Thread Local Storage (TLS) to isolate state associated with different threads seems, on the surface, to be a good idea. In fact, if the state in TLS does not need to persist across calls into the library, TLS can be a good solution.

However, the use of TLS assumes that each thread in the calling application is completely independent. The model breaks down in programming models with more fine-grained parallelism, in which multiple threads in the application may work together. For example, multiple iterations in an OpenMP loop may run on different threads but expect to operate on a common object. In Cilk++, work stealing means that a function may continue on a different thread after a cilk_spawn or cilk_sync. In these cases, TLS is not a safe place for persistent state.
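
The isolation, and the pitfall, can be seen in a small sketch. It assumes the C++11 thread_local keyword instead of an operating system TLS API, and the names are hypothetical. The library keeps a persistent counter in TLS, and a call made on a different thread sees a fresh counter rather than the one the caller has been building up.

```cpp
#include <thread>

// Hypothetical persistent library state kept in TLS: one copy per thread.
thread_local int calls_on_this_thread = 0;

// Hypothetical library entry point that relies on the TLS counter.
int library_call() {
    return ++calls_on_this_thread;
}

// Runs one library call on a new thread and returns the count that thread saw.
// The new thread starts from its own zero-initialized copy of the counter.
int count_seen_by_new_thread() {
    int seen = 0;
    std::thread helper([&seen] { seen = library_call(); });
    helper.join();
    return seen;
}
```

If the original thread has already called library_call() twice, count_seen_by_new_thread() still reports 1. That is harmless when threads are independent, but it is exactly the wrong answer if one logical task has simply continued on another thread.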

Allow (or require) shared state between callers

If the library maintains global state, it will be shared between multiple callers.

In some cases, multiple threads should see the same state. For example, a library that manages a database will store persistent data that is visible to all threads. The library is responsible for synchronization and locking. Note that the calling threads are not isolated - one thread will see changes made by another.
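
As an illustration, here is a minimal sketch of a library object that is deliberately shared and does its own locking. SharedStore and fill_from_threads are hypothetical names, and a real database library would be far more elaborate; the point is only that the mutex lives inside the library, so callers do not need to coordinate.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical shared store. Every public method takes the internal lock.
class SharedStore {
public:
    void put(const std::string& key, int value) {
        std::lock_guard<std::mutex> guard(mutex_);
        data_[key] = value;
    }

    // Returns true and fills *value_out if the key is present.
    bool get(const std::string& key, int* value_out) const {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = data_.find(key);
        if (it == data_.end()) {
            return false;
        }
        *value_out = it->second;
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return data_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, int> data_;
};

// Each thread stores one entry ("key0", "key1", ...) in the same store.
std::size_t fill_from_threads(SharedStore& store, int thread_count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([&store, i] { store.put("key" + std::to_string(i), i); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return store.size();
}
```

After the threads finish, any thread can read the entries written by the others, which is the intended lack of isolation.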

If the library is not designed for multiple threads, then the application threads must coordinate access to the library. For example, the application threads might serialize access by creating a lock that must be held in order to call into the library. This may be safe, but the need for synchronized access reduces or even completely eliminates the advantages of parallelism. In fact, we have worked on applications in which this locking strategy led to an overall slowdown compared to running on a single processor.
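
The application-side lock might look like the following sketch. The legacy namespace is a stand-in for an unsafe library, not real library code, and locked_add is a hypothetical wrapper that the application agrees to use for every call.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for a legacy library that is not thread-safe:
// it updates a global with no synchronization.
namespace legacy {
int total = 0;
void add(int amount) { total += amount; }
}  // namespace legacy

// Application-side lock. Every call into the library must go through it.
std::mutex legacy_lock;

void locked_add(int amount) {
    std::lock_guard<std::mutex> guard(legacy_lock);
    legacy::add(amount);
}

// Several threads call the wrapper; the lock serializes them.
int add_from_threads(int thread_count, int calls_per_thread) {
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([calls_per_thread] {
            for (int j = 0; j < calls_per_thread; ++j) {
                locked_add(1);
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    std::lock_guard<std::mutex> guard(legacy_lock);
    return legacy::total;
}
```

The result is correct, but only one thread is ever inside the library at a time, so the time spent in the library is effectively serial.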

Maintain completely separate instances of the library

Though not directly supported by most operating systems, it is possible to load completely separate copies of a library into memory, with a unique instance for each calling thread.

This approach conflicts with the way the operating system expects shared libraries to be used, and thus requires some tricks to implement. The advantage of this approach is that an application can play this game with unsafe third-party libraries without access to the library source code.

Note that this approach has all of the disadvantages of thread-local storage and, because the code and data for the library must be loaded multiple times, uses more memory overall. For applications that create many threads or load very large libraries, this overhead may not be acceptable.

The Right Approach...

So which approach should you take? There is no single best choice. Some of the factors to consider are:

  • Are you writing a new library, or adapting an existing one?
  • Do you have source code for the library?
  • Is the interface fixed, or can you change the API?
  • Do you want or need any state shared across callers?
  • What threading model does the caller use?
  • What is the memory footprint of the library?

Your thoughts on thread-safe components?

We would love to hear from you!

  • Have you had to worry about creating a thread-safe library?
  • What technique(s) have you tried?
  • What has been your experience? Where are the "gotchas"?

Next Steps

Regardless of the solution you choose, it would certainly be nice to know if you have implemented it correctly and safely.

If you have chosen Cilk++ for your implementation, be sure to run Cilkscreen to find any races that your parallel constructs may have introduced. You can even use Cilkscreen (with a test harness) to test thread-safety of some non-Cilk++ programs, but I'll leave that topic for another day.

If you would like help in putting together a plan for multicore-enabling a performance-sensitive library or application, you might want to check out our QuickStart program.