Last modified: October 27, 2009 7:08 PM PDT
First, let's distinguish between two very different goals: writing multithreaded libraries, and writing libraries that can be called from multithreaded programs.
You may want to create a library that is multithreaded internally. Presumably, this is a library that does processor-intensive calculations that can be sped up through parallel programming. I'll call this a "thread-hot" library.
Alternatively, you may want a library that is safe to call from a multithreaded application. I'll call this a "thread-safe" library.
In this article, I'll enumerate strategies for building thread-safe libraries. Specifically, I will discuss five approaches: eliminating stored state with a functional style, keeping all state in the caller, keeping state in thread-local storage, sharing global state between callers, and loading a separate copy of the library for each thread.
Each of these choices involves some tradeoffs.
The problems that arise when building a thread-safe library generally result from the use of state stored between calls. One choice is to simply eliminate any stored state, using a functional programming style.
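To make the functional style concrete, here is a minimal sketch. The function `mean` is a hypothetical library entry point, not part of any real library: its result depends only on its arguments, and it keeps no static or global data, so any number of threads can call it at once without coordination.

```cpp
#include <cassert>
#include <vector>

// Hypothetical library function written in a functional style.
// It reads only its argument and stores nothing between calls.
double mean(const std::vector<double>& samples) {
    if (samples.empty()) {
        return 0.0;  // assumed convention for an empty input
    }
    double sum = 0.0;
    for (double s : samples) {
        sum += s;
    }
    return sum / static_cast<double>(samples.size());
}
```

Because the function keeps nothing in the library between calls, there is no state that two threads could disagree about.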
Functional programming may be a good approach for brand new coding, but may not be practical when converting existing libraries. For libraries that must maintain state across calls, you must use one of the other approaches.
In this model, all state is maintained by the caller. The API must provide a way for the library to access the state, typically using a pointer passed with each call into the library.
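As an illustration, here is a small sketch of a caller-maintained-state API, in the spirit of the difference between `rand` and `rand_r`. The names `RngState`, `rng_init`, and `rng_next` are hypothetical, and the generator (a 32-bit xorshift) is only a stand-in for whatever state a real library would carry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical API: all state lives in a struct owned by the caller.
struct RngState {
    std::uint32_t value;
};

// Initializes the caller's state. A xorshift generator must not start at zero.
void rng_init(RngState* state, std::uint32_t seed) {
    state->value = (seed != 0) ? seed : 1u;
}

// Advances the caller's state and returns the next pseudo-random value.
// The library itself holds no data between calls.
std::uint32_t rng_next(RngState* state) {
    std::uint32_t x = state->value;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    state->value = x;
    return x;
}
```

Each thread, or each logical task, can own its own `RngState`. Two states seeded identically produce identical sequences no matter how calls on them are interleaved, because neither can see the other.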
Other than writing purely functional libraries, this is often the best way to write a thread-safe library. Each caller maintains state that is completely isolated from other callers. This model also works well with threading models in which the flow of control may move from one thread to another, because the state travels with the caller rather than with the thread. This can happen explicitly if the application uses thread pools, or implicitly with packages such as Cilk++.
The primary disadvantage of this approach is that a new API needs to be defined for legacy libraries, requiring application changes.
Modern operating systems offer thread-specific storage. State can be maintained in TLS, providing isolation between threads.
Using Thread Local Storage (TLS) to isolate state associated with different threads seems, on the surface, to be a good idea. In fact, if the state in TLS does not need to persist across calls into the library, TLS can be a good solution.
However, the use of TLS assumes that each thread in the calling application is completely independent. That assumption breaks down in models with finer-grained parallelism, in which multiple threads in the application work together. For example, multiple iterations of an OpenMP loop may run on different threads but expect to operate on a common object. In Cilk++, work stealing may cause a function to resume on a different thread after a cilk_spawn or cilk_sync. In these cases, TLS is not a safe place for persistent state.
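The isolation that TLS provides can be sketched with the C++11 `thread_local` keyword. The counter and the functions `record_call` and `calls_seen_by_new_thread` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <thread>

// Hypothetical library state kept in thread-local storage:
// every thread gets its own copy, initialized to zero.
thread_local int calls_on_this_thread = 0;

// Increments and returns the calling thread's private counter.
int record_call() {
    return ++calls_on_this_thread;
}

// Makes `n` calls on a fresh thread and returns the last count that thread saw.
int calls_seen_by_new_thread(int n) {
    int seen = 0;
    std::thread worker([&seen, n] {
        for (int i = 0; i < n; ++i) {
            seen = record_call();
        }
    });
    worker.join();
    return seen;
}
```

The new thread always starts counting from zero, regardless of how many calls the original thread has made. That is exactly the isolation TLS promises, and also the hazard: if a logical task resumes on a different thread, it silently picks up that thread's counter instead of its own.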
If the library maintains global state, it will be shared between multiple callers.
In some cases, multiple threads should see the same state. For example, a library that manages a database will store persistent data that is visible to all threads. The library is responsible for synchronization and locking. Note that the calling threads are not isolated - one thread will see changes made by another.
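A rough sketch of a library that owns shared state and does its own locking might look like the following. The in-memory map stands in for the database, and `store_put`, `store_get`, and `store_put_on_new_thread` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>

namespace {
// Shared state, visible to every caller, guarded by a library-owned mutex.
std::mutex g_store_mutex;
std::map<std::string, int> g_store;
}  // namespace

void store_put(const std::string& key, int value) {
    std::lock_guard<std::mutex> lock(g_store_mutex);
    g_store[key] = value;
}

// Returns true and writes *value if the key is present.
bool store_get(const std::string& key, int* value) {
    std::lock_guard<std::mutex> lock(g_store_mutex);
    auto it = g_store.find(key);
    if (it == g_store.end()) {
        return false;
    }
    *value = it->second;
    return true;
}

// Demonstration helper: performs the put on a separate thread.
void store_put_on_new_thread(const std::string& key, int value) {
    std::thread writer([&key, value] { store_put(key, value); });
    writer.join();
}
```

A value written by one thread is visible to a later `store_get` on any other thread. The locking keeps the map itself consistent, but it does nothing to isolate callers from each other.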
If the library is not designed for multiple threads, then the application threads must coordinate access to the library. For example, the application threads might serialize access by creating a lock that must be held in order to call into the library. This may be safe, but the need for synchronized access reduces or even completely eliminates the advantages of parallelism. In fact, we have worked on applications in which this locking strategy led to an overall slowdown compared to running on a single processor.
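The application-side lock described above can be sketched as follows. `legacy_add` is a stand-in for a call into a library that was not designed for multiple threads; `safe_legacy_add` and `run_workers` are hypothetical wrappers the application would supply.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for a legacy library that keeps unprotected global state.
static int legacy_total = 0;
void legacy_add(int amount) { legacy_total += amount; }

// Application-owned lock: every call into the legacy library must hold it.
static std::mutex legacy_lock;

void safe_legacy_add(int amount) {
    std::lock_guard<std::mutex> guard(legacy_lock);
    legacy_add(amount);
}

// Starts `threads` workers that each make `calls` serialized calls,
// then returns the library's total once all workers have finished.
int run_workers(int threads, int calls) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([calls] {
            for (int i = 0; i < calls; ++i) {
                safe_legacy_add(1);
            }
        });
    }
    for (auto& worker : pool) {
        worker.join();
    }
    std::lock_guard<std::mutex> guard(legacy_lock);
    return legacy_total;
}
```

The result is correct, but only one thread is ever inside the library at a time. If most of the work happens in those calls, the threads spend their time waiting on `legacy_lock` rather than running in parallel.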
Though not directly supported by most operating systems, it is possible to load completely separate copies of a library into memory, with a unique instance for each calling thread.
This approach conflicts with the way the operating system expects shared libraries to be used, and thus requires some tricks to implement. The advantage of this approach is that an application can play this game with unsafe third-party libraries without access to the library source code.
Note that this approach has all of the disadvantages of thread-local storage. In addition, because the code and data for the library must be loaded once per thread, it uses more memory overall. For applications that create many threads or load very large libraries, this overhead may not be acceptable.
So which approach should you take? There is no single best choice. Some of the factors to consider are:
We would love to hear from you!
Regardless of the solution you choose, it would certainly be nice to know if you have implemented it correctly and safely.
If you have chosen Cilk++ for your implementation, be sure to run Cilkscreen to find any races that your parallel constructs may have introduced. You can even use Cilkscreen (with a test harness) to test thread-safety of some non-Cilk++ programs, but I'll leave that topic for another day.
If you would like help in putting together a plan for multicore-enabling a performance-sensitive library or application, you might want to check out our QuickStart program.
