Debugging Threaded Applications
When Threads Get Crossed: Debugging Threaded Applications
By Andrew Binstock
Introduction
Parallel programming has a reputation for being difficult because of the complexity imposed by the threading dimension. While this viewpoint may be accurate in part, diligence in designing thread interactions and knowledge of what to look for when debugging will overcome many of the difficulties.
The Intel® Media Software Development Kit (Intel® Media SDK) gives media developers a standard programming interface for creating video solutions. Hardware optimized decode, encode, and video preprocessing offer the developer optimized routines for developing maximum video performance on a variety of platforms. Developers can leverage future enhancements to Intel® platforms by utilizing the Intel Media SDK today.
This document is designed to give an architectural overview of the new application programming interface (API). For further information on the topics discussed below, please refer to the Intel® Media SDK Reference Manual.
The Intel Media SDK comes with many simple application samples that illustrate its use with encoding, decoding and preprocessing video.
The Intel Media SDK is available for download here: www.intel.com/software/mediasdk.
The goal of parallel programming is to accelerate a program’s performance by running multiple threads of execution in parallel. This effort requires every thread to have an effect on at least one other thread (generally, at minimum, the thread that launched it). As long as these interactions are orderly, well designed, and predictable, improved performance ensues. However, if the threads interfere with one another or distort the algorithm incorrectly, defects specific to parallel programming will creep into the program.
These defects can be difficult to locate and equally hard to resolve once identified. As a result, in parallel programming it pays to program defensively. That is, know ahead of time where the traps are, and make sure you write your code to guard against them. Even then, an unexpected interaction between two threads can cause a bug to show up occasionally.
This article examines the primary places where threading bugs lurk and explains how to avoid them. It also looks at Intel® Thread Checker, which is one of the few threading tools available that can automate the discovery of potential trouble spots. This article assumes familiarity with threading basics and some form of mutual exclusion (a mutex, semaphore, or critical section). My comments apply equally to threads on all platforms—Windows*, Linux*, UNIX*, Java*, and so on.
Threading bugs occur most often in areas where two threads interact or two threads share a common data variable. The most common violations can be grouped into three categories: data races, deadlocks, and live locks, which is a rarely discussed topic. Scrupulously avoid them and your life as a parallel programmer will be much smoother.\
Data Races
Data races occur when two threads share a variable but don’t guard against simultaneous access and when at least one thread is modifying the variable. Suppose for example that you’ve made online reservations for a flight to your first gaming convention. Once you’ve paid for the reservation, you go online to choose your seat. However, at the same time, someone else on your flight is doing the same thing. There is one aisle seat left, and you click it to reserve it for yourself. The other person sees the same seat and clicks it at the same time you do. However, because of poor program design both of you receive confirmation that you now have the seat. So, who really gets the seat?
This situation is known as a data race—two threads unaware of each other that are both trying to update the same variable at the same time.
Data races should be suspected any time you receive inconsistent results when repeatedly running the same program against the same data. One solution to a data race is mutual exclusion (mutex). Using a mutex, for example, will cause one thread will put a lock on the code that updates the variable so that no other thread can update it until the lock is released. In the airline reservation system example, when a click occurs, a lock is placed on the seat-status update code. Anyone else who tries to select the same seat will have to wait until the lock releases, at which time the program will discover that the seat is now taken and that person will be told to select another seat.
Notice the inconsistency of the results: If the two clicks are milliseconds apart and no mutual exclusion is used, the second person will get the seat because that selection will overwrite his rival’s. In the second scenario, where mutual exclusion is used, the first click locks the other thread out and claims the seat.
Many data races are straightforward. You look where two threads share a data field or resource, and you impose sequential access via mutual exclusion. However, some forms of data races can be fairly subtle. Libraries, in particular, can create unexpected race conditions if they have not been thoroughly tested for thread safety. In addition, resources can sometimes create data races. Consider multiple threads running printf() at the same time. Without mutual exclusion, they can step on each others’ output easily and provide a screen or a log filled with nonsensical data.
So make sure your libraries are truly thread-safe and be vigilant about putting mutual exclusion around all data items and system resources that are shared between threads.
Deadlock
Deadlock is a situation in which two threads are blocked because each of them is waiting on a lock held by the other. For example, you’re designing a combat game in which the principal objective of the current stage is mopping up pockets of snipers. You come down a street and by the rules of the game you cannot advance because the street has not been cleared of snipers. But because of a logic error, the snipers cannot be cleared until a unit that’s stuck behind you clears the street first. You now have deadlock: each thread is waiting on the other. (Your advance is waiting on the sniper clearing unit, and that unit is waiting for you to move forward.) The result is that your thread is waiting for an event that will never occur. It appears to you that your thread or the game is hung.
The symptoms of deadlocks often involve two threads suddenly not advancing. If one of those threads is holding a lock that many other threads need, the program may appear to freeze completely.
There are several ways to avoid deadlocks. One way is to track every place where a thread uses mutual exclusion and determine whether a lock has any dependence at all on any other thread that could be waiting for it.
Even this diligent inventory can overlook a subtle deadlock possibility: acquiring locks in the wrong order. Suppose, for example, that your game flashes an announcement on all players’ screens whenever someone breaks the all-time record for disabling snipers. The way the logic is currently written, two locks are involved. One lock covers the code that checks your score against the old record and if you’ve beaten it, writes your score to the record book. Because of the possibility of a data race (as discussed previously), the code carefully uses mutual exclusion to record the score so it can’t be updated simultaneously by two different players. The next step is to acquire a lock to broadcast the new all-time high to all the players. The code uses a lock to make sure only one announcement can be made at a time. The code maintains the original lock on the score update, so that the point total in the announcement is not suddenly updated by someone else. So, at announcement time, the routine is holding two locks that enable the program to accurately record and broadcast the new all-time high.
What could possibly go wrong? As long as all threads acquire the locks in the same order, all is well. But a summer intern at your company decides that if a rookie-level player sets the all-time record, this should be broadcast immediately. So he acquires the broadcast lock first, does the broadcast, and only then acquires the lock to record the result. As we know, this will cause a data race if two players, one of whom is the rookie, break the record at the same time. It can also cause a deadlock. Suppose the rookie procedure grabs the announcement lock and is waiting on the recording lock, while the other player has the recording lock and is waiting on the broadcast lock. Now both players are waiting on each other in a deadly embrace. So, a crucially important rule is that if multiple locks are required for a transaction, they must always be acquired in the same order.
One helpful precaution against deadlock is to avoid holding a lock if a second lock cannot be acquired. Almost all thread libraries have some form of threading call—in Pthreads, for example, it’s called pthread_mutex_trylock()—that enables a developer to try a lock and to return an error code if it can’t be acquired rather than wait forever for the lock. This call allows the developer to release a previously held lock and retry the operation later.
Live Lock
Live lock is a situation involving multiple threads in which no thread can advance even though they are actively working. The quintessential example is the Dining Philosopher’s problem. In this classic story, five philosophers are in a room meditating on various ideas. In the room is a table with five plates of noodles and five chopsticks—one chopstick placed between each pair of plates. To eat, a philosopher must pick up two chopsticks, one on either side of his or her plate. The way the algorithm is written, the philosopher sits down and first attempts to pick up the left chopstick. If he’s successful, he attempts to pick up the right chopstick. If that’s successful too, he eats. If the right chopstick is unavailable, the philosopher puts down the left chopstick, waits five seconds, and tries again—until both chopsticks are available.
The live lock problem occurs if all five philosophers decide to eat at the same time. Then, each one picks up the left chopstick successfully, and then notices the right chopstick is not available. In synch, they all lay down the left chopstick, wait five seconds, and repeat the process. Again, they all find the left chopstick but discover the right chopstick is not available, so they put down their left chopstick and wait again. As long as they are perfectly in sync, they will repeat the pattern ad infinitum.
Live locks are nearly always the result of an algorithmic flaw rather than an implementation error. They are very difficult to identify because it can be nearly impossible to reproduce them. Consider, for example, that the problem cannot occur on a quad-core machine because with only four cores running in parallel, the five philosophers cannot act simultaneously. One will be swapped out briefly to enable the fifth philosopher’s action. As a result, one philosopher will always get to eat—and the live lock disappears. However, on processors with five or more cores the problem can indeed occur. So, if the QA engineer or the help desk technicians have quadcore machines, they will be unable to reproduce a bug reported by multiple users.
The best way to prevent live locks is through vigilance and defensive programming. Always consider what would happen if all possible threads performed the same action at the same time.
Locating Problem Code
As just explained in the previous example, the biggest challenge in debugging parallel code is reproducing and locating the problem. Frequently, though, if you can reproduce the problem, you can locate the error. Today most debuggers and interactive development environments enable you to break on specific threads, so you can see what each one is doing. Parallel debugging, however, becomes highly complex when dozens of threads are in flight. Not only is the debugging difficult, but the defensive programming work requires patient, careful thought and thorough analysis of design.
Intel® Thread Checker (http://software.intel.com/en-us/intel-thread-checker/) is one of the few tools available to help locate sources of threading conflicts and of parallel trouble spots. Intel Thread Checker is part of a line of thread-oriented development tools that also includes Intel® Thread Profiler and Intel® Threading Building Blocks. In its simplest use case, the Intel Thread Checker runs a threaded program and monitors the various threads looking for problematic interactions among them. It is capable of identifying race conditions and deadlocks, as well as suspicious threading practices.
Figure 1 shows a screenshot from the Intel Thread Checker after it discovers a deadlock in a small sample program.
Notice how it is able to trace the deadlock statements to specific lines of code in each of the two threads. This can be an invaluable time saver, especially on massively parallel code such as that in multi-player or other complex games.
Figure 2 shows the display of a data race found when running a different program. The central panel shows three places where data races were detected (marked with red circles). Below them are various points of information (in blue) that highlight events of possible interest. Clicking any of these events brings you to the line of source code in a panel that looks much like Figure 1. This design makes it easy to identify the unprotected variable and determine which thread is modifying it.
Intel® Thread Checker can certainly be used as a debugging tool, but this approach tends to underutilize it. A more plenary use case is to run it during integration tests. The reason for this is that parallel bugs are elusive creatures. A program that has a data race might run fine and give accurate results for weeks before one run suddenly delivers an incorrect result. To avoid this, the use of the Intel Thread Checker during the testing phase can automate the discovery of latent defects that are hidden because the right parallel conditions have been able to mask their presence. When a defect is found, Intel Thread Checker’s debugging facility brings added lift.
It is worth noting that the Intel® Thread Profiler also is a very helpful companion tool that has thread performance analysis capabilities, which frequently can pinpoint places where threads are interacting suboptimally.
In conclusion, to avoid having to debug threaded code, the best approach is to write threading applications defensively, test extensively, and rely on automated defect detection tools, such as Intel Thread Checker. This strategy will save you considerable frustration.
Andrew Binstock writes technical white papers at Pacific Data Works LLC. He is also a senior contributing editor for InfoWorld and a columnist for SD Times. He is the author or co-author of several books on programming, including two available from Intel Press. During his free time, he contributes to the open source typesetting and page layout project, Platypus (http://platypus.pz.org) You can reach Andrew through his blog at http://binstock.blogspot.com.































