Bubble, Bubble, Toil and Trouble; Mutex Lock and Buffer Double

A problem that I’ve been seeing recently within genomic codes stems from the I/O required. Before processing can start, large data files of DNA reads have to be read into memory and converted to the format that will be used by the computation, e.g., compressing the byte-sized characters (“A”, “C”, “T”, or “G”) to a 2-bit equivalent. The natural way to code up this processing is to input a read, call some conversion function, and store the result. Repeat until all files have been input.
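
For concreteness, that 2-bit conversion might look something like the sketch below. The names base2bits and packRead are mine for illustration, and the A=0, C=1, G=2, T=3 encoding is an assumption, not anything mandated here; the point is simply that each read gets a non-trivial chunk of CPU work after it is input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: map one base character to its 2-bit code.
   The A=0, C=1, G=2, T=3 assignment is an arbitrary choice. */
static uint8_t base2bits(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;   /* 'T' */
    }
}

/* Pack n bases from seq into dst, four bases per byte, low bits first.
   dst must have room for (n + 3) / 4 bytes. */
void packRead(const char *seq, size_t n, uint8_t *dst) {
    for (size_t i = 0; i < n; i++) {
        if (i % 4 == 0)
            dst[i / 4] = 0;                    /* start a fresh byte */
        dst[i / 4] |= (uint8_t)(base2bits(seq[i]) << (2 * (i % 4)));
    }
}
```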

It is commonly accepted that I/O can be a performance bottleneck. But regardless of the relative speed between the input of a DNA read and the initial processing/storing of that input, the serial execution above leaves some easy parallelism on the table. Obviously, the conversion of each read is likely independent of the conversion of any other read, but, more to my point, the input of a new read is independent of the conversion of the previous read. Overlapping computation and I/O by having a separate thread for each of the two tasks is a good way to boost performance when it can be done.

However, to keep the results of the input and conversion correct I can’t be overwriting the previous unconverted read by a new read (like stabbing a perfectly usable King Duncan in his sleep). Nor do I want to re-process an already converted read (like raging at an empty chair as if Banquo’s ghost was sitting there). To this end, I can declare and use a second input buffer. While the I/O thread is filling up one buffer with the next read, the conversion thread can be working on processing the previous read. In this post I want to sketch out one way you can coordinate the two threads without having them overwrite good data or pull out stale data from the two shared buffers.

You may have realized that the cooperation between the input and conversion threads is an example of the Producer/Consumer pattern. The (pseudo-)code below is an implementation of Producer/Consumer with a shared double-buffer structure used to pass items from the producer thread to the consumer thread. I’ve implemented it with Pthreads, using a condition variable to control access to the two buffers. First, the declarations for the buffers, some index pointers, and the synchronization objects, all of which are visible to both the producer and consumer threads.

	#define BUFSIZE 3

	dnaReadType buffer[BUFSIZE];

	int in  = 1;   // index to store next element
	int out = 0;   // index of last removed element

	pthread_mutex_t b_lock = PTHREAD_MUTEX_INITIALIZER;
	pthread_cond_t  b_cond = PTHREAD_COND_INITIALIZER;


I’ve allocated an array of three items to hold whatever data or structure, called dnaReadType, I’m inputting from a file. This could just as easily be an array of pointers to allocated memory if the input isn’t a known or constant size. (This latter idea is more in keeping with multiple data buffers. The single array buffer is then simply syntactic sugar to quickly reference one or the other and to easily switch from one data buffer to another.) Either way, the consumer (conversion) thread will be accessing the data from one element while the producer (input) thread fills up another. The in index will refer to the element that is next to receive new data from the producer and the out index will reference the element that last held a read for conversion by the consumer. The mutex, b_lock, and condition variable, b_cond, will be used to control access to the buffer elements and to protect updates to in and out.

The scheme and code I describe below requires that the number of items in buffer (given as BUFSIZE) be one more than the number of data buffers I want to use. This makes the test for a full or empty set of data buffers easier to code, as I’ll explain below.
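
The extra element also makes the buffer occupancy easy to write down. The occupancy helper below is my own illustration, not part of the post’s code; it just encodes the conventions that in is the next slot to fill and out is the last slot emptied.

```c
#include <assert.h>

/* Illustrative helper (not part of the post's code): the number of
   stored-but-unconsumed reads in a bufsize-slot circular array,
   given in = next slot to fill and out = last slot emptied. */
int occupancy(int in, int out, int bufsize) {
    return (in - out - 1 + bufsize) % bufsize;
}
```

With BUFSIZE equal to 3 and two data buffers, occupancy runs from 0 (the empty test, in == (out+1) % BUFSIZE) up to 2 (the full test, out == in), so the two tests can never both be true at once.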

If the data buffers are full (i.e., two buffer elements contain valid data awaiting conversion) the producer must wait until a slot opens up. Otherwise the new read can simply be put into the open position of buffer. The relevant code for the Producer is given here:

	while (more to produce) {
	   newRead = getNewRead(inputFile);   // produce something

	   pthread_mutex_lock(&b_lock);
	   while (out == in)                  // buffer is full
	      pthread_cond_wait(&b_cond, &b_lock);
	   buffer[in++] = newRead;            // store in buffer
	   in %= BUFSIZE;
	   pthread_cond_signal(&b_cond);      // wake consumer if waiting
	   pthread_mutex_unlock(&b_lock);
	}

After a new read has been input from the file, the mutex b_lock is acquired to control access to buffer and to protect the reading and writing of the in and out index variables. The conditional expression checks the status of buffer. The buffers are full if out and in reference the same item in the array. This item is the extra element (which is not considered part of the usable buffer) that I included by making BUFSIZE equal to 3 for a set of two data buffers. While this array slot can (and will) hold data at some point, it is not considered a valid entry whenever the two data buffers both contain unprocessed reads. Whenever buffer is full, the producer thread goes to “sleep” on the condition variable, b_cond.

At such time as the consumer signals that at least one of the data buffers is available, the producer thread wakes up, stores the new read (or performs the input of new data), increments the in index with a modulus on the BUFSIZE, signals the consumer thread (in case it is waiting on an empty buffer), and releases the mutex to allow the consumer to access buffer.

The consumer thread code would be implemented like this:

	while (more to consume) {
	    pthread_mutex_lock(&b_lock);
	    while (in == ((out+1) % BUFSIZE))  // buffer is empty
	        pthread_cond_wait(&b_cond, &b_lock);
	    out++;                             // advance to next filled slot
	    out %= BUFSIZE;
	    newRead = buffer[out];             // save data for processing
	    pthread_cond_signal(&b_cond);      // wake producer if waiting
	    pthread_mutex_unlock(&b_lock);

	    ProcessRead(newRead);              // consume the data
	}

As with the producer code, the mutex, b_lock, controls access to buffer and the update of the out and in index variables. For the consumer, if the data buffers are empty, processing of a new read must be blocked. The conditional expression tests for the empty buffer. When the out index is found to be one position “behind” the in position (modulo the BUFSIZE), then buffer has no valid read stored in the next out position.

Once the consumer thread knows that there is at least one element to be converted by either being signaled to wake up or by finding a non-empty buffer (and skipping the call to pthread_cond_wait), the out index is incremented and the read stored in buffer[out] is pulled out for conversion. Before the new read is processed, the consumer must signal the producer (which may be waiting on full data buffers) and then release the mutex.

With any parallel algorithm that requires synchronized coordination between two or more threads, you need to consider whether or not the code actually will work in all cases. Better yet, prove that it works in all cases. For my Producer/Consumer code with a single instance of each, the tricky part is whether or not the tests for full and empty buffer will pan out as expected. As a thought experiment, if you’re unsure about the scheme I’ve just discussed, see if you can convince yourself that the following cases will work as expected: 1) the consumer thread finding the initially empty buffer, 2) the producer finding the initial empty buffer, 3) due to a slow consumer thread, the producer fills up buffer and then has one more item to be stored, and 4) due to an initially quick, but now slow producer thread, the consumer initially encounters a full buffer and proceeds to process all data stored (and wants more) before a new read can be input.
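
If you would rather check those cases with running code than with a thought experiment, here is a minimal self-test sketch under stated assumptions: the demo_* globals and run_demo are my own names, the “reads” are plain integers, and the consumer “processes” each item by adding it to a sum that the caller can verify afterward. The locking and index logic mirrors the producer and consumer loops above.

```c
#include <pthread.h>
#include <stddef.h>

/* Self-test sketch of the double-buffer scheme; demo_* names and
   run_demo are illustrative, not part of the post's code. */
enum { DEMO_BUFSIZE = 3 };          /* two data buffers + 1 extra slot */

static int demo_buffer[DEMO_BUFSIZE];
static int demo_in;                 /* next slot to fill */
static int demo_out;                /* last slot emptied */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  demo_cond = PTHREAD_COND_INITIALIZER;
static int  demo_count;             /* items to pass through */
static long demo_sum;               /* "conversion" result */

static void *producer(void *arg) {
    for (int v = 0; v < demo_count; v++) {
        pthread_mutex_lock(&demo_lock);
        while (demo_out == demo_in)                       /* full */
            pthread_cond_wait(&demo_cond, &demo_lock);
        demo_buffer[demo_in++] = v;                       /* store */
        demo_in %= DEMO_BUFSIZE;
        pthread_cond_signal(&demo_cond);
        pthread_mutex_unlock(&demo_lock);
    }
    return NULL;
}

static void *consumer(void *arg) {
    for (int i = 0; i < demo_count; i++) {
        pthread_mutex_lock(&demo_lock);
        while (demo_in == (demo_out + 1) % DEMO_BUFSIZE)  /* empty */
            pthread_cond_wait(&demo_cond, &demo_lock);
        demo_out = (demo_out + 1) % DEMO_BUFSIZE;
        int v = demo_buffer[demo_out];                    /* remove */
        pthread_cond_signal(&demo_cond);
        pthread_mutex_unlock(&demo_lock);
        demo_sum += v;                                    /* "process" */
    }
    return NULL;
}

/* Pass count items through the buffer; the returned sum should be
   count*(count-1)/2 if nothing was lost, duplicated, or stale. */
long run_demo(int count) {
    demo_count = count;
    demo_sum = 0;
    demo_in = 1;
    demo_out = 0;
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return demo_sum;
}
```

Because the buffer holds only two usable slots, even a modest run forces every one of the four cases above (initial empty buffer, full buffer, and a fast or slow thread on either side) to occur many times.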

One last thing to note is that the size of the buffer can be much larger than 3 elements. You can also use more consumer threads if the conversion processing time is much slower than the time to input a new read from the input file. Similarly, more than one producer can be added to the above code. The scenarios needed to prove that everything still works as it should become a bit more complex, but nothing that some simple logic and hand-waving can’t accomplish. I mean, it’s not like trying to get the Great Birnam Wood to come to Dunsinane to add to Macbeth’s long list of problems.



Comment from Clay B.:

Raf -

I'm not sure the Boost libraries were available in 16th Century Scotland. :-)

Looking over the library description, this might be a viable option in the application that I pulled my inspiration from. I'll have to look into that more in-depth and make a recommendation back to the original code authors. They already use the Boost library to do other processing.

Comment from Raf Schietekat:

Wouldn't something like Boost Asio provide some relief?

Comment from Clay B.:

Dmitry -

Fabulous! "Old" is true, especially in the Life Sciences arena.

I guess I'll have to put Go on my list of new languages to learn (right after I can get more experience with Python).

Comment from Dmitry Vyukov:

Hey, New Clay! You are still programming in an Old Language! Check out how simple it is in Go. You can fit the whole program into main function, it's trivial to change buffer sizes, send buffers from workers back to producer or change number of workers:

func main() {
    in := make(chan string, 1000) // input data
    // start concurrent reading goroutine, feeds in with lines of text
    go func() {
        f, _ := os.Open(os.Args[1])
        for s := bufio.NewScanner(f); s.Scan(); {
            in <- s.Text()
        }
        close(in)
    }()
    out := make(chan string, 1000) // output data
    const P = 10                   // number of concurrent workers
    var wg sync.WaitGroup
    // start concurrent workers, read from in, feed out
    for i := 0; i < P; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for line := range in {
                out <- strings.ToUpper(line)
            }
        }()
    }
    // wait for all workers, then close out
    go func() {
        wg.Wait()
        close(out)
    }()
    // read output and print
    for line := range out {
        fmt.Println(line)
    }
}
