Go Parallel 2

This is the second part of Go Parallel series (part 1) about parallel programming with Go language. As I promised we will look at parallel divide-and-conquer decomposition and parallel pipelines.

Guess what problem we will parallelize. Right! Fibonacci number computation. There are more efficient ways to calculate Fibonacci numbers, but we will use the simple recursive formula. Because, well, it’s simple.

Let’s first look at the sequential version:


func fib(n uint64) uint64 {
	if n <= 2 {
		return 1
	}
	return fib(n-1) + fib(n-2)
}


You can play with the program here.

I hope it’s pretty self-explanatory. However, there is an interesting thing about this program. It uses data-driven recursion (recursion depth depends on user input), and usually it’s a bad idea because it can cause stack overflow. But Go uses dynamically sized stacks, that is, stack grows and shrinks are necessary. This allows to express such inherently recursive algorithms in the natural form without fear of stack overflows.

Now let’s look at the parallel version:


func fib(n uint64) uint64 {
	if n <= 2 { // terminate recursion
		return 1
	}
	// create a channel for subtask results
	res := make(chan uint64, 2)
	// fork 2 subtasks, they will send results to the res channel
	go func() {
		res <- fib(n - 1)
	}()
	go func() {
		res <- fib(n - 2)
	}()
	return <-res + <-res
}


See it in action here.

Note how we use the channel to join subtasks and collect results at the same time. “<-res + <-res” may look a bit unusual at first. Channel receive operation is an expression, so it can be used inside of more complex expressions.

Let’s move on to pipelines. Let’s say we want to implement a “classical” pipeline with serial input stage, parallel transformation stage and serial output stage. For example, we want to read image files from disk, resize them and write the resulting images back to disk. A usual structure for such processing looks as follows:


func main() {
	in := make(chan T1, C1)  // channel for input data
	go func() {
		// Goroutine that generates input data.
		for ... {
			in <- generate()
		}
		// Denote end of input stream by closing the channel.
		// This allows to use the for-range construct below.
		close(in)
	}()
	out := make(chan T2, C2)  // channel for output data
	// Done is an auxiliary channel that we use to track
	// worker goroutine completion.
	done := make(chan bool)
	// Spawn N worker goroutines that do parallel data transformation.
	for p := 0; p < P; p++ {
		go func() {
			// Read elements from the input stream, transform each element
			// and send the result to the output channel.
			for v := range in {
				out <- transform(v)
			}
			// Denote completion of this goroutine.
			done <- true
		}()
	}
	go func() {
		// receive P values from done channel
		for p := 0; p < P; p++ {
			<-done
		}
		// At this point we know that all worker goroutines has completed
		// and no more data is coming, so close the output channel.
		close(out)
	}()
	// Range over the output channel and consume the results.
	for v := range out {
		consume(v)
	}
}


Note that for efficient pipelining channels must be sufficiently buffered to avoid unnecessary idling of goroutines. Once again, Go allows to express this complex interaction of concurrent activities as the straight-line code that you can read from top to bottom.

Here you can see how this pattern is applied to simplified drug design problem: we match candidate drug molecules (ligands) to a target protein to find good candidates for use as drugs.

In the next part we will look at sync and sync/atomic packages and how they can help with common parallel programming tasks. Stay tuned!

有关编译器优化的更完整信息,请参阅优化通知