# Go Parallel 2

Published: 04/13/2014   Last Updated: 04/13/2014

This is the second part of the Go Parallel series (part 1) about parallel programming with the Go language. As promised, we will look at parallel divide-and-conquer decomposition and parallel pipelines.

Guess what problem we will parallelize. Right! Fibonacci number computation. There are more efficient ways to calculate Fibonacci numbers, but we will use the simple recursive formula. Because, well, it’s simple.

Let’s first look at the sequential version:

```go
func fib(n uint64) uint64 {
	if n <= 2 {
		return 1
	}
	return fib(n-1) + fib(n-2)
}
```

You can play with the program here.

I hope it’s pretty self-explanatory. There is, however, an interesting thing about this program. It uses data-driven recursion (the recursion depth depends on user input), which is usually a bad idea because it can cause stack overflow. But Go uses dynamically sized stacks; that is, a goroutine’s stack grows and shrinks as necessary. This lets us express such inherently recursive algorithms in their natural form without fear of stack overflow.
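To try the sequential version as a complete program, wrap it in a small driver (the `main` function below is my addition, not from the article):

```go
package main

import "fmt"

// fib computes the nth Fibonacci number with the naive recursive formula.
func fib(n uint64) uint64 {
	if n <= 2 {
		return 1
	}
	return fib(n-1) + fib(n-2)
}

func main() {
	fmt.Println(fib(30)) // prints 832040
}
```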

Now let’s look at the parallel version:

```go
func fib(n uint64) uint64 {
	if n <= 2 { // terminate recursion
		return 1
	}
	// Create a channel for subtask results.
	res := make(chan uint64, 2)
	// Fork 2 subtasks; they will send results to the res channel.
	go func() {
		res <- fib(n - 1)
	}()
	go func() {
		res <- fib(n - 2)
	}()
	return <-res + <-res
}
```

See it in action here.

Note how we use the channel to join the subtasks and collect their results at the same time. `<-res + <-res` may look a bit unusual at first, but a channel receive operation is an expression, so it can be used inside more complex expressions.
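One caveat worth knowing: the parallel version above spawns a goroutine for every recursive call, so tiny subproblems pay far more in goroutine overhead than they gain from parallelism. A common refinement (my addition, not shown in the article) is a sequential cutoff: fork only while the subproblem is large enough, and fall back to the plain recursion below that. The cutoff value 20 here is an illustrative choice:

```go
package main

import "fmt"

// fibSeq is the plain sequential recursion.
func fibSeq(n uint64) uint64 {
	if n <= 2 {
		return 1
	}
	return fibSeq(n-1) + fibSeq(n-2)
}

// fibPar forks goroutines only while n is above a cutoff; below it,
// the subproblem is too small to be worth a goroutine, so it runs
// sequentially. The cutoff of 20 is an assumption for illustration.
func fibPar(n uint64) uint64 {
	if n <= 20 {
		return fibSeq(n)
	}
	res := make(chan uint64, 2)
	go func() { res <- fibPar(n - 1) }()
	go func() { res <- fibPar(n - 2) }()
	return <-res + <-res
}

func main() {
	fmt.Println(fibPar(30)) // same result as the sequential version
}
```

Tuning the cutoff trades parallelism against per-goroutine overhead; the right value depends on the machine and the cost of the leaf work.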

Let’s move on to pipelines. Say we want to implement a “classical” pipeline with a serial input stage, a parallel transformation stage, and a serial output stage. For example, we want to read image files from disk, resize them, and write the resulting images back to disk. A typical structure for such processing looks as follows:

```go
func main() {
	in := make(chan T1, C1) // channel for input data
	go func() {
		// Goroutine that generates input data.
		for ... {
			in <- generate()
		}
		// Denote end of input stream by closing the channel.
		// This lets us use the for-range construct below.
		close(in)
	}()
	out := make(chan T2, C2) // channel for output data
	// Done is an auxiliary channel that we use to track
	// worker goroutine completion.
	done := make(chan bool)
	// Spawn P worker goroutines that do parallel data transformation.
	for p := 0; p < P; p++ {
		go func() {
			// Read elements from the input stream, transform each element
			// and send the result to the output channel.
			for v := range in {
				out <- transform(v)
			}
			// Denote completion of this goroutine.
			done <- true
		}()
	}
	go func() {
		// Receive P values from the done channel.
		for p := 0; p < P; p++ {
			<-done
		}
		// At this point we know that all worker goroutines have completed
		// and no more data is coming, so close the output channel.
		close(out)
	}()
	// Range over the output channel and consume the results.
	for v := range out {
		consume(v)
	}
}
```

Note that for efficient pipelining the channels must be sufficiently buffered to avoid unnecessary idling of goroutines. Once again, Go lets us express this complex interaction of concurrent activities as straight-line code that you can read from top to bottom.
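The skeleton above uses placeholder types and functions (`T1`, `generate`, `transform`, and so on). As a concrete, runnable instantiation of the same structure, here is a sketch (my own, with squaring standing in for the image transform) that produces the integers 1..n, squares them in P parallel workers, and sums the results serially:

```go
package main

import "fmt"

// sumOfSquares runs the three-stage pipeline: a serial producer,
// P parallel squaring workers, and a serial consumer that sums
// the results. Squaring is an illustrative stand-in transform.
func sumOfSquares(n, P int) int {
	in := make(chan int, P) // buffered so the producer rarely blocks
	go func() {
		for i := 1; i <= n; i++ {
			in <- i
		}
		close(in) // end of input stream enables for-range in workers
	}()

	out := make(chan int, P) // buffered output channel
	done := make(chan bool)
	for p := 0; p < P; p++ {
		go func() {
			for v := range in {
				out <- v * v // parallel transformation stage
			}
			done <- true // this worker is finished
		}()
	}
	go func() {
		for p := 0; p < P; p++ {
			<-done
		}
		close(out) // all workers have completed; no more output
	}()

	sum := 0
	for v := range out {
		sum += v // serial output stage
	}
	return sum
}

func main() {
	fmt.Println(sumOfSquares(100, 4)) // 338350
}
```

The result is deterministic even though the output order is not: summation is commutative, which is exactly the property that makes this unordered fan-in acceptable here.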

Here you can see how this pattern is applied to simplified drug design problem: we match candidate drug molecules (ligands) to a target protein to find good candidates for use as drugs.

In the next part we will look at sync and sync/atomic packages and how they can help with common parallel programming tasks. Stay tuned!
