Go Parallel

By Dmitriy Vyukov

Published: 06/18/2013   Last Updated: 06/18/2013

This is the first post in a series about parallel programming with the Go language. What is Go, you may ask? Go is a language with the cutest mascot ever:

As you may see, it also supports parallel programming:

as well as concurrent programming:

I am sure you are already excited by the language. But wait, there is more to it!

I am not going to give you a tutorial about the language. The language is simple enough that, if you know a couple of other imperative languages, it should take you half an hour to write your first program. You can try Go right in your browser, and there is also an online tour of Go.

There are several features that make Go especially good for parallel programming:

  • Parallel programming with Go is simple. As simple as with Cilk or OpenMP, way simpler than with pthreads.
  • Go primitives are simple yet complete. This means that you are not limited to strictly nested parallelism, as you are with Cilk.
  • Go is well suited for IO and concurrent processing. This means that you can easily integrate the parallel part with other parts of the system without resorting to using several languages or libraries.
  • And in the end, Go is just a good modern language with a rich standard library.

Go's concurrency model is based on goroutines and channels. Goroutines are threads, except that you cannot join them. Channels are typed FIFO queues for goroutine communication and synchronization.

You start goroutines by prefixing a function call with the ‘go’ keyword:

go myfunc(1, 2)

or using an anonymous function:

go func(x, y int) { ... }(1, 2)

You create channels with the make function:

c := make(chan int)

send to channels using an arrow pointing into the channel:

c <- 42

and receive from channels using an arrow pointing from the channel:

x = <-c

Let’s put it all together to calculate sum and difference of two numbers in parallel:

package main

import (
	"fmt"
	"math/rand"
)

func main() {
	// Generate two random numbers.
	x := rand.Int()
	y := rand.Int()
	// Create channels for sum and difference.
	sum := make(chan int)
	dif := make(chan int)
	// Start a goroutine to calculate the sum.
	go func() {
		sum <- x + y // Send the result to channel.
	}()
	// Start a goroutine to calculate the difference.
	go func() {
		dif <- x - y // Send the result to channel.
	}()
	// Receive the results from the channels and print.
	fmt.Printf("sum=%v dif=%v\n", <-sum, <-dif)
}

You can play with the program here.

Calculating sum and difference of two integers in parallel is not very useful. Let's consider a more realistic program that calculates Pi using a parallel Monte Carlo method:

package main

import (
	"flag"
	"fmt"
	"math/rand"
	"runtime"
)

func main() {
	nThrow := flag.Int("n", 1e6, "number of throws")
	nCPU := flag.Int("cpu", 1, "number of CPUs to use")
	flag.Parse()              // Parse the command line into the flags above.
	runtime.GOMAXPROCS(*nCPU) // Set number of OS threads to use.
	parts := make(chan int)   // Channel to collect partial results.
	// Kick off parallel tasks.
	for i := 0; i < *nCPU; i++ {
		go func(me int) {
			// Create local PRNG to avoid contention.
			r := rand.New(rand.NewSource(int64(me)))
			n := *nThrow / *nCPU
			hits := 0
			// Do the throws.
			for i := 0; i < n; i++ {
				x := r.Float64()
				y := r.Float64()
				if x*x+y*y < 1 {
					hits++ // The point landed inside the quarter circle.
				}
			}
			parts <- hits // Send the result back.
		}(i)
	}
	// Aggregate partial results.
	hits := 0
	for i := 0; i < *nCPU; i++ {
		hits += <-parts
	}
	pi := 4 * float64(hits) / float64(*nThrow)
	fmt.Printf("PI = %g\n", pi)
}

See the program in action here.

Note how the Go program avoids inversion of control and callback hell. It all fits into the main function and you can read it from top to bottom.

In the next post we will consider how to implement parallel divide-and-conquer and the pipeline pattern with Go. Stay tuned!
