Go Parallel

This is a first post in a series of posts about parallel programming with Go language. What is Go? You may ask. Go is a language with the cutest mascot ever:

As you may see, it also supports parallel programming:

as well as concurrent programming:

I am sure you are already excited by the language. But wait, there is more to it!

I am not going to give you a tutorial about the language. The language is simple enough, so if you know a pair of other imperative languages, it should take you half an hour to write your first program. You can try Go right in your browser, and there is also an online tour of Go.

There are several features that make Go especially good for parallel programming:

  • Parallel programming with Go is simple. As simple as with Cilk or OpenMP, way simpler than with pthreads.
  • Go primitives are simple yet complete. This means that you don't have limitations of strictly nested parallelism as with Cilk.
  • Go is well suited for IO and concurrent processing. This means that you can easily integrate the parallel part with other parts of the system without resorting to using several languages or libraries.
  • And in the end, Go is just a good modern language with a rich standard library.

Go concurrency model is based on goroutines and channels. Goroutines are threads, except that you can not join them. Channels are typed FIFO queues for goroutine communication and synchronization.

You start goroutines by prefixing a function call with ‘go’ keyword:

go myfunc(1, 2)

or using an anonymous function:

go func(x, y int) {


}(1, 2)

You create channels with make function:

c := make(chan int)

send to channels using an arrow pointing into the channel:

c <- 42

and receive from channels using an arrow pointing from the channel:

x = <-c

Let’s put it all together to calculate sum and difference of two numbers in parallel:

func main() {
	// Generate two random numbers.
	x := rand.Int() 
	y := rand.Int()
	// Create channels for sum and difference.
	sum := make(chan int)
	dif := make(chan int)
	// Start a goroutine to calculate the sum.
	go func() {
		sum <- x + y // Send the result to channel.
	// Start a goroutine to calculate the difference.
	go func() {
		dif <- x - y // Send the result to channel.
	// Receive the results from the channels and print.
	fmt.Printf("sum=%v dif=%vn", <-sum, <-dif)

You can play with the program here.

Calculating sum and difference of two integers in parallel is not very useful. Let's consider a more realistic program, it calculates Pi using parallel  Monte Carlo method:

func main() {
	nThrow := flag.Int("n", 1e6, "number of throws")
	nCPU := flag.Int("cpu", 1, "number of CPUs to use")
	runtime.GOMAXPROCS(*nCPU) // Set number of OS threads to use.
	parts := make(chan int)   // Channel to collect partial results.
	// Kick off parallel tasks.
	for i := 0; i < *nCPU; i++ {
		go func(me int) {
			// Create local PRNG to avoid contention.
			r := rand.New(rand.NewSource(int64(me)))
			n := *nThrow / *nCPU
			hits := 0
			// Do the throws.
			for i := 0; i < n; i++ {
				x := r.Float64()
				y := r.Float64()
				if x*x+y*y < 1 {
			parts <- hits // Send the result back.
	// Aggregate partial results.
	hits := 0
	for i := 0; i < *nCPU; i++ {
		hits += <-parts
	pi := 4 * float64(hits) / float64(*nThrow)
	fmt.Printf("PI = %gn", pi)

See the program in action here.

Note how the Go program avoids inversion of control and callback hell. It all fits into the main function and you can read it from top to bottom.

In the next blog we will consider how to implement parallel divide-and-conquer and pipeline pattern with Go. Stay tuned!

For more complete information about compiler optimizations, see our Optimization Notice.


Rolf Lampa's picture

Here's a detailed example (described in more detail in the comments) of where the race detector fails to detect risky pointers in buffers:

Rolf Lampa's picture

@Dmitry Vyukov wrote:
"Just run tests with $ go test -race. And that's it."

The race detector is a very good tool. But it has its glitches too. If you for instance send pointers instead of by value on a buffered channel, the tool won't tell you that the buffer may be reshuffled with the same pointers over and over again... only the result will reveal to you that it's not consistent between runs.
So, even if your (unbuffered) program passes all tests, including race detection and correct and consistent results (indicating no races) that doesn't mean that you are "safe" if you introduce buffers in the same (tested) program. That is, look out for passing pointers over buffered channels.
I have suggested that (someone) would enhance the race detector to warn against this combination (it's not always "wrong" but very very risky).

And yes, it's not agiven that the n > CPU goroutines gives better results that n = CPU. Important point.

Dmitry Vyukov's picture

>nCPU := flag.Int("cpu", runtime.NumCPU() * 2, "number of CPUs to use")

Who have suggested this? This looks wrong.

Dmitry Vyukov's picture

OOOUCH, I forgot to mention that another reason to use Go for parallel programming is builtin data race detector:
Just run tests with $ go test -race. And that's it.
Among other nice things are builtin CPU, memory and contention profilers:

Dmitry Vyukov's picture

Yes, the playground is completely single-threaded. And the reason is... data races. With data races one can make a breach in type and memory safety.
But all the concurrency is still there, so you can start goroutines and communicate via channels.
But to experience actually parallel execution you need to install Go locally.

Rolf Lampa's picture

Ah yes, Dmitry, perhaps even more than that? So yes, on row two in the main func one could do as you suggest, namely increase (* 2) the number of coroutines:

nCPU := flag.Int("cpu", runtime.NumCPU() * 2, "number of CPUs to use")


(Edit: Fixed spelling of name, sorry. )

Dmitry Oganezov (Intel)'s picture

I guess I know these obvious reasons but... Just an idea, since we're talking about parallel paradigms, perhaps there are some reasons for two virtual CPUs? ;)

Rolf Lampa's picture

@Dmitry Oganezov (Intel): The reason playground returns one (1) CPU is that it is "sandboxed", so it exposes only one CPU and limited memory (for obvious reasons). :)
If you have Go installed locally you also have a playground with no limits. And yes, Go is the way to Go!

Dmitry Oganezov (Intel)'s picture

Hi, Dmitry! Welcome back to IDZ blog! Your previous posts got a remarkable traffic.

I have a question about Go Playground; I've tried to modify your Pi sample to use more CPUs :). AFAIK here is the right way to do that: runtime.NumCPU(). 

The problem is playground returns ... one. What am I doing wrong?


Add a Comment

Have a technical question? Visit our forums. Have site or software product issues? Contact support.