Go Parallel

This is a first post in a series of posts about parallel programming with Go language. What is Go? You may ask. Go is a language with the cutest mascot ever:

As you may see, it also supports parallel programming:

as well as concurrent programming:

I am sure you are already excited by the language. But wait, there is more to it!

I am not going to give you a tutorial about the language. The language is simple enough, so if you know a pair of other imperative languages, it should take you half an hour to write your first program. You can try Go right in your browser, and there is also an online tour of Go.

There are several features that make Go especially good for parallel programming:

  • Parallel programming with Go is simple. As simple as with Cilk or OpenMP, way simpler than with pthreads.
  • Go primitives are simple yet complete. This means that you don't have limitations of strictly nested parallelism as with Cilk.
  • Go is well suited for IO and concurrent processing. This means that you can easily integrate the parallel part with other parts of the system without resorting to using several languages or libraries.
  • And in the end, Go is just a good modern language with a rich standard library.

Go concurrency model is based on goroutines and channels. Goroutines are threads, except that you can not join them. Channels are typed FIFO queues for goroutine communication and synchronization.

You start goroutines by prefixing a function call with ‘go’ keyword:


go myfunc(1, 2)

or using an anonymous function:


go func(x, y int) {

 ...

}(1, 2)

You create channels with make function:


c := make(chan int)

send to channels using an arrow pointing into the channel:


c <- 42

and receive from channels using an arrow pointing from the channel:


x = <-c

Let’s put it all together to calculate sum and difference of two numbers in parallel:


func main() {
	// Generate two random numbers.
	x := rand.Int() 
	y := rand.Int()
	// Create channels for sum and difference.
	sum := make(chan int)
	dif := make(chan int)
	// Start a goroutine to calculate the sum.
	go func() {
		sum <- x + y // Send the result to channel.
	}()
	// Start a goroutine to calculate the difference.
	go func() {
		dif <- x - y // Send the result to channel.
	}()
	// Receive the results from the channels and print.
	fmt.Printf("sum=%v dif=%vn", <-sum, <-dif)
}

You can play with the program here.

Calculating sum and difference of two integers in parallel is not very useful. Let's consider a more realistic program, it calculates Pi using parallel  Monte Carlo method:


func main() {
	nThrow := flag.Int("n", 1e6, "number of throws")
	nCPU := flag.Int("cpu", 1, "number of CPUs to use")
	flag.Parse()
	runtime.GOMAXPROCS(*nCPU) // Set number of OS threads to use.
	parts := make(chan int)   // Channel to collect partial results.
	// Kick off parallel tasks.
	for i := 0; i < *nCPU; i++ {
		go func(me int) {
			// Create local PRNG to avoid contention.
			r := rand.New(rand.NewSource(int64(me)))
			n := *nThrow / *nCPU
			hits := 0
			// Do the throws.
			for i := 0; i < n; i++ {
				x := r.Float64()
				y := r.Float64()
				if x*x+y*y < 1 {
					hits++
				}
			}
			parts <- hits // Send the result back.
		}(i)
	}
	// Aggregate partial results.
	hits := 0
	for i := 0; i < *nCPU; i++ {
		hits += <-parts
	}
	pi := 4 * float64(hits) / float64(*nThrow)
	fmt.Printf("PI = %gn", pi)
}

See the program in action here.

Note how the Go program avoids inversion of control and callback hell. It all fits into the main function and you can read it from top to bottom.

In the next blog we will consider how to implement parallel divide-and-conquer and pipeline pattern with Go. Stay tuned!

For more complete information about compiler optimizations, see our Optimization Notice.

20 comments

Top
Rolf Lampa's picture

@Dmitry Vyukov ; :)
So nice and clean! Thank you for this excellent example, I learn from it.
.
Notice though that the original example was meant to test, not how to count per se, but the concept of the overall control structure using a <-quit signal. The DNA thing was there only to have something for the control structure to chew on- (that is, it was meant to test the signal approach).
.
Well, the signal approach really isn't very useful in a simple example like this. Moreover, your take on it is also much more elegant than using a WaitGroup.
.
At last, your example code may well perform good enough to compete with some other algoritms dealing with the DNA string (if doing the counting using a char table instead). I'll try that (see other algorithms here: http://saml.rilspace.org/moar-languagez-gc-content-in-python-d-fpc-c-and-c )
.
In any case, very elegant! :)

Dmitry Vyukov's picture

Here is roughly how I would do it:
http://play.golang.org/p/LKhgKxsvnM
No need to send pointers via channels. No need to collect partial results.

Rolf Lampa's picture

OK, thank you @Dmitry Vyukov for pointing (... pun intended) this out! I was also perplext when the guy at stackoverflow told me that ptr&&buff was the problem (in case 3: http://play.golang.org/p/RMg0NfRadj ).

But then it only means that the problem resides in the main routine's select clause, in that it detects the first goroutine which is done (which sends a <-quit ) and then breaks prematurely. Is that correct?

If so, then a WaitGroup (from the sync library) could fix the problem (since the goroutines can't keep track of each other and thus can't sync knowing when all the other's are done). In short, the "quit channel" seems to be the problem. I'll try to rearrange the code once more and try again.

Thanks you for your analyze, this case really puzzled me, and this hopefully solves the riddle.

Dmitry Vyukov's picture

> Conclusion: On my 4 kernel laptop neither of the three does detect any races,
> although the /results vary/ for 3 from run to run. :(
> The cause of the problems (at least in this case) therfore seems to be the
> /combination of pointers passed on buffered channels/"

You just don't read all the messages from resChA. It's not related to buffered channels. It's not related to passing pointers via channels.
It's not basic safety issue, so it can not be detected by any tool. You may want to have such program (deterministic results are not required, but want to terminate after quit signal as soon as possible).

Rolf Lampa's picture

(removed after reformatting the prev. post)

Rolf Lampa's picture

Hi again @Dmitry Vyukov

Unfortunately I had not provided a link to the updated code versions where all the (detected) races had been fixed. Below I provide links to three versions (use diff) with the code cleansed from all (detected) races (for example the DNA string had to be copied, not referenced since the goroutines would read it while the next string was being feeded from the scanner). Only the first two versions give consistent results though.

.
I forgot to post the updated code which fixed the datarace stemming from sending the (DNA) string as a reference (now a string copy). The last two versions have pointers to the integers (also to the string, but since it's now a copy the pointer isn't a problem):
The difference is stepwise; 1->2=pointers, 2->3=buffers. Ver 3:

//  resChA := make(chan *int, 10)   // <- buf
//  resChB := make(chan *int, 10)   // <- buf
//  inStr  := make(chan *[]byte, 10) // <- buf

Links:

    Play 1: (Unbuffered, no pointers)
    Play 2: (Unbuffered, *with* pointers)
    Play 3: (*Buffered*, with pointers)

Conclusion: On my 4 kernel laptop neither of the three does detect any races, although the results vary for 3 from run to run. :(
The cause of the problems (at least in this case) therfore seems to be the combination of pointers passed on buffered channels"

.

// "First of all, sending pointers over channels is not a bug. And race detector must not warn about that.
.
True, it's not a bug, and as shown in the first example (unbuffered) the code runs perfectly fine, with consistent results. That's also why I didn't suggest that the detector should report this as a "bug", only (optionally?) warn for potential problems (if going buffered).

.

// False positives kill any bug detection tool.
.
Well, the second (buffered) case indicates that one would benefit from (optionally) running the -race command using a parameter, say "

 go run -race -ch psobcmstu

" (meaning; "Pointers Sent On Buffered Channels May Screw Things Up")
.
:)

Dmitry Vyukov's picture

I've run the stackoverflow code over the tip version of race detector, and it properly identified all 3 races:

==================
WARNING: DATA RACE
Write by goroutine 3:
main.Worker()
/tmp/race.go:50 +0x32f

Previous read by goroutine 1:
main.main()
/tmp/race.go:122 +0x43e

Goroutine 3 (running) created at:
main.main()
/tmp/race.go:109 +0x27f

Goroutine 1 (running) created at:
_rt0_go()
src/pkg/runtime/asm_amd64.s:92 +0x116

==================
==================
WARNING: DATA RACE
Write by goroutine 3:
main.Worker()
/tmp/race.go:48 +0x371

Previous read by goroutine 1:
main.main()
/tmp/race.go:121 +0x41a

Goroutine 3 (running) created at:
main.main()
/tmp/race.go:109 +0x27f

Goroutine 1 (running) created at:
_rt0_go()
src/pkg/runtime/asm_amd64.s:92 +0x116

==================
==================
WARNING: DATA RACE
Write by goroutine 5:
runtime.slicestringcopy()
src/pkg/runtime/slice.c:265 +0x0
strings.(*Reader).Read()
src/pkg/strings/reader.go:38 +0x16d
bufio.(*Scanner).Scan()
src/pkg/bufio/scan.go:165 +0x97a
main.SpawnWork()
/tmp/race.go:84 +0x361

Previous read by goroutine 3:
main.Worker()
/tmp/race.go:47 +0x1ba

Goroutine 5 (running) created at:
main.main()
/tmp/race.go:112 +0x375

Goroutine 3 (running) created at:
main.main()
/tmp/race.go:109 +0x27f
==================

The first two are on at and gc. The last one is on bufio.Scanner internal buffer.
Are there any other races?
I am accepting bug reports right here :)

Dmitry Vyukov's picture

Rolf,
First of all, sending pointers over channels is not a bug. And race detector must not warn about that. False positives kill any bug detection tool.
Then, one can send indices into a slice over a channel with the same result, so it's not actually about pointers.

Dmitry Vyukov's picture

> As an aside, should the test for x,y being in the circle be ≤1 rather than <1?

Since it's Monte Carlo, I think it needs to be:
if c := x*x+y*y; c < 1 || (c == 1 && r.Intn(2) == 0) {
hits++
}

Dmitry Oganezov (Intel)'s picture

It seems like this post reached Google+ Go community and here is a question people ask there:

"As an aside, should the test for x,y being in the circle be ≤1 rather than <1? After all, 1,0 is on the boundary of the circle and the square? The result doesn't differ over the relatively few iterations the playground provides and with the seeded PRNG, but I'd be interested from the theory point of view."

maybe  if x*x+y*y <=1 ? :)

Pages

Add a Comment

Have a technical question? Visit our forums. Have site or software product issues? Contact support.