New Haskell CnC release: 0.1.3. (Adds parallel for loops.)

This is Part 3 in my ongoing series of posts on Haskell CnC (Part 1 here and Part 2 here).


As part of this post I'm happy to announce a new (minor) release.  This release adds cncFor, itemsToList, and various internal improvements. You can get it and test it with these two lines:

   cabal install haskell-cnc-0.1.3
   haskell-cnc-runTests



Also, as of this release the source repository is hosted on code.haskell.org.  It is now possible to grab this release (or the latest version) with a command like the following:


   darcs get -t 0.1.3 --set-scripts-executable http://code.haskell.org/haskell-cnc/




cncFor: Parallel for-loops:


CnC for-loops provide a way to expose a parallel loop to the CnC scheduler, much like cilk_for or tbb::parallel_for for the Cilk and TBB schedulers respectively.  To create a loop, one passes a start and end (inclusive) as well as a StepCode body.  For example, we can compute a serial function "myfun" in parallel, as follows:


  cncFor 1 10 (\i -> put myItems myfun(i))



The backslash introduces an anonymous function (a lambda) that forms the body of the for-loop.

A common use of cncFor -- and one we'll see in the next post on program optimization -- is to add a little more structure when spawning work.  Frequently we want to create new steps 1..N in a CnC graph.  Normally, these would appear as set of unrelated tasks to the runtime system, which is inefficient if N is large.  cncFor, though implemented differently in different Cnc schedulers, in general helps subdivide the work sensibly.

An example -- let's do 10,000 "putts" and thereby spawn 10,000 step instances:


[haskell] cncFor 1 10000 (\i -> putt tags i)[/haskell]



... which, a Haskell programmer, taking advantage of currying, would instead abbreviate:


[haskell] cncFor 1 10000 (putt tags)[/haskell]





As mentioned in my previous post, relative to its Java, .NET and C++ counterparts, the Haskell incarnation of CnC is great for experimentation and rapid prototyping, and cncFor is a good example.  The default implementation of cncFor used by most schedulers is only five lines long:










cncFor start end body =
   do ts <- graphInStep newTagCol
     graphInStep$ prescribe ts$ \(x,y) -> forM_ [x..y] body
     let range_segments = splitInclusiveRange (4*numCapabilities) (start,end)
     forM_ range_segments (putt ts)




A bit of explanation: graphInStep is an internal function that lifts code from the GraphCode monad into the StepCode monad.  That is, it lets us extend the CnC graph, dynamically creating nodes and edges during execution.  In this case, we implement cncFor by splitting the [start,end] range into a constant number of segments based on the number of processors (with an extra coefficient of four to absorb load imbalance -- the same heuristic used by the TBB auto partitioner).  We create a brand new tag collection and step collection to execute these range segments. Voila.


For more complete information about compiler optimizations, see our Optimization Notice.

Comments

's picture

Haskell is a nice language to use from time to time. In some areas, it's just sooo much more powerful than imperative languages like C# or Java.

Paul Steinberg (Intel)'s picture

Excellent post Ryan. I would love to have you as a guest on Teach Parallel. Let me know.

Paul Steinberg