cilkplus tools and workflow discussion

Dear all,

My students and I are now using the GCC CilkPlus branch quite heavily, and we believe we are making progress on certain kinds of algorithms we are investigating. The purpose of this message is to find out whether anyone else lurking in this forum has a CilkPlus "workflow" to share: debugging tools, profilers, and so on. We do the following:

  1. We use GCC with Cilk and download cilkutils. We would rather have the tools open-sourced so that we could compile our own (32-bit architectures are still hard to work on), but OK.
  2. If our Cilk program compiles fine but doesn't scale (as suggested by either cilkview or a comparison with the sequential version), we time portions of the code using gettimeofday() to find the culprit.
  3. If we suspect the work is not being shared, we use gettimeofday() to time how long each worker thread takes (identifying workers with __cilkrts_get_worker_number()).
  4. If we suspect there is a race, we use cilkscreen.
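
Steps 2 and 3 could be sketched roughly as follows. This is only an illustration, not the code we actually run: MAX_WORKERS, the 64-byte padding, timed_call(), sum_below() and print_busy() are hypothetical names and values chosen for the example. It assumes the timed work is serial (a function that spawns may resume on a different worker), and it falls back to a single "worker 0" when the compiler does not define __cilk.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

#ifdef __cilk                        /* defined when Cilk Plus is enabled */
#include <cilk/cilk_api.h>
#define WORKER_ID() __cilkrts_get_worker_number()
#else
#define WORKER_ID() 0                /* serial build: everything is worker 0 */
#endif

#define MAX_WORKERS 64               /* hypothetical upper bound on workers */

/* One accumulator per worker, padded to 64 bytes to reduce false sharing. */
struct worker_time {
    double seconds;
    char pad[64 - sizeof(double)];
};
struct worker_time busy[MAX_WORKERS];

/* Wall-clock time in seconds, from gettimeofday(). */
double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Run a serial piece of work and charge its elapsed time to this worker. */
void timed_call(void (*work)(void *), void *arg)
{
    int w = WORKER_ID();
    assert(w >= 0 && w < MAX_WORKERS);
    double start = now_seconds();
    work(arg);
    busy[w].seconds += now_seconds() - start;
}

/* Hypothetical workload: replace *arg with the sum of integers below it. */
void sum_below(void *arg)
{
    long *n = arg;
    long total = 0;
    for (long i = 0; i < *n; i++)
        total += i;
    *n = total;
}

/* Print one line per worker so that imbalance is visible at a glance. */
void print_busy(int nworkers)
{
    for (int w = 0; w < nworkers && w < MAX_WORKERS; w++)
        printf("worker %d: %.6f s busy\n", w, busy[w].seconds);
}
```

Each leaf task would be wrapped as timed_call(sum_below, &n), and print_busy() called once after the parallel region. Dividing each worker's busy time by the total elapsed time gives a crude estimate of how long it sat idle.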

However, this is all pretty manual work. We would like to know what tools, hopefully non-Intel ones (in particular parallel debuggers and profilers), people use with Cilk to get load-balancing statistics per thread: how much each thread ran, how long it stayed idle, how many context switches it had, and so on. Note also that gdb doesn't debug Cilk code very well. What do people use as debuggers?

Sorry for the wall of text. Bilaji sent me an email not long ago showing me how to compile GCC easily (I had been taking an order of magnitude more time compiling the sources for the prerequisites, such as mpc, mpfr, and gmp, myself). He saved me a lot of time by pointing me to contrib/download_prerequisites. So what is everybody doing, and what are you using to work with CilkPlus?

I would run Cilkscreen earlier in your workflow. You should be sure your program is correct before worrying about performance. A race can also cause a cache line to ping-pong between cores, which hurts performance. Cilkscreen will point you right at any races, which is a lot easier than using the equivalent of printf debugging to find them.
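
As a small illustration of the kind of race meant here (a sketch, not code from the original question; sum_racy() and sum_race_free() are hypothetical names), the first function below lets a spawned child and its continuation update the same location, while the second returns partial sums so nothing is shared. The empty macros are the usual serial elision, used when the compiler does not define __cilk.

```c
#ifdef __cilk                        /* defined when Cilk Plus is enabled */
#include <cilk/cilk.h>
#else
#define cilk_spawn                   /* serial elision: keywords vanish */
#define cilk_sync
#endif

/* Racy: the spawned child and the continuation both update *total,
 * so the unsynchronized read-modify-write can lose updates. */
void sum_racy(const int *a, int n, long *total)
{
    if (n <= 0)
        return;
    if (n == 1) {
        *total += a[0];
        return;
    }
    cilk_spawn sum_racy(a, n / 2, total);
    sum_racy(a + n / 2, n - n / 2, total);
    cilk_sync;
}

/* Race-free: each half returns its own partial sum; the only
 * combination step happens after the sync. */
long sum_race_free(const int *a, int n)
{
    if (n <= 0)
        return 0;
    if (n == 1)
        return a[0];
    long left = cilk_spawn sum_race_free(a, n / 2);
    long right = sum_race_free(a + n / 2, n - n / 2);
    cilk_sync;
    return left + right;
}
```

In the racy version the shared *total is also the location that would bounce between cores, so removing the race removes that traffic as well.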

Since you're building your own copy of the Cilk runtime, look at the stats code (stats.h, stats.c).  It's normally compiled out, so you'll need to define CILK_PROFILE when building the runtime to enable it.  It may help you find problems, though a lot of what it tracks is pretty obscure (and undocumented!).  Note that you'll need to shut down the runtime by calling __cilkrts_end_cilk() to get the stats to print out.  You may be able to modify the stats code to make it easier to find your problems.  If you do, we'll gladly accept modifications to the runtime - see for the details.
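
A minimal sketch of where the shutdown call could go, assuming a runtime that was built with CILK_PROFILE defined. fib() and run_and_flush_stats() are hypothetical names for this example, and what the stats print (if anything) depends on your runtime build. Without __cilk the keywords are elided and the shutdown call is skipped, so the same file also builds as plain C.

```c
#ifdef __cilk                        /* defined when Cilk Plus is enabled */
#include <cilk/cilk.h>
#include <cilk/cilk_api.h>
#else
#define cilk_spawn                   /* serial elision: keywords vanish */
#define cilk_sync
#endif

/* Hypothetical parallel workload: naive Fibonacci. */
long fib(int n)
{
    if (n < 2)
        return n;
    long a = cilk_spawn fib(n - 1);
    long b = fib(n - 2);
    cilk_sync;
    return a + b;
}

/* Do the parallel work, then shut the runtime down from serial code
 * so that a CILK_PROFILE-enabled runtime gets a chance to report. */
long run_and_flush_stats(int n)
{
    long result = fib(n);
#ifdef __cilk
    __cilkrts_end_cilk();
#endif
    return result;
}
```

The shutdown is placed after all spawned work has synced and returned, since the call is meant to be made once Cilk is no longer in use by the calling thread.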

WARNING: The stats displayed are an internal debugging tool which gets little to no testing and is totally unsupported.  If they help you, great.  If they fail to compile or aren't useful, you're on your own.

    - Barry
