Improve performance in Cilk program

Improve performance in Cilk program

I am new to Cilk and I am trying to implement a simple parallel version of the well know algorithm to construct an optimal binary search tree given the access frequency of each node.My first approach was based on a recursive function + memoization, but the Cilkview and sometest on my machine showed it doesn't have good performance.After that, I tried another approach where I just use cilk_for the parallelization, but it also doesn't perform well.Below there is the piece of code where I am using cilk_for:

for ( int low = n ; low >= 0 ; low )
      c o s t [ low ] [ low ] = 0 . 0 ;
      for ( int high = low + 1 ; high <= n ; ++high )
         c i l k : : r e d u c e r o p a d d  f c o s t ( 0 ) ;
         c i l k : : r e d u c e r m i n  b e s t c o s t ( INF ) ;
         c i l k f o r ( int r = low ; r < high ; ++r )
               f c o s t += p [ r ] ;
               double r c o s t = c o s t [ low ] [ r ] + c o s t [ r + 1 ] [ high ] ;
               c i l k : : min of ( bestcost , r c o s t ) ;
         c o s t [ low ] [ high ] = b e s t c o s t . g e t v a l u e ( ) + f c o s t . g e t v a l u e ( ) ;

The cilkview output for the whole program is:Cilkview Scalability Analyzer V1.1.0, Build 85041) Parallelism ProfileWork :6,800,547,046 instructionsSpan :2,898,542,645 instructionsBurdened span :4,817,900,193 instructionsParallelism : 2.35Burdened parallelism :1.41Number of spawns/syncs:189,804Average instructions / strand :11,943Strands along span : 259,607Average instructions / strand on span :11,165Total number of atomic instructions :1,192,832Frame count : 9451652) Speedup Estimate2 processors: 0.91 - 2.004 processors: 0.87 - 2.358 processors: 0.85 - 2.3516 processors:0.84 - 2.3532 processors:0.83 - 2.35I can see that the burdened parallelism is very low but also I can't think in any other improvementfor this simple cilk_for. I tried to change the grainsize and found that for a 1000 keys instance thebetter results is for grainsize = 100 (the above cilkview output is for that grain size).Could someone give me directions for where I have to look for performance improvement in this case?Caio

2 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

Let's start with details.You're clearly using Cilk++, not Cilk Plus. Which means this really should be discussed in the Cilk++ WhatIf forum. But that's a quibble.

Which operating system are you using? 32 or 64-bit code?

The first thing that comes to mind is that you're parallelizing your innermost loop, which means you're doing very little work in each strand. Yes grainsize will help this, but is there a reason not to parallelize the outer loop?

- Barry

Leave a Comment

Please sign in to add a comment. Not a member? Join today