How to use cilkview to analyze portion of program

I wrote a Cilk++ program, and the cilkview analyzer suggests that it has a large overhead (the burdened span is much larger than the span). I am trying to use cilkview to analyze portions of the program to find out where the overhead mainly comes from, but the portion analysis does not report the right burdened span: for whichever portion I select, the burdened span is always the same as the span. I am not sure whether I am using it correctly. Please take a look at the following example, which I borrowed from the documentation. Is this how it is supposed to be used?

-------- the program ----------------------------------------------------

#include
#include
#include
#include

static const int COUNT = 4;
static const int ITERATION = 1000000;
long arr[COUNT];

long do_work(long k)
{
    // Waste time:
    long x = 15;
    static const int nn = 87;

    for (long i = 1; i < nn; ++i) {
        x = x / i + k % i;
    }
    return x;
}

int cilk_main()
{
    cilk::cilkview cv;
    cv.start();

    for (int j = 0; j < ITERATION; j++)
    {
        cilk_for (int i = 0; i < COUNT; i++)
        {
            arr[i] += do_work( j * i + i + j);
        }
    }
    cv.stop();
    cv.dump("test");
    std::cout << cv.accumulated_milliseconds() / 1000.f << " seconds" << std::endl;
    return 0;
}

#----------------------the output of cilkview -----------------------------

cilkview: generating scalability data
Statistics for test

1) Parallelism Profile
Work : 6477000037 instructions
Span : 2113000037 instructions
Burdened span : 2113000037 instructions
Parallelism : 3.07
Burdened parallelism : 3.07
Number of spawns/syncs: 3000000
Average instructions / strand : 719
Strands along span : 2000000
Average instructions / strand on span : 1056
Total number of atomic instructions : 0
Frame count : 17000000
2) Speedup Estimate
2 procs: 1.29 - 2.00
4 procs: 1.50 - 3.07
8 procs: 1.64 - 3.07
16 procs: 1.72 - 3.07
32 procs: 1.76 - 3.07

0 seconds
Whole Program Statistics:

Cilkview Scalability Analyzer V1.1.0, Build 8503
1) Parallelism Profile
Work : 6,480,800,399 instructions
Span : 2,116,800,399 instructions
Burdened span : 31,920,800,399 instructions
Parallelism : 3.06
Burdened parallelism : 0.20
Number of spawns/syncs: 3,000,000
Average instructions / strand : 720
Strands along span : 4,000,001
Average instructions / strand on span : 529
Total number of atomic instructions : 16
Frame count : 17000003
2) Speedup Estimate
2 processors: 0.21 - 2.00
4 processors: 0.15 - 3.06
8 processors: 0.13 - 3.06
16 processors: 0.13 - 3.06
32 processors: 0.12 - 3.06

* Note: Analysis forces "grainsize=1" for cilk_for loops that use a default grainsize.
For details, refer to the "cilk_for" section of the Cilk++ Programmer's Guide.

#------------------------------------ end of cilkview output --------------------------------------------------------------------

Any suggestions, or a working example that uses cilkview to analyze a portion of a program, would be greatly appreciated!

Thanks in advance.

Liyun

Best Reply

Hi Liyun,

This is a bug in cilkview. The value that it is reporting for "burdened span" is wrongly duplicated from "span." You are using the tool correctly, but its output is incorrect. The consequence is that the projected burdened parallelism is higher than it should be.

Fortunately, the fix is in one of our header files: cilkview.h

If you open it and go to the function "void stop ()" you will see that this is where we accumulate the data. There is a line that says:

total_.burdened_span += end.span - start_.span;

This should be:

total_.burdened_span += end.burdened_span - start_.burdened_span;

If there are any problems, please let me know.

I'm sorry, I forgot to say:

cilkview.h is located wherever you installed Cilk++:

cilk/include/cilk++/cilkview.h

Thank you so much, William. No wonder the burdened span was always exactly the same as the span. I have corrected the formula for burdened_span in cilkview.h, and now it works just fine. Thanks again!

A follow-up question:

I have a program written in a divide-and-conquer manner. It recursively calls itself until it reaches some base case. I want to use cilkview to find the best value for the base case size K.

Here is what I get for different values of K:

K = 128
Work : 29948069 instructions
Span : 239965 instructions
Burdened span : 344938 instructions
Parallelism : 124.80
Burdened parallelism : 86.82

K = 64
Work : 14982885 instructions
Span : 60894 instructions
Burdened span : 180831 instructions
Parallelism : 246.05
Burdened parallelism : 82.86

K = 32
Work : 7607621 instructions
Span : 16436 instructions
Burdened span : 151364 instructions
Parallelism : 462.86
Burdened parallelism : 50.26

How should I interpret these data? For K = 32, the burdened parallelism is much smaller than the parallelism. I guess that indicates the program spawns too often for too few instructions per spawn. If I increase K to 128, the parallelism and the burdened parallelism are much closer, so in principle I should set the base case to 128 rather than 32. But I notice that the burdened span for K = 128 (344938) is larger than the burdened span for K = 32 (151364). Does this mean that, even with the large overhead, K = 32 will still run faster than K = 128?

Thanks in advance,

Liyun

From your numbers, you might expect similar performance for K=64 and K=128, as the burdened parallelism is similar. The absolute value of the span is not so important - that is the number of instructions on the critical path. The absolute value would only matter if you ran on an infinite number of processors with no scheduling overhead, in which case the program run time would be equal to the span. Performing runs with actual timings may be informative.

steve

Hi William, can you please help me with this problem?
I'm using cilk-8503-i686 on Linux (Ubuntu 11.04), and I called the cilkview functions just as the examples do:
...
#include
....

int cilk_main(int argc, char** argv)
{
cilk::cilkview cv;
...
cv.start();
SOME CODE HERE
cv.stop();
cv.dump("xxxx");
....
}

and I got these errors from the linker:
YAM_P.cilk:(.text+0xbd2): undefined reference to `cilk::get_milliseconds()'
YAM_P.cilk:(.text+0xc9c): undefined reference to `cilk::get_milliseconds()'
YAM_P.cilk:(.text+0xe41): undefined reference to `__cilkview_dump'
collect2: ld returned 1 exit status

I wrote some simpler programs to test cilkview, and they worked very well.
What is wrong with this one?

You need to link with the cilkutil static library. The source should be included as part of the kit, but there should be a built version also available.

- Barry
