Cilk Fibonancci doesn't work properly

Cilk Fibonancci doesn't work properly

I get the Fibonancci example from a web site but the output doesn't make any sense as serial code takes 2.25 sec while the parallel code takes 4.5 sec.
I'm using visual studio 2013, and Intel parallel studio
this is the code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <cilk/cilk.h>
#include <cilk/cilk_api.h>

/*Cilk_SPAWN*/
int fib(int n)
{
    if (n < 2)
        return n;
    int x = cilk_spawn fib(n - 1);
    int y = fib(n - 2);
    cilk_sync;
    return x + y;
}

/*Serial Function*/
int fib_s(int n)
{
    if (n < 2)
        return n;
    int x = fib_s(n - 1);
    int y = fib_s(n - 2);
    return x + y;
}

int main(int argc, char *argv[])
{
    /*CILK SPAWN*/
    
    // Fibonacci number to be calculated.  39 is big enough to take a reasonable amount of time
    int n = 39;

    int result ;
    // Time how long it takes to calculate the nth Fibonacci number Serially first
    clock_t start = clock();
    result = fib_s(n);
    clock_t end = clock();
    double dur = (double)(end - start) / CLOCKS_PER_SEC;
    // Display our results
    printf("SERIAL  %.3f sec == \n\n", dur);

    // Time how long it takes to calculate the nth Fibonacci number Parallel 
    start = clock();
    result = fib(n);
    end = clock();
    // Display our results
    dur = (double)(end - start) / CLOCKS_PER_SEC;
    printf("PARALLEL  %.3f sec == \n\n", dur);
    
    return 0;
}

9 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

That is surprising. What hardware and operating system are you using? Have you done anything to set the number of workers, such as setting CILK_NWORKERS?  Are there any other CPU-intensive programs running at the same time?

Please try the test again with the environment CILK_NWORKERS set to 1 and CILK_NWORKERS set to 2 and list the results here.

-Pablo

P.S. fib is not a great test program, but you should definitely be getting speedup.

In terms of total time spent by all logical processors, as measured by clock(), it's not surprising that you see an increase, particularly on such a short job.   Besides setting CILK_NWORKERS, as Pablo advised, you should be checking elapsed time.  Assuming you have HyperThreading on, you should see minimum elapsed time somewhere between nworkers set to number of cores and number of logical processors minus 1 (almost certainly with fewer than default number of workers).

I thought (but could be wrong) that tasks suitable for cilk_for() might show the most scaling with nworkers, but of course you give up the advantages seen with processor affinity in threading models which support it.

On my 2-core laptop, elapsed times for {1,2,3,4} workers are {6.4,3.9,3.0,2.6} sec.  Your serial code is much faster.

I forgot to mention something.  For small jobs, the Cilk startup overhead dominates the total time.  The Cilk runtime starts up its worker threads the first time you call a function that does a spawn.  To get a more realistic timing, add these lines to the beginning of main().

cilk_spawn fib_s(1);
cilk_sync;

Time it if you want to measure the overhead.  I'm pretty sure that if you do this, the time spent in the call to fib(39) will go way down.

-Pablo

It's really not surprising that your serial code is faster than the Cilk code.  Fib is the posterchild for programs that don't do enough in their spawned functions. What you're really measuring is the overhead imposed by Cilk.

See Jim Sukha's article Why is Cilk™ Plus not speeding up my program? (Part 1). Entry #1 is "Parallel regions with insufficient work"

Yes, it looks like there is only small amount of actual work as Barry said.

This is not the first time we see this question, and there are some suggestions in the following post if we want to see any performance gain from Fibonacci code.

https://software.intel.com/en-us/forums/intel-cilk-plus/topic/559858

Thank you all for helping me 

first to Pablo Halpern
My CPU is Intel core i7-4720HQ, i'm running windows 8.1 (64 bit ).
I tried with {1,2,4,6, and 8} workers and the time result is {16.061,8.438,5.769,4.882, and 4.311} sec
Moreover i tried to add the 2 lines you mentioned but the overhead time almost 0 at 16 and 8 workers.
please see the image 

https://drive.google.com/open?id=0BwocMhGn9e4FVGgtdWc4cEdXQXM

second Pablo HalpernBarry Tannenbaum, Hansang B, and Tim P

I saw a tutorial for someone in which he use the same code. And there is an improvement in time that i can't achieve with my i7 CPU 
Please if you have enough time to see his result at time 4:26 in the link
https://www.youtube.com/watch?v=JnUEOvitBN8&index=2&list=PLNMIeqLiUa7ZseuQFj3tI6dCtVG71JUgU
I wonder if any one of you mention a good example to just see the power of cilk. 

 

The Fibonacci example is intended to illustrate how to perform recursive parallelism. This example is NOT intended to illustrate how to best perform the Fibonacci series calculation using parallel programming.

The intension of the example is for you to examine your own problem+solution and determine if it can be reformulated into a recursive expression (with inner most recursion level having sufficient work to be beneficial to parallelization). The example has a secondary benefit in illustrating cases where parallelization is not sufficiently efficient (which is a valuable learning experience).

Jim Dempsey

Leave a Comment

Please sign in to add a comment. Not a member? Join today