nested cilk_for loops

I am running a C program with a very long execution time. I believe that the best way to improve speed is to cilkify the for loops. Some of these loops are nested, sometimes several levels deep.

I am converting the for loops to cilk_for loops in the hope of speeding the program up. I have selected the longest-running functions (in terms of % time) and concentrated on those. I will then use Cilkscreen and Cilkview to see which loops are helped by adding cilk_for and which are not.

The issue is seg faults. When cilkifying a for loop that is really two nested for loops, I succeed only about 50% of the time. The other times I get a seg fault. The Cilk software does not say where the fault occurs; the program simply crashes with a seg fault message. Finding seg faults in serial programs is hard enough, and it seems impossible in parallel programs.

Can Intel Cilk do this consistently, with a stable program as the result? Is it wise, in terms of program speedup, to nest cilk_for loops? I thought that somewhere in an Intel Cilk manual this was advised against.

Thanks in advance.



You should certainly be able to have nested cilk_for loops! However, whether it's needed depends on the range of the loops involved. I can't say without knowing more.
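As a rough illustration of a nested cilk_for that is safe, here is a minimal sketch in which every (i, j) iteration writes only its own array element, so neither loop level can race. The names `fill_table`, `ROWS` and `COLS` are made up for the example. The `__cilk` check is an assumption about how the compiler advertises Cilk support; without it, the macro falls back to a plain for loop (the serial elision), so the file still builds with an ordinary C compiler.

```c
#include <stddef.h>

/* Assumption: a Cilk-enabled compiler defines __cilk. Otherwise
   cilk_for degrades to an ordinary for loop (serial elision). */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

enum { ROWS = 64, COLS = 64 };   /* hypothetical sizes */

/* Fill a 2-D table. Each (i, j) iteration writes only table[i][j],
   so iterations share no writable data at either loop level. */
void fill_table(double table[ROWS][COLS])
{
    cilk_for (int i = 0; i < ROWS; ++i) {
        cilk_for (int j = 0; j < COLS; ++j) {
            table[i][j] = (double)i * COLS + j;
        }
    }
}
```

Whether the inner cilk_for pays off depends on the loop ranges: if the outer loop already has many iterations, the inner one may add little beyond overhead.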

Possible ways to debug this:

  • Does the program work properly if you set the number of workers to 1? This can be done either by setting the CILK_NWORKERS environment variable or by using the Cilk runtime API call:

        __cilkrts_set_param("nworkers", "1");
  • You could have a race condition. Have you run Cilkscreen?

- Barry

    I only run Cilkscreen after I get the program running. I usually do it after the program compiles and runs to completion. I was unaware that data races can cause a program to have a segmentation fault. I thought that data races only resulted in incorrect or inconsistent output.

    It has been my experience that finding segmentation faults in code that I did not write is very difficult, and when you add in parallelization it becomes very hard.

    I just do not know how Cilkscreen can handle the code when the code crashes on execution.

    Any help appreciated.


    Generally, you're correct. Cilkscreen should be run on programs that work with 1 worker. But if you're racing on a pointer, then Cilkscreen might help you diagnose the problem.

    And you haven't told me whether the problem reproduces with 1 worker, which may make it much easier to debug.

    What does the callstack look like at the segfault?

    - Barry

    Can you give me an example of what you are talking about? I am presently without a computer with Cilk. The security people have taken my PC to add some security-tightening software. Nothing serious, this is SOP; I will get it back in a few days.

    However, while I am waiting please give me an example of what you are saying. The racing on a pointer is not something that I am familiar with. Any example will do.

    Just give me something to think about while my pc is being upgraded.

    Thanks in advance.


    Racing on a pointer is no different than racing on any other shared variable. Consider two threads executing the following code, where "my_struct" is a pointer to a structure:

        for (my_struct = list_root; NULL != my_struct; my_struct = my_struct->next)
            if (my_struct->data == what_im_looking_for)
                my_struct->found_it = true;

    If thread 1 and thread 2 execute the loop at the same time, there are any number of ways that bad things could occur, including a segfault. For example, suppose thread 1 executes the if statement, which dereferences the pointer, just after thread 2 has reached the end of the list and set the shared pointer to NULL. Thread 1 will dereference NULL, which results in a segfault.

    Obviously this is an exaggerated case, but finding races in your code is really difficult. That's why there are tools like Cilkscreen or Inspector.
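    One way to avoid this particular race is to give each iteration its own cursor instead of sharing one. The sketch below is only an illustration: `struct node`, `mark_found` and `keys` are hypothetical names, and it assumes the data values in the list are unique and the keys are distinct, so no two iterations write the same node. The `__cilk` check is likewise an assumption; without Cilk support the loop runs serially.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a Cilk-enabled compiler defines __cilk. Otherwise
   cilk_for degrades to an ordinary for loop (serial elision). */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

struct node {
    int data;
    bool found_it;
    struct node *next;
};

/* Mark the node matching each key. Assumes unique data values and
   distinct keys, so each iteration writes a different node. */
void mark_found(struct node *list_root, const int *keys, int nkeys)
{
    cilk_for (int k = 0; k < nkeys; ++k) {
        /* The cursor is declared inside the loop body, so every
           iteration gets a private copy; nobody else can set it to
           NULL between the loop test and the dereference. */
        struct node *cur;
        for (cur = list_root; NULL != cur; cur = cur->next)
            if (cur->data == keys[k])
                cur->found_it = true;
    }
}
```

    Moving the declaration of `cur` to file scope would recreate the shared-pointer situation described above, which is the kind of access a race detector is meant to flag.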

    - Barry

    It would be easier to see if it were in a complete program and I was able to compile it, run it, see it crash; and then run Cilkscreen and see the error.

    It would just be easier to see.

    Thanks in advance.


    Do your parallel loops manipulate data structures that are shared?
    If so, you may need to add critical sections or reduction operations.

    SegFaults are generally the result of a pointer gone wrong (or not what it is thought to be).
    An example: the serial version of your (nested) loop has a special path for the first iteration, where it initializes "something" for use on subsequent iterations. In the parallel version, a thread that was not handed the first iteration (e.g. index==0) skips the initialization section and may run before another thread has performed the initialization. That thread then uses an uninitialized structure. You have a similar case if the first-time indicator is a flag (or NULL pointer): multiple threads can see the indicator in its first-time state before any of them sets it to initialized, resulting in multiple concurrent initializations of the shared structure.
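    A minimal sketch of the usual remedy, assuming the first-iteration setup does not depend on anything computed inside the loop: hoist the initialization in front of the parallel loop so that every iteration sees a fully built structure. The names `apply_lookup` and `lookup` are hypothetical, and the `__cilk` check is an assumption about the compiler.

```c
#include <stdlib.h>

/* Assumption: a Cilk-enabled compiler defines __cilk. Otherwise
   cilk_for degrades to an ordinary for loop (serial elision). */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

/* Adds a per-index lookup value to each input element.
   Returns 0 on success, -1 if the table cannot be allocated. */
int apply_lookup(const int *in, int *out, int n)
{
    if (n <= 0)
        return 0;

    /* Built once, serially, BEFORE the parallel loop. In a serial
       original this kind of setup often hides behind `if (i == 0)`
       inside the loop body, which is unsafe once iterations can run
       in any order. */
    int *lookup = malloc((size_t)n * sizeof *lookup);
    if (lookup == NULL)
        return -1;
    for (int i = 0; i < n; ++i)
        lookup[i] = 2 * i + 1;

    /* Iterations only read lookup[] and write their own out[i]. */
    cilk_for (int i = 0; i < n; ++i)
        out[i] = in[i] + lookup[i];

    free(lookup);   /* reached only after all iterations have finished */
    return 0;
}
```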

    Jim Dempsey

    Please give me a short example that I can use and manipulate. It is much easier for me to see it that way. I try to stay away from segfaults, and it seems that I walked (unknowingly) right into them.

    Where is this discussed in the Intel literature?


    >>Please give me a short example that I can use and manipulate

    Consult the Cilk Plus examples (for simple programs that work).

    Assume you own a valet parking garage where your employee(s) park the cars...
    and you are responsible for any damage.

    Having one employee is similar to single threaded programming.
    Having multiple employees is similar to multi-threaded programming.

    What is different between the two situations?
    What are potential problems?

    You have similar issues with parallel programming.

    Trying to park two cars in one empty slot...
    Having two employees tally the total number of cars by looking at the old tally, then adding/subtracting one while the other employee is doing the same thing, then both of them posting the new (incorrect) tally.

    You have to look at similar issues in your code.
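    To make the tally problem concrete, here is a sketch of one race-free way to total arrivals (+1) and departures (-1): each chunk of the log gets its own private partial tally, and the partial tallies are combined serially afterwards. This is a hand-written reduction chosen so that it builds without any Cilk headers; Cilk Plus also has reducers meant for this kind of shared total. `tally`, `delta` and `NCHUNKS` are hypothetical names, and the `__cilk` check is an assumption about the compiler.

```c
/* Assumption: a Cilk-enabled compiler defines __cilk. Otherwise
   cilk_for degrades to an ordinary for loop (serial elision). */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

enum { NCHUNKS = 8 };   /* hypothetical number of partial tallies */

/* Sum n entries of delta[] (+1 for an arrival, -1 for a departure). */
long long tally(const int *delta, int n)
{
    long long partial[NCHUNKS] = {0};

    cilk_for (int c = 0; c < NCHUNKS; ++c) {
        /* Each chunk covers its own half-open range [begin, end). */
        int begin = (int)((long long)n * c / NCHUNKS);
        int end   = (int)((long long)n * (c + 1) / NCHUNKS);
        long long sum = 0;              /* private to this iteration */
        for (int i = begin; i < end; ++i)
            sum += delta[i];
        partial[c] = sum;               /* only chunk c writes slot c */
    }

    /* Serial combine: nobody else is touching partial[] any more. */
    long long total = 0;
    for (int c = 0; c < NCHUNKS; ++c)
        total += partial[c];
    return total;
}
```

    The racy version would have every iteration do `total += delta[i]` on one shared variable, which is the two employees reading the old tally at the same time.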

    Jim Dempsey
