Cilk++ penalization of direct function calls

This thread is an asynchronous continuation of the following discussion with Matteo Frigo on how Cilk++ handles direct function calls:

http://software.intel.com/en-us/forums/showthread.php?t=69681&p=2&o=a&s=lr

I still do not get why you need to allocate stack frames for functions that are called directly. As far as I see, it's redundant.

For the following program:

void C1(){}
void C2(){}

void B()
{
    cilk_spawn C1();
    cilk_spawn C2();
}

void A()
{
    B();
}

int cilk_main(int argc, char** argv)
{
    cilk_spawn A();
    return 0;
}

You generate the following stack tree (left picture):

http://www.dabbleboard.com/public?created=dvyukov&myid=1

Why not generate the stack tree as shown in the right picture?

A direct function call is not a fork point in execution, so a directly called function can live on its parent's stack. You would still be able to steal the continuation with both functions in a single stack frame.

Additional confidence that this is possible comes from the fact that Cilk++ is similar to TBB in this respect (the fact that Cilk++ spawns the continuation and TBB spawns the call is irrelevant here, as far as I can see). And TBB allocates frames (tasks) only for spawns.

Why not generate two versions of each function: one for direct calls and one for spawns? Then a direct function call proceeds as a plain C function call, while a spawn allocates a frame for the continuation and then calls the function.

One can reason about this in the following way: it is as if you inlined all direct function calls, as though the user had substituted the function bodies for all direct calls. From this point of view it's clear that additional frames are not required.
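To make the two-version idea concrete, here is a minimal sketch in plain C++ (not Cilk++, and not how the Cilk++ compiler actually works). The names `Frame`, `spawn`, `B_direct`, `B_spawned`, `A_direct`, `run_example` and the counter `g_frames_allocated` are all hypothetical, and the "spawns" simply run serially; the only point is to count how many frames get allocated when direct calls stay on the parent's stack.

```cpp
#include <memory>

// Hypothetical bookkeeping: number of heap frames allocated for spawns.
static int g_frames_allocated = 0;

// Hypothetical stand-in for a heap-allocated continuation frame.
struct Frame {
    Frame() { ++g_frames_allocated; }
};

void C1() {}
void C2() {}

// Spawn helper: allocate a frame for the continuation, then call the function.
// A real runtime would make the frame available for stealing; this sketch
// just runs the callee serially and frees the frame on return.
template <typename F>
void spawn(F f) {
    std::unique_ptr<Frame> frame(new Frame());
    f();
}

// Direct-call version of B: a plain function call, no frame of its own.
void B_direct() {
    spawn(C1);  // stands in for: cilk_spawn C1();
    spawn(C2);  // stands in for: cilk_spawn C2();
}

// Spawn version of B: one frame for the caller's continuation, same body.
void B_spawned() { spawn(B_direct); }

// Direct-call version of A: calls B directly, so no frame is allocated here.
void A_direct() { B_direct(); }

// Mirrors cilk_main from the example above and returns the frame count.
int run_example() {
    g_frames_allocated = 0;
    spawn(A_direct);  // stands in for: cilk_spawn A();
    return g_frames_allocated;
}
```

Calling `run_example()` returns 3 in this sketch: one frame per spawn (A, C1, C2) and none for the direct call from A to B.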

What am I missing? Please clear this up for me, I can't sleep over it. Why does Cilk++ penalize function calls?

All about lock-free algorithms, multicore, scalability, parallel computing and related topics:
http://www.1024cores.net

Dmitriy,

You are correct: in your example you could lay out the stack in the way that you are suggesting. You cannot do it in general, however, because A does not know the size of B's stack frame. So, as an initial implementation, we did the simple thing of making all frames heap-allocated.

Cheers,
Matteo Frigo
