Tail call optimization

I'm running the Intel C++ compiler on Debian amd64 with a 2.6 kernel. The compiler fails to apply tail call optimization to the following code:


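#include <stdio.h>

/* noinline keeps both calls visible in the generated assembly */
void foo() __attribute__((noinline));
void bar() __attribute__((noinline));

void bar() { printf("f()\n"); }
void foo() { bar(); }  /* bar() is the last action: a tail call */

int main(int argc, char *argv[])
{
    foo();
    return 0;
}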

gcc 4.2 with -O3 generates the following assembly instructions for foo():

xor %eax,%eax
jmpq 4004a0

and the Intel compiler with -fast generates this:

push %rsi
callq 4002a0
pop %rcx

Am I missing some compiler option here? Can someone please explain this to me?

Thank you.


I might be missing something here too. In your example, the motivation for disabling the usual optimizations in both compilers by setting __attribute__((noinline)) isn't obvious. If the functions were too big to be inlined, it seems that tail call optimization wouldn't gain much, and it could still hinder profiling. No doubt more compelling cases could be set up, at least with tail recursion, where the special optimizations in gcc would look attractive.
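To make the tail recursion case concrete, here is a minimal sketch of a function whose recursive call is in tail position. The name sum_to and its accumulator parameter are hypothetical and do not come from the original post. A compiler that performs tail call optimization can turn the recursive call into a jump, so the stack does not grow with n; whether a given compiler actually does so depends on the compiler and its flags.

```cpp
// Hypothetical example: sums the integers 1..n using an accumulator.
// The recursive call is the last action in the function (tail position),
// so nothing remains to be done in this frame after it returns.
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;                 // base case: the result is in the accumulator
    return sum_to(n - 1, acc + n);  // tail call: eligible to become a jump
}
```

Calling sum_to(10, 0) walks n down from 10 to 0 while acc collects the running total. Without the optimization each step uses a new stack frame, which is where the benefit would show up for large n.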

The reason I used __attribute__((noinline)) was to emulate a C++ virtual function call which cannot be inlined. I did not use a C++ example in my first post in the interest of clarity and simplicity.
In my C++ tests, each compiler produces the same assembly as listed in my first post: gcc performs the tail call optimization and the Intel compiler does not. Are there any cases where the Intel compiler *does* tail optimize?
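For readers who want to see the C++ case being described, the following is a rough sketch of a virtual call in tail position. The types Base and Derived and the int signatures are assumptions made for illustration; the actual C++ test code was not posted. The noinline attribute is the same GCC-style attribute used in the first post.

```cpp
// Hypothetical types for illustration; not taken from the original test code.
struct Base {
    virtual ~Base() {}
    virtual int bar(int x) = 0;
};

struct Derived : Base {
    int bar(int x) override { return x + 1; }
};

// The virtual call is the last action in foo(), so it is in tail position.
// A compiler that optimizes tail calls may emit an indirect jump through
// the vtable here instead of a call followed by a return.
int foo(Base *b, int x) __attribute__((noinline));
int foo(Base *b, int x) { return b->bar(x); }
```

Because foo() only receives a Base pointer, the compiler generally cannot inline bar() unless it can prove the dynamic type, which is the situation the noinline attribute in the C example was meant to emulate.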

Thanks for bringing this to our attention. I've submitted an issue on this and will let you know when it's addressed.

