Tail Call Optimization
I just had an interesting realization about tail call optimization. Often when people talk about it, they simply describe it as an optimization that the compiler does whenever you end a function with a function call whose return value is propagated up as is. Technically this is true. Practically, people use examples like this:
int foo(int increment) { if (increment) return bar() + 1; /* NOT a tail-call */ return bar(); /* a tail-call */ }
It sounds like a very solid example of a tail-call vs. not. I.e., if you “post process” the value before returning it is not a tail-call.
Going back to my realization, I think people often forget about one type of “post processing” — casts. Consider the following code:
extern short bar(); int foo() { return bar(); }
Did you spot it? This is not a tail-call.
The integer promotion from short to int is done after bar returns but before foo returns.
For fun, here’s the disassembly:
$ gcc -Wall -O2 -c test.c $ dis test.o ... foo() foo: 55 pushl %ebp foo+0x1: 89 e5 movl %esp,%ebp foo+0x3: 83 ec 08 subl $0x8,%esp foo+0x6: e8 fc ff ff ff call -0x4 <foo+0x7> foo+0xb: c9 leave foo+0xc: 98 cwtl foo+0xd: c3 ret
For completeness, if we change the return value of bar to int:
$ gcc -Wall -O2 -c test.c $ dis test.o ... foo() foo: e9 fc ff ff ff jmp -0x4 <foo+0x1>
I wonder how many people think they are tail-call optimizing, but in reality they are “wasting” a stack frame because of this silly reason.