Inline Assembly & GCC, clang
Recently, I got to write a bit of inline assembly. To test my changes, I made a small C file that defined a test function which called the inline function from the header. Then, I could look at the disassembly to verify all was well.
#define _KERNEL
#define _ASM_INLINES
#include <sys/atomic.h>

void test(uint32_t *x)
{
	atomic_inc_32(x);
}
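For readers who don't have that header handy, here is a minimal sketch of roughly what such an inline function might look like. This is not the actual <sys/atomic.h> code: the name my_atomic_inc_32 is made up for illustration, and the sketch assumes GCC-style extended asm (which Clang also accepts) on x86, with a compiler builtin as a fallback elsewhere.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the inline from the header; the real
 * implementation may differ in naming, constraints, and clobbers.
 */
static inline void
my_atomic_inc_32(volatile uint32_t *target)
{
#if defined(__x86_64__) || defined(__i386__)
	/* "+m" marks *target as a memory operand that is both read and written. */
	__asm__ __volatile__(
	    "lock; incl %0"
	    : "+m" (*target)
	    :
	    : "cc");
#else
	/* Assumption: on other architectures, fall back to the compiler builtin. */
	(void) __atomic_fetch_add(target, 1, __ATOMIC_SEQ_CST);
#endif
}

/* Same shape as the test function above: a thin wrapper around the inline. */
void
test(uint32_t *x)
{
	my_atomic_inc_32(x);
}
```

Compiling this with -c and disassembling the object file should let you repeat the same kind of check without the kernel headers.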
GCC has been my go-to compiler for a long time now. So, at first I was using it to debug my inline assembly. I compiled the test program using:
$ gcc -Wall -O2 -m64 -c test.c
Disassembling the object file yields the rather obvious:
test()
    test:       f0 ff 07           lock incl (%rdi)
    test+0x3:   c3                 ret
I can’t think of any way to make it better :)
Then, at some point I remembered that Clang/LLVM are pretty good as well. I compiled the same file with clang:
$ clang -Wall -O2 -m64 -c test.c
The result was rather disappointing:
test()
    test:       55                 pushq  %rbp
    test+0x1:   48 89 e5           movq   %rsp,%rbp
    test+0x4:   f0 ff 07           lock incl (%rdi)
    test+0x7:   5d                 popq   %rbp
    test+0x8:   c3                 ret
For whatever reason, Clang feels the need to push/pop the frame pointer. I did a little bit of searching, and I couldn’t find a way to disable this behavior.
The story for 32-bit output is very similar (just drop the -m64 from the compiler invocation). GCC produced the superior output:
test()
    test:       8b 44 24 04        movl   0x4(%esp),%eax
    test+0x4:   f0 ff 00           lock incl (%eax)
    test+0x7:   c3                 ret
Clang, meanwhile, still wanted to muck around with the frame pointer:
test()
    test:       55                 pushl  %ebp
    test+0x1:   89 e5              movl   %esp,%ebp
    test+0x3:   8b 45 08           movl   0x8(%ebp),%eax
    test+0x6:   f0 ff 00           lock incl (%eax)
    test+0x9:   5d                 popl   %ebp
    test+0xa:   c3                 ret
For the curious ones, I’m using GCC 4.8.3 and Clang 3.4.2.
I realize this is a bit of a special case (how often do you make a function that simply calls an inline function?), but it makes me worried about what sort of sub-optimal code Clang produces in other cases.