Josef “Jeff” Sipek

Inline Assembly & GCC, clang

Recently, I got to write a bit of inline assembly. In the process I got to test my changes by making a small C file which defined test function that called the inline function from the header. Then, I could look at the disassembly to verify all was well.

#define _KERNEL
#define _ASM_INLINES
#include <sys/atomic.h>

void test(uint32_t *x)
{
	atomic_inc_32(x);
}

GCC has been my go to complier for a long time now. So, at first I was using it to debug my inline assembly. I compiled the test programs using:

$ gcc -Wall -O2 -m64 -c test.c

Disassembling the object file yields the rather obvious:

test()
    test:     f0 ff 07           lock incl (%rdi)
    test+0x3: c3                 ret    

I can’t think of any way to make it better :)

Then, at some point I remembered that Clang/LLVM are pretty good as well. I compiled the same file with clang:

$ clang -Wall -O2 -m64 -c test.c

The result was rather disappointing:

test()
    test:     55                 pushq  %rbp
    test+0x1: 48 89 e5           movq   %rsp,%rbp
    test+0x4: f0 ff 07           lock incl (%rdi)
    test+0x7: 5d                 popq   %rbp
    test+0x8: c3                 ret    

For whatever reason, Clang feels the need to push/pop the frame pointer. I did a little bit of searching, and I couldn’t find a way to disable this behavior.

The story for 32-bit output is very similar (just drop the -m64 from the compiler invocation). GCC produced the superior output:

test()
    test:     8b 44 24 04        movl   0x4(%esp),%eax
    test+0x4: f0 ff 00           lock incl (%eax)
    test+0x7: c3                 ret    

While Clang still wanted to muck around with the frame pointer.

test()
    test:     55                 pushl  %ebp
    test+0x1: 89 e5              movl   %esp,%ebp
    test+0x3: 8b 45 08           movl   0x8(%ebp),%eax
    test+0x6: f0 ff 00           lock incl (%eax)
    test+0x9: 5d                 popl   %ebp
    test+0xa: c3                 ret    

For the curious ones, I’m using GCC 4.8.3 and Clang 3.4.2.

I realize this is a bit of a special case (how often to you make a function that simply calls an inline function?), but it makes me worried about what sort of sub-optimal code Clang produces in other cases.

Powered by blahgd