GCC Optimizations
Recently, I’ve been given a hang bug to work on. This lead me to a another bug related to timing which pushed me to clean up a time related #define which uncovered at least two bugs. Got all that? Good. The rest of this post is going to talk about the changed define, and one of the “at least two bugs”. When I talk about GCC, I’m talking about the Illumos-specific GCC version 4.4.4. (Illumos needs a couple of features that stock GCC doesn’t provide.)
The #define change I’m hoping to make is very simple:
diff --git a/usr/src/uts/common/sys/time.h b/usr/src/uts/common/sys/time.h --- a/usr/src/uts/common/sys/time.h +++ b/usr/src/uts/common/sys/time.h @@ -234,7 +234,7 @@ struct itimerval32 { #define SEC 1 #define MILLISEC 1000 #define MICROSEC 1000000 -#define NANOSEC 1000000000 +#define NANOSEC 1000000000ll #define MSEC2NSEC(m) ((hrtime_t)(m) * (NANOSEC / MILLISEC)) #define NSEC2MSEC(n) ((n) / (NANOSEC / MILLISEC))
Without it, multiplying by NANOSEC will cause integer overflow issues on IPL32 and LP64 systems (read: basically everywhere).
One of the “at least two bugs“ involves a simple (buggy) function aptly named tv2ns as it converts a struct timeval to a 64-bit nanosecond count:
static int64_t tv2ns(struct timeval *tvp) { return (tvp->tv_sec * NANOSEC + tvp->tv_usec * 1000); }
At first glance, this function looks correct. The only flaw with it is that first portion of the expression multiplies a time_t (32-bit signed int) with an int (also 32-bit signed) making the result of that subexpression 32-bit signed expression. With NANOSEC changed to a long long, everything works as expected. Now, the fun part… disassembling this function without the fix. You don’t have to be an expert to see that this function is strangely repetitive. I’ve annotated the assembly.
tv2ns: movl 0x4(%esp),%eax ; eax = tvp tv2ns+4: movl 0x4(%eax),%edx ; edx = tvp->tv_usec tv2ns+7: leal (%edx,%edx,4),%edx ; edx = edx + 4 * edx tv2ns+0xa: leal (%edx,%edx,4),%edx ; = 5 * edx tv2ns+0xd: leal (%edx,%edx,4),%edx ; ; at this point: edx = 5 * 5 * 5 * tvp->tv_usec, ; which is the same as: 125 * tvp->tv_usec ; tv2ns+0x10: movl (%eax),%eax ; eax = tvp->tv_sec tv2ns+0x12: leal (%eax,%eax,4),%eax ; eax = eax + 4 * eax tv2ns+0x15: leal (%eax,%eax,4),%eax ; = 5 * eax tv2ns+0x18: leal (%eax,%eax,4),%eax tv2ns+0x1b: leal (%eax,%eax,4),%eax tv2ns+0x1e: leal (%eax,%eax,4),%eax tv2ns+0x21: leal (%eax,%eax,4),%eax tv2ns+0x24: leal (%eax,%eax,4),%eax tv2ns+0x27: leal (%eax,%eax,4),%eax tv2ns+0x2a: leal (%eax,%eax,4),%eax ; ; at this point, eax = 5 * 5 * 5 * 5 * 5 * 5 * 5 * 5 * 5 * tvp->tv_sec, ; which is the same as: 1953125 * tvp->tv_sec ; tv2ns+0x2d: shll $0x9,%eax ; eax <<= 9 ; ; eax = (1953125 * tvp->tv_sec) << 9, ; which suprprisingly ends up being the same as: 1000000000 * tvp->tv_sec ; ; so, now we have 'eax' with the tv_sec converted to nanoseconds and 'edx' ; with 125 * tv_usec ; tv2ns+0x30: leal (%eax,%edx,8),%eax ; eax = eax + 8 * edx ; ; 8 * 125 = 1000, which is the factor to convert tv_usec to nanoseconds! ; tv2ns+0x33: cltd ; sign-extend eax to edx:eax tv2ns+0x34: ret
I found it interesting that GCC decided to emit leal instructions to multiply by 5 and then finish it off with a shift and another leal. This is another one of those times when I realize that the compiler is smarter than me. (The sign-extension of course happens too late — all the math needs to happen as 64-bit arithmetic, but that’s not GCC’s fault.)
For the record, with the #define changed, the function looks like the following — sorry, no comments on this one:
tv2ns: pushl %edi tv2ns+1: pushl %esi tv2ns+2: pushl %ebx tv2ns+3: subl $0x8,%esp tv2ns+6: movl 0x18(%esp),%ecx tv2ns+0xa: movl 0x4(%ecx),%eax tv2ns+0xd: leal (%eax,%eax,4),%eax tv2ns+0x10: leal (%eax,%eax,4),%eax tv2ns+0x13: leal (%eax,%eax,4),%ebx tv2ns+0x16: shll $0x3,%ebx tv2ns+0x19: movl %ebx,%esi tv2ns+0x1b: sarl $0x1f,%esi tv2ns+0x1e: movl $0x3b9aca00,%edi tv2ns+0x23: movl (%ecx),%eax tv2ns+0x25: imull %edi tv2ns+0x27: movl %eax,(%esp) tv2ns+0x2a: movl %edx,0x4(%esp) tv2ns+0x2e: addl %ebx,%eax tv2ns+0x30: adcl %esi,%edx tv2ns+0x32: addl $0x8,%esp tv2ns+0x35: popl %ebx tv2ns+0x36: popl %esi tv2ns+0x37: popl %edi tv2ns+0x38: ret
Maybe one day I’ll rummage through my brain and dig up other times that GCC is outsmarted me and blahg about them. :)