Debugging with mdb
Recently, Theo Schlossnagle posted two interesting articles about debugging on Illumos using mdb. They are MDB, CTF, DWARF, and other angelic things, and mdb custom dmods.
Recently, I was testing some file system changes. Obviously, I wanted to run the tests as an unprivileged user. Unfortunately, the test involved mounting and unmounting a filesystem (tmpfs to be specific). At first I was going to set up a sudo rule to allow mount and umount to run without asking for a password. Then I remembered that I should be able to give the unprivileged user the additional privileges. It turns out that there is only one privilege (sys_mount) necessary to delegate…and it is easy to do!
$ usermod -K defaultpriv=basic,sys_mount jeffpc
Then it’s a matter of logging out and back in. We can check using ppriv:
$ ppriv $$
925: bash
flags = <none>
E: basic,sys_mount
I: basic,sys_mount
P: basic,sys_mount
L: all
At this point, mounting and unmounting works without sudo or similar user switching:
$ mkdir tmp
$ mount -F tmpfs none /tmp/tmp
$ df -h /tmp/tmp
Filesystem      Size  Used Avail Use% Mounted on
swap            2.6G     0  2.6G   0% /tmp/tmp
In the past, I’ve talked about serial consoles. I have described how to set up a serial console on Solaris/OpenIndiana. I’ve talked about Grub’s composite console in Illumos-based distros. This time, I’m going to describe the one trick necessary to get tip(1) in a zone working.
In my case, I am using SmartOS to run my zones. Sadly, SmartOS doesn’t support device pass-through of this sort, so I have to tweak the zone config after I create the zone with vmadm.
Let’s assume that the serial port I want to pass through is /dev/term/a. Passing it through into a zone is as easy as:
[root@isis ~]# zonecfg -z 7cff99f6-2b01-464d-9f72-d0ef16ce48af
zonecfg:7cff99f6-2b01-464d-9f72-d0ef16ce48af> add device
zonecfg:7cff99f6-2b01-464d-9f72-d0ef16ce48af:device> set match=/dev/term/a
zonecfg:7cff99f6-2b01-464d-9f72-d0ef16ce48af:device> end
zonecfg:7cff99f6-2b01-464d-9f72-d0ef16ce48af> commit
At this point, you’ll probably want to reboot the zone (I don’t remember if it is strictly necessary). Once it is back up, you’ll want to get into the zone and point your software of choice at /dev/term/a. It doesn’t matter that you are in a zone. The same configuration rules apply — in my case, it’s the same change to /etc/remote as I described previously.
One of the items on my ever-growing TODO list (do these ever shrink?) was to see if inlining Illumos’s atomic_* functions would make any difference. (For the record, these functions atomically manipulate variables. You can read more about them in the various man pages: atomic_add, atomic_and, atomic_bits, atomic_cas, atomic_dec, atomic_inc, atomic_or, atomic_swap.) Of course, once I looked at the issue deeply enough, I ended up with five cleanup patches. The gist of it is that inlining them not only improved kernel performance by about 1% on the benchmarks, but also reduced the kernel size by a couple of kilobytes. You can read all about it in the associated bugs (5042, 5043, 5044, 5045, 5046, 5047) and the patch 0/6 email I sent to the developer list. In this blahg post, I want to talk about how exactly Illumos presents these atomic functions in a stable ABI but at the same time allows for inlines.
It should come as no surprise that the “content” of these functions really needs to be written in assembly. The functions are 100% implemented in assembly in usr/src/common/atomic. There, you will find a directory per architecture. For example, in the amd64 directory, we’ll find the code for a 64-bit atomic increment:
	ENTRY(atomic_inc_64)
	ALTENTRY(atomic_inc_ulong)
	lock
	incq	(%rdi)
	ret
	SET_SIZE(atomic_inc_ulong)
	SET_SIZE(atomic_inc_64)
The ENTRY, ALTENTRY, and SET_SIZE macros are C preprocessor macros to make writing assembly functions semi-sane. Anyway, this code is used by both the kernel as well as userspace. I am going to ignore the userspace side of the picture and talk about the kernel only.
These assembly functions get mangled by the C preprocessor, and then are fed into the assembler. The object file is then linked into the rest of the kernel. When a module binary references these functions, krtld (the linker-loader) wires up those references to this code.
Replacing these functions with inline functions (using the GNU definition) would be fine as far as all the code in Illumos is concerned. However, doing so would remove the actual functions (as well as the symbol table entries) and so the linker would not be able to wire up any references from modules. Since Illumos cares about not breaking existing external modules (both open source and closed source), this simple approach would be a no-go.
Before I go into the next and final approach, I’m going to make a small detour through C land.
First off, let’s say that we have a simple function, add, that returns the sum of the two integer arguments, and we keep it in a file called add.c:
#include "add.h"
int add(int x, int y)
{
return x + y;
}
In the associated header file, add.h, we may include a prototype like the following to let the compiler know that add exists elsewhere and what types to expect.
extern int add(int, int);
Then, we attempt to call it from a function in, say, test.c:
#include "add.h"
int test()
{
return add(5, 7);
}
Now, let’s turn these two .c files into a .so. We get the obvious result — test calls add:
test()
test: be 07 00 00 00 movl $0x7,%esi
test+0x5: bf 05 00 00 00 movl $0x5,%edi
test+0xa: e9 b1 fe ff ff jmp -0x14f <0xc90>
And the binary contains both functions:
$ /usr/bin/nm test.so | egrep '(Value|test$|add$)'
[Index]   Value   Size  Type  Bind  Other Shndx  Name
[74]    |    3520|     4|FUNC |GLOB |0    |13    |add
[65]    |    3536|    15|FUNC |GLOB |0    |13    |test
Now suppose that we modify the header file to include the following (assuming GCC’s inline definition):
extern int add(int, int);
extern inline int add(int a, int b)
{
return a + b;
}
If we compile and link the same .so the same way, that is we feed in the object file with the previously used implementation of add, we’ll get a slightly different binary. The invocation of add will use the inlined version:
test()
test: b8 0c 00 00 00 movl $0xc,%eax
test+0x5: c3 ret
But the binary will still include the symbol:
$ /usr/bin/nm test.so | egrep '(Value|test$|add$)'
[Index]   Value   Size  Type  Bind  Other Shndx  Name
[72]    |    3408|     4|FUNC |GLOB |0    |11    |add
[63]    |    3424|     6|FUNC |GLOB |0    |11    |test
Neat, eh?
How does this apply to the atomic functions? Pretty simply. As I pointed out, usr/src/common/atomic contains the pure assembly implementations — these are the functions you’ll always find in the symbol table.
The common header file that defines extern prototypes is usr/src/uts/common/sys/atomic.h.
Now, the trick. If you look carefully at the header file, you’ll spot a check on line 39. If all the conditions are true (kernel code, GCC, inline assembly is allowed, and x86), we include asm/atomic.h — which lives at usr/src/uts/intel/asm/atomic.h. This is where the extern inline versions of the atomic functions get defined.
So, kernel code simply includes <sys/atomic.h>, and if the stars align properly, any atomic function use will get inlined.
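To make the trick a bit more concrete, here is a minimal sketch of what an extern inline atomic increment could look like. It is not copied from asm/atomic.h. The name my_atomic_inc_64 is hypothetical, the gnu_inline and always_inline attributes are my way of forcing the GNU inline behavior regardless of which C dialect the compiler defaults to, and the fallback to a compiler builtin on non-x86 hardware is there only so that the sketch builds elsewhere.

```c
#include <stdint.h>

/* Prototype, as a common header would declare it (hypothetical name). */
extern void my_atomic_inc_64(volatile uint64_t *target);

/*
 * GNU-style "extern inline": this body is used for inlining only and the
 * compiler never emits an out-of-line copy from it.  In the real setup the
 * symbol itself would still come from the separate assembly file.
 */
extern inline __attribute__((gnu_inline, always_inline)) void
my_atomic_inc_64(volatile uint64_t *target)
{
#if defined(__x86_64__)
	/* same instruction as the assembly implementation: lock incq */
	__asm__ __volatile__("lock; incq %0" : "+m" (*target) : : "cc");
#else
	/* assumption: use a compiler builtin on other architectures */
	(void) __atomic_add_fetch(target, 1, __ATOMIC_SEQ_CST);
#endif
}

/* Example caller: bumps a counter n times and returns the new value. */
uint64_t
bump(uint64_t start, int n)
{
	volatile uint64_t counter = start;
	int i;

	for (i = 0; i < n; i++)
		my_atomic_inc_64(&counter);

	return (counter);
}
```

Disassembling bump should show the lock incq sitting right in the loop with no call instruction in sight.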
Phew! This ended up being longer than I expected. :)
In the past, I’ve described how to get a serial console going on Illumos based systems. If you ever used a serial console in Grub (regardless of the OS you ended up booting), you probably know that telling Grub to output to a serial port causes the VGA console to become totally useless — it’s blank.
Well, if you are using Illumos, you are in luck. About 5 months ago, Joyent integrated a “composite console” in Grub. You can read the full description in the bug report/feature request. The short version is: all grub output can be sent to both the VGA console as well as over a serial port.
It is very easy to configure. In your menu.lst, change the terminal to composite. For example, this comes from my test box’s config file (omitting the uninteresting bits):
serial --unit=0 --speed=115200
terminal composite
Note the use of composite instead of serial. That’s all there is to it.
Lately, I started poking around the Illumos memory management code. As I’ve done in the past, I decided to use this blahg as a place to document some of my discoveries.
In Illumos (and Solaris), address spaces are managed as sets of segments. Each segment has a base address, length, and a number of other properties. This is true for both process memory as well as kernel memory. Do not confuse these segments with memory segmentation that processors like x86 provide.
Each process has its own struct as:
> ::pgrep vim
S PID PPID PGID SID UID FLAGS ADDR NAME
R 10852 10777 10850 10777 101 0x4a004000 ffffff0411e1c0a0 vim
> ffffff0411e1c0a0::print proc_t p_as | ::print struct as a_segtree
a_segtree = {
a_segtree.avl_root = 0xffffff03f7c62ea8
a_segtree.avl_compar = as_segcompar
a_segtree.avl_offset = 0x20
a_segtree.avl_numnodes = 0x18
a_segtree.avl_size = 0x60
}
The kernel address space is maintained in the kas global:
> kas::print a_segtree
a_segtree = {
a_segtree.avl_root = kvseg+0x20
a_segtree.avl_compar = as_segcompar
a_segtree.avl_offset = 0x20
a_segtree.avl_numnodes = 0x9
a_segtree.avl_size = 0x60
}
(Once upon a time this set of segments was a linked list, but for a long while now it has been an AVL tree indexed by the base address.)
Regardless of which address space we’re dealing with, the same rules apply: segments represent contiguous regions within the address space. Each segment can represent a different type of memory. For example, walking the kernel address space segment tree yields nine different segments of four different types (kpm, kmem, kp, and map):
> kas::print a_segtree | ::walk avl | ::printf "%p.%016x %a\n" "struct seg" s_base s_size s_ops
fffffe0000000000.000000031e000000 segkpm_ops
ffffff0000000000.0000000017000000 segkmem_ops
ffffff0017000000.0000000080000000 segkp_ops
ffffff0097000000.00000002fca00000 segkmem_ops
ffffff03d3a00000.0000000004000000 segmap_ops
ffffff03d7a00000.000000fbe8600000 segkmem_ops
ffffffffc0000000.000000003b7fb000 segkmem_ops
fffffffffb800000.0000000000550000 segkmem_ops
ffffffffff800000.0000000000400000 segkmem_ops
Illumos comes with seven different architecture- and platform-independent segment drivers. A segment driver is a “driver” that implements a couple of functions to manage a segment of memory. That is, each segment type can handle page faults, page locking, sync operations, etc. differently.
For example, suppose that a page fault occurs because a process tried to load a value from a page that lacks a page table entry. The platform specific (assembly) fault handling code gets invoked by the processor. After doing a little bit of work, it calls into the generic (C) fault handling code, as_fault. There, the segtree AVL tree is consulted and the corresponding segment’s fault operation gets invoked.
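Here is a toy model of that lookup and dispatch. It is a sketch, not the kernel’s code: I use a sorted array with a binary search where the kernel uses an AVL tree, the operations vector has a single entry, and all of the toy_* names and the two fake drivers are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct seg;

/* Per-driver operations vector; the real one has many more entries. */
struct seg_ops {
	int (*fault)(struct seg *seg, uintptr_t addr);
};

struct seg {
	uintptr_t s_base;		/* first address covered */
	size_t s_size;			/* length in bytes */
	const struct seg_ops *s_ops;	/* driver that handles this segment */
};

/* Toy address space: segments sorted by s_base, non-overlapping. */
struct toy_as {
	struct seg *segs;
	size_t nsegs;
};

/* Two fake drivers that only report who handled the fault. */
static int toy_vn_fault(struct seg *seg, uintptr_t addr)
{
	(void) seg; (void) addr;
	return (1);
}

static int toy_dev_fault(struct seg *seg, uintptr_t addr)
{
	(void) seg; (void) addr;
	return (2);
}

static const struct seg_ops toy_vn_ops = { toy_vn_fault };
static const struct seg_ops toy_dev_ops = { toy_dev_fault };

/* Find the segment containing addr, or NULL if addr is unmapped. */
struct seg *
toy_as_segat(struct toy_as *as, uintptr_t addr)
{
	size_t lo = 0, hi = as->nsegs;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		struct seg *seg = &as->segs[mid];

		if (addr < seg->s_base)
			hi = mid;		/* look to the left */
		else if (addr - seg->s_base >= seg->s_size)
			lo = mid + 1;		/* look to the right */
		else
			return (seg);
	}

	return (NULL);
}

/* Hand the fault to the owning segment's driver; -1 if nothing is mapped. */
int
toy_as_fault(struct toy_as *as, uintptr_t addr)
{
	struct seg *seg = toy_as_segat(as, addr);

	if (seg == NULL)
		return (-1);

	return (seg->s_ops->fault(seg, addr));
}
```

The point of the indirection through s_ops is that the generic code never needs to know what kind of memory a segment represents.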
(Solaris Internals lists 12 and 11 segment drivers, respectively, in the two editions.) In Illumos, the seven common segment drivers are dev, kmem, kp, kpm, map, spt, and vn.
There are also two platform specific segment drivers:
This is a lot of different segment drivers. Are all of them used all the time? Well, sort of. The mdb output earlier shows that the (amd64) kernel uses only four different segment drivers (kpm, kmem, kp, and map). A typical userspace process is very boring — it is only made up of vn segments. There are, however, exceptions. For instance, Xorg uses vn and dev. This accounts for six of the seven drivers. The last common segment driver is spt, which provides System V shared memory. (I talked about SysV shared memory previously.) So, on a 64-bit x86 system, all seven common segment drivers are in use.
The story is a bit different on 32-bit kernels. Since a 32-bit system has much smaller address space, the kernel tries to eliminate a number of mappings. Here is the list of segments in a 32-bit kernel:
> kas::print a_segtree | ::walk avl | ::printf "%p %a\n" "struct seg" s_base s_ops
b5802000 segmap_ops
b6800000 segkmem_ops
ef400000 segkmem_ops
fe800000 segkmem_ops
ff000000 segkmem_ops
As you can see, the kp and kpm segments went away. While at first this is surprising, it actually makes perfect sense. When thinking about memory there are two “types” to consider: physical and virtual. In theory, one can have more virtual than physical thanks to the MMU, but in reality this is only true on 64-bit systems. Physical memory sizes outgrew 4 GB a number of years ago and therefore a 32-bit address space can trivially be 100% backed by physical memory. In other words, 32-bit address spaces are tight on virtual memory, while 64-bit address spaces are “tight” on physical memory.
Let’s consider the disappearance of the kp segment on 32-bits. What does kp let us do? It lets us oversubscribe physical memory by backing some virtual memory with disk space. On 32-bit systems we have enough physical memory to back all the virtual memory in the kernel so we don’t need to back some of it by disk. So we have no use for it. (Yes, the kernel still could have paged parts of itself out, but kernel text and data is generally considered important enough to keep it in non-pageable memory. The memory utilization will more than pay for itself by the performance improvement of not having the kernel paged out.)
As I stated before, kpm segments map physical memory into the kernel’s address space for performance reasons (without it the kernel would have to temporarily map a page to access the contents). Therefore, they are good candidates for removal when it comes to slimming down the kernel’s address space demands. (Well, the actual story is the other way… the introduction of 64-bit capable hardware allowed kpm segments to exist to improve kernel performance.)
While investigating whether some memory management code was still in use (I’ll blahg about this in the future), I ended up learning quite a bit about shared memory on Unix systems. Since I managed to run into a couple of non-obvious snags while trying to get a simple test program running, I thought I’d share my findings here for my future self.
All in all, there are three ways to share memory between processes on a modern Unix system.
This is the oldest of the three. First you call shmget to set up a shared memory segment and then you call shmat to map it into your address space. Here’s a quick example that does not do any error checking or cleanup:
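#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

void sysv_shm()
{
	int ret;
	void *ptr;

	/* create (or look up) the segment identified by key 0x1234 */
	ret = shmget(0x1234, 4096, IPC_CREAT);
	printf("shmget returned %d (%d: %s)\n", ret, errno,
	    strerror(errno));
	/* map it into our address space */
	ptr = shmat(ret, NULL, SHM_PAGEABLE | SHM_RND);
	printf("shmat returned %p (%d: %s)\n", ptr, errno, strerror(errno));
}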
What’s so tricky about this? Well, by default Illumos’s shmat will return EPERM unless you are root. This sort of makes sense given how this flavor of shared memory is implemented. (Hint: it’s all in the kernel)
As is frequently the case, POSIX came up with a different interface and different semantics for shared memory. Here’s the POSIX shm version of the above function:
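#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

void posix_shm()
{
	int fd;
	void *ptr;

	/* create (or open) the named shared memory object */
	fd = shm_open("/blah", O_RDWR | O_CREAT, 0666);
	printf("shm_open returned %d (%d: %s)\n", fd, errno,
	    strerror(errno));
	ftruncate(fd, 4096); /* IMPORTANT! */
	ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	printf("mmap returned %p (%d: %s)\n", ptr, errno, strerror(errno));
}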
The very important part here is the ftruncate call. Without it, shm_open may create an empty file and mmaping an empty file won’t work very well. (Well, on Illumos mmap succeeds, but you effectively have a 0-length mapping so any loads or stores will result in a SIGBUS. I haven’t tried other OSes.)
Aside from the funny looking path (it must start with a slash, but cannot contain any other slashes), shm_open looks remarkably like the open system call. It turns out that at least on Illumos, shm_open is implemented entirely in libc. The implementation creates a file in /tmp based on the path provided and the file descriptor that it returns is actually a file descriptor for this file in /tmp. For example, “/blah” input translates into “/tmp/.SHMDblah”. (There is a second file “/tmp/.SHMLblah” that doesn’t live very long. I think it is a lock file.) The subsequent mmap call doesn’t have any idea that this file is special in any way.
Does this mean that you can reach around shm_open and manipulate the object directly? Not exactly. POSIX states: “It is unspecified whether the name appears in the file system and is visible to other functions that take pathnames as arguments.”
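The name translation described above can be sketched in a few lines. This is my illustration of the observed behavior, not libc’s actual code: the helper name shm_name_to_path is hypothetical, and all it encodes is the two naming rules (leading slash, no other slashes) and the /tmp/.SHMD prefix.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: translate a POSIX shm name such as "/blah" into the
 * backing file path described above ("/tmp/.SHMDblah").  Returns 0 on
 * success, -1 if the name is malformed or the buffer is too small.
 */
int
shm_name_to_path(const char *name, char *buf, size_t buflen)
{
	int len;

	/* the name must start with a slash... */
	if (name == NULL || name[0] != '/')
		return (-1);

	/* ...and must not contain any other slashes */
	if (strchr(name + 1, '/') != NULL)
		return (-1);

	len = snprintf(buf, buflen, "/tmp/.SHMD%s", name + 1);
	if (len < 0 || (size_t)len >= buflen)
		return (-1);	/* output error or truncated path */

	return (0);
}
```

Given the POSIX wording, portable code should never depend on this mapping; it is only useful for understanding what shows up in /tmp.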
The big difference between POSIX and SysV shared memory is how you refer to the segment — SysV uses a numeric key, while POSIX uses a path.
The last way of sharing memory involves no specialized APIs. It’s just plain ol’ mmap on an open file. For completeness, here’s the function:
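#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

void mmap_shm()
{
	int fd;
	void *ptr;

	/* an ordinary file; any process that can open it can share it */
	fd = open("/tmp/blah", O_RDWR | O_CREAT, 0666);
	printf("open returned %d (%d: %s)\n", fd, errno, strerror(errno));
	ftruncate(fd, 4096); /* IMPORTANT! */
	ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	printf("mmap returned %p (%d: %s)\n", ptr, errno, strerror(errno));
}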
It is very similar to the POSIX shm code example. As before, we need the ftruncate to make the shared file non-empty.
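None of the functions above actually demonstrate any sharing, so here is a small sketch that does, with error checking this time. It maps a temporary file, forks, lets the child store a value through the mapping, and has the parent read it back. The function name share_through_file and the temporary file name template are mine; the calls themselves are plain POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Map a temporary file MAP_SHARED, fork, let the child store `value`
 * through the mapping, and return what the parent then reads from it.
 * Returns -1 on any failure.
 */
int
share_through_file(int value)
{
	char path[] = "/tmp/shm-demo-XXXXXX";	/* hypothetical name */
	volatile int *ptr;
	int fd, ret = -1;
	pid_t pid;

	fd = mkstemp(path);
	if (fd < 0)
		return (-1);
	unlink(path);			/* the open fd keeps the file alive */

	if (ftruncate(fd, 4096) < 0)	/* make the file non-empty */
		goto out;

	ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED)
		goto out;

	pid = fork();
	if (pid == 0) {
		*ptr = value;		/* the child writes... */
		_exit(0);
	}

	if (pid > 0 && waitpid(pid, NULL, 0) == pid)
		ret = *ptr;		/* ...and the parent sees the store */

	munmap((void *)ptr, 4096);
out:
	close(fd);
	return (ret);
}
```

With MAP_PRIVATE instead of MAP_SHARED, the child’s store would land in its own copy of the page and the parent would read back zero.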
In case you’ve wondered what SysV or POSIX shm segments look like on Illumos, here’s the pmap output for a process that basically runs the first two examples above.
6343: ./a.out
0000000000400000 8K r-x-- /storage/home/jeffpc/src/shm/a.out
0000000000411000 4K rw--- /storage/home/jeffpc/src/shm/a.out
0000000000412000 16K rw--- [ heap ]
FFFFFD7FFF160000 4K rwxs- [ dism shmid=0x13 ]
FFFFFD7FFF170000 4K rw-s- /tmp/.SHMDblah
FFFFFD7FFF180000 24K rwx-- [ anon ]
FFFFFD7FFF190000 4K rwx-- [ anon ]
FFFFFD7FFF1A0000 1596K r-x-- /lib/amd64/libc.so.1
FFFFFD7FFF33F000 52K rw--- /lib/amd64/libc.so.1
FFFFFD7FFF34C000 8K rw--- /lib/amd64/libc.so.1
FFFFFD7FFF350000 4K rwx-- [ anon ]
FFFFFD7FFF360000 4K rwx-- [ anon ]
FFFFFD7FFF370000 4K rw--- [ anon ]
FFFFFD7FFF380000 4K rw--- [ anon ]
FFFFFD7FFF390000 4K rwx-- [ anon ]
FFFFFD7FFF393000 348K r-x-- /lib/amd64/ld.so.1
FFFFFD7FFF3FA000 12K rwx-- /lib/amd64/ld.so.1
FFFFFD7FFF3FD000 8K rwx-- /lib/amd64/ld.so.1
FFFFFD7FFFDFD000 12K rw--- [ stack ]
total 2120K
You can see that the POSIX shm file got mapped in the standard way (address FFFFFD7FFF170000). The SysV shm segment is special — it is not a plain old memory map (address FFFFFD7FFF160000).
That’s it for today. I’m going to talk about segment types in a different post in the near future.
Recently, I blahgd about GCC optimizing code interestingly. There, I mentioned a couple of bugs I’ve stumbled across. I’m going to talk more about them in this post.
It all started when I got assigned a bug at work. “The installer hangs while checking available disks.” That’s the extent of the information I was given along with a test system. It didn’t take long to figure that devfsadm -c disk was waiting on a kernel thread that didn’t seem to be making any progress:
swtch+0x141
cv_timedwait_hires+0xec
cv_reltimedwait+0x51
ibdm`ibdm_ibnex_port_settle_wait+0x5f
ib`ibnex_bus_config+0x1e8
devi_config_common+0xa5
mt_config_thread+0x58
thread_start+8
The function of interest here is ibdm_ibnex_port_settle_wait, but before I talk about it I need to mention that the ibdm kmod stashes a ddi_get_time timestamp of when the HCA attached. Now, ibdm_ibnex_port_settle_wait calls ibdm_get_waittime to get a delay to feed to cv_reltimedwait. The delay is (more or less) calculated as: ddi_get_time() - hca_attach_time. This works fine as long as ddi_get_time continues incrementing at a constant rate (1 sec/sec).
You may already see where this is going. The problem is that ddi_get_time returns a Unix timestamp based on the current time-of-day clock. If the TOD setting changes for whatever reason (daylight saving time adjustments, NTP, etc.), the value returned by ddi_get_time may change non-monotonically. This makes it unsuitable for calculating timeouts and wait times. Converting ibdm_get_waittime to use a monotonic clock source (like gethrtime or ddi_get_lbolt) fixes this bug. (Illumos bug 4777)
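For illustration, here is a userspace sketch of the safe way to compute such a delay. It is not the ibdm fix itself: clock_gettime(CLOCK_MONOTONIC) stands in for gethrtime, and the function names and the notion of a fixed settle period are my assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC	1000000000LL

/* Stand-in for gethrtime(): nanoseconds from a monotonic clock. */
int64_t
mono_now_ns(void)
{
	struct timespec ts;

	(void) clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec);
}

/*
 * How much longer to wait, given when the event happened, the current
 * time, and the length of the settle period.  Both timestamps must come
 * from the same monotonic clock so that a time-of-day change cannot make
 * the result jump.  Never returns a negative delay.
 */
int64_t
remaining_wait_ns(int64_t start_ns, int64_t now_ns, int64_t settle_ns)
{
	int64_t elapsed = now_ns - start_ns;

	if (elapsed >= settle_ns)
		return (0);

	return (settle_ns - elapsed);
}
```

A caller would record mono_now_ns() at attach time and later pass it, along with a fresh mono_now_ns(), to remaining_wait_ns.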
Things get a bit worse. While figuring out what ddi_get_time does, I noticed that the man page actively encouraged developers to use it for timeouts. (Illumos bug 4776)
Of course, once I knew about this potential abuse, I had to check that there weren’t similar issues elsewhere in the kernel… and so I got to file bugs for iprb (4778), vhci (4779), COMSTAR iSCSI target (4780), sd (4781), usba (4782), emlxs (4786), ipf (4787), mac (4788), amr (4789), arcmsr (4790), aac (4791), and heci (4792).
I’m fixing all except: amr, arcmsr, aac, and heci.
While developing the series of fixes mentioned in the previous section, I ran into the fact that NANOSEC was defined as 1000000000. This made it an int — a 32-bit signed integer (on both ILP32 and LP64).
If NANOSEC (defined this way) is used to convert seconds to nanoseconds (by multiplying), the naive approach will fail with quantities larger than 2 seconds. For example (hrtime_t is a 64-bit signed int):
hrtime_t convert(int secs)
{
return (secs * NANOSEC);
}
Since both secs and NANOSEC are integers, the compiler will compute the product and then sign extend the result to 64-bits. If you look around the Illumos codebase, you’ll see plenty of places that cast or use ULL or LL suffix to make the compiler do the right thing. Why not just change the definition of NANOSEC to include a LL suffix, relieving the users of this tedious (and error prone!) duty? Well, now you know what Illumos bug 4809 is about. :)
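To see the numbers, here is a small sketch comparing the two definitions. The macro and function names are mine. Since signed overflow is undefined behavior in C, the narrow version models the 32-bit wraparound with unsigned arithmetic instead of actually overflowing an int.

```c
#include <stdint.h>

#define NANOSEC_INT	1000000000	/* old definition: plain int */
#define NANOSEC_LL	1000000000LL	/* new definition: long long */

/*
 * What the old code effectively computed: a 32-bit product that is then
 * sign extended to 64 bits.  The unsigned-to-signed conversion at the end
 * is implementation-defined; GCC and clang wrap modulo 2^32.
 */
int64_t
convert_narrow(int secs)
{
	uint32_t product = (uint32_t)secs * (uint32_t)NANOSEC_INT;

	return ((int64_t)(int32_t)product);
}

/* With a long long constant, the multiplication itself is 64 bits wide. */
int64_t
convert_wide(int secs)
{
	return (secs * NANOSEC_LL);
}
```

Two seconds come out as 2000000000 either way, but three seconds turn into -1294967296 in the narrow version (3000000000 - 2^32) and the expected 3000000000 in the wide one.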
So, I changed the definition and rebuilt everything. Then, using wsdiff (think: recursive diff that understands how to compare ELF files) I found two places where the before and after binaries differed for non-trivial reasons. (I define a trivial reason as “the compiler decided to use registers differently, but the result is the same”.) Each non-trivial difference implies that there was an expression that changed — it used to be busted!
The first difference was in ZFS (Illumos bug 4810). There, spa_async_tasks_pending miscalculated a timeout making the condition always true.
The second difference was in in.mpathd (Illumos bug 4811). This daemon has a utility function to convert a struct timeval into a hrtime_t. You can read more about it in my previous post.
Before the NANOSEC change, I would have needed casts to fix this. With the change in definition, I don’t have to change a thing! And that’s how a one liner closed three bugs at the same time:
commit b59e2127f21675e88c58a4dd924bc55eeb83c7a6
Author: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Date: Mon Apr 28 15:53:04 2014 -0400
4809 NANOSEC should be 'long long' to avoid integer overflow bugs
4810 spa_async_tasks_pending suffers from an integer overflow bug
4811 in.mpathd: tv2ns suffers from an integer overflow bug
Reviewed by: Marcel Telka <marcel.telka@nexenta.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Recently, I’ve been given a hang bug to work on. This led me to another bug related to timing, which pushed me to clean up a time-related #define, which uncovered at least two bugs. Got all that? Good. The rest of this post is going to talk about the changed define, and one of the “at least two bugs”. When I talk about GCC, I’m talking about the Illumos-specific GCC version 4.4.4. (Illumos needs a couple of features that stock GCC doesn’t provide.)
The #define change I’m hoping to make is very simple:
diff --git a/usr/src/uts/common/sys/time.h b/usr/src/uts/common/sys/time.h
--- a/usr/src/uts/common/sys/time.h
+++ b/usr/src/uts/common/sys/time.h
@@ -234,7 +234,7 @@ struct itimerval32 {
#define SEC 1
#define MILLISEC 1000
#define MICROSEC 1000000
-#define NANOSEC 1000000000
+#define NANOSEC 1000000000ll
#define MSEC2NSEC(m) ((hrtime_t)(m) * (NANOSEC / MILLISEC))
#define NSEC2MSEC(n) ((n) / (NANOSEC / MILLISEC))
Without it, multiplying by NANOSEC will cause integer overflow issues on ILP32 and LP64 systems (read: basically everywhere).
One of the “at least two bugs” involves a simple (buggy) function aptly named tv2ns as it converts a struct timeval to a 64-bit nanosecond count:
static int64_t
tv2ns(struct timeval *tvp)
{
return (tvp->tv_sec * NANOSEC + tvp->tv_usec * 1000);
}
At first glance, this function looks correct. The only flaw with it is that the first portion of the expression multiplies a time_t (32-bit signed int) with an int (also 32-bit signed), making the result of that subexpression a 32-bit signed value. With NANOSEC changed to a long long, everything works as expected. Now, the fun part… disassembling this function without the fix. You don’t have to be an expert to see that this function is strangely repetitive. I’ve annotated the assembly.
tv2ns:        movl 0x4(%esp),%eax        ; eax = tvp
tv2ns+4:      movl 0x4(%eax),%edx        ; edx = tvp->tv_usec
tv2ns+7:      leal (%edx,%edx,4),%edx    ; edx = edx + 4 * edx
tv2ns+0xa:    leal (%edx,%edx,4),%edx    ;     = 5 * edx
tv2ns+0xd:    leal (%edx,%edx,4),%edx
;
; at this point: edx = 5 * 5 * 5 * tvp->tv_usec,
; which is the same as: 125 * tvp->tv_usec
;
tv2ns+0x10:   movl (%eax),%eax           ; eax = tvp->tv_sec
tv2ns+0x12:   leal (%eax,%eax,4),%eax    ; eax = eax + 4 * eax
tv2ns+0x15:   leal (%eax,%eax,4),%eax    ;     = 5 * eax
tv2ns+0x18:   leal (%eax,%eax,4),%eax
tv2ns+0x1b:   leal (%eax,%eax,4),%eax
tv2ns+0x1e:   leal (%eax,%eax,4),%eax
tv2ns+0x21:   leal (%eax,%eax,4),%eax
tv2ns+0x24:   leal (%eax,%eax,4),%eax
tv2ns+0x27:   leal (%eax,%eax,4),%eax
tv2ns+0x2a:   leal (%eax,%eax,4),%eax
;
; at this point, eax = 5 * 5 * 5 * 5 * 5 * 5 * 5 * 5 * 5 * tvp->tv_sec,
; which is the same as: 1953125 * tvp->tv_sec
;
tv2ns+0x2d:   shll $0x9,%eax             ; eax <<= 9
;
; eax = (1953125 * tvp->tv_sec) << 9,
; which surprisingly ends up being the same as: 1000000000 * tvp->tv_sec
;
; so, now we have 'eax' with the tv_sec converted to nanoseconds and 'edx'
; with 125 * tv_usec
;
tv2ns+0x30:   leal (%eax,%edx,8),%eax    ; eax = eax + 8 * edx
;
; 8 * 125 = 1000, which is the factor to convert tv_usec to nanoseconds!
;
tv2ns+0x33:   cltd                       ; sign-extend eax to edx:eax
tv2ns+0x34:   ret
I found it interesting that GCC decided to emit leal instructions to multiply by 5 and then finish it off with a shift and another leal. This is another one of those times when I realize that the compiler is smarter than me. (The sign-extension of course happens too late — all the math needs to happen as 64-bit arithmetic, but that’s not GCC’s fault.)
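If you want to convince yourself that the arithmetic works out, here is the same sequence of steps written as C. This is my transcription of the annotated disassembly, with made-up function names. It uses uint32_t so that the 32-bit wraparound is well defined; the real code operates on signed values, but the bits in eax are the same.

```c
#include <stdint.h>

/* One "leal (%r,%r,4),%r": r = r + 4 * r, that is r *= 5 (mod 2^32). */
static uint32_t
lea5(uint32_t r)
{
	return (r + 4 * r);
}

/* Multiply by 1000000000 the way the disassembly does: 5^9, then << 9. */
uint32_t
mul_nanosec(uint32_t x)
{
	int i;

	for (i = 0; i < 9; i++)
		x = lea5(x);

	return (x << 9);
}

/* Multiply by 125 with three lea steps. */
uint32_t
mul_125(uint32_t x)
{
	return (lea5(lea5(lea5(x))));
}

/* The whole buggy 32-bit computation: sec * 10^9 + 8 * (125 * usec). */
uint32_t
tv2ns_32(uint32_t sec, uint32_t usec)
{
	return (mul_nanosec(sec) + 8 * mul_125(usec));
}
```

Since 5^9 = 1953125 and 1953125 * 2^9 = 1000000000, mul_nanosec(1) is exactly one billion. Feed it 5 seconds, though, and the result wraps to 705032704 (5000000000 - 2^32), which is the bug in a nutshell.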
For the record, with the #define changed, the function looks like the following — sorry, no comments on this one:
tv2ns:        pushl %edi
tv2ns+1:      pushl %esi
tv2ns+2:      pushl %ebx
tv2ns+3:      subl $0x8,%esp
tv2ns+6:      movl 0x18(%esp),%ecx
tv2ns+0xa:    movl 0x4(%ecx),%eax
tv2ns+0xd:    leal (%eax,%eax,4),%eax
tv2ns+0x10:   leal (%eax,%eax,4),%eax
tv2ns+0x13:   leal (%eax,%eax,4),%ebx
tv2ns+0x16:   shll $0x3,%ebx
tv2ns+0x19:   movl %ebx,%esi
tv2ns+0x1b:   sarl $0x1f,%esi
tv2ns+0x1e:   movl $0x3b9aca00,%edi
tv2ns+0x23:   movl (%ecx),%eax
tv2ns+0x25:   imull %edi
tv2ns+0x27:   movl %eax,(%esp)
tv2ns+0x2a:   movl %edx,0x4(%esp)
tv2ns+0x2e:   addl %ebx,%eax
tv2ns+0x30:   adcl %esi,%edx
tv2ns+0x32:   addl $0x8,%esp
tv2ns+0x35:   popl %ebx
tv2ns+0x36:   popl %esi
tv2ns+0x37:   popl %edi
tv2ns+0x38:   ret
Maybe one day I’ll rummage through my brain and dig up other times that GCC has outsmarted me and blahg about them. :)
In case you missed it, back in mid-2011 I discovered Illumos and OpenIndiana. At that point, I already missed hacking on the (Linux) kernel. Based on my blahg posts [1,2], it shouldn’t surprise you that it didn’t take long before I wanted to hack on the Illumos kernel…and so I did.
If you ever contributed to an open source project in your free time while employed full-time, you understand that there’s only so much time you can devote to the open source project and therefore there is only so much you can do.
A couple of months ago, I decided to explore the possibility of working full-time on Illumos. There are only a handful of companies that visibly participate in the Illumos ecosystem, but their use of Illumos is pretty varied (from public clouds to virtualized databases to SAN/NAS appliances). As of this past Tuesday (Monday was a holiday), I’m at Nexenta. At least for now, I’m working remotely (from Ann Arbor) with the fine folks in the Lowell office. It feels great to work on open source again.