DTrace: The utmp_update Debugger
For the past 2 years, I am a happy user of rxvt-unicode, aka urxvt. Recently, I noticed that my logs contained rather mysterious error messages:
Nov 3 22:46:03 meili utmp_update[1613]: [ID 845426 user.error] Wrong number of arguments or invalid user
Sometimes, there were a dozen of these. Of course I filed a bug with the Illumos folks. Rich Lowe suggested using DTrace to figure out what is actually going on. It was time to look at the exit codes for utmp_update.
syscall::rexit:entry /execname=="utmp_update"/ { printf("utmp_update exited with code %d", arg0); @[arg0] = count(); } tick-60sec { printa(@); }
Since utmp is involved, it had something to do with terminals, so I tried to open some terminals and close them. That did it!
# dtrace -s catch-errors.d dtrace: script 'catch-errors.d' matched 2 probes CPU ID FUNCTION:NAME 0 49 rexit:entry utmp_update exited with code 0 1 49 rexit:entry utmp_update exited with code 0 0 49 rexit:entry utmp_update exited with code 7 0 49 rexit:entry utmp_update exited with code 7 1 49 rexit:entry utmp_update exited with code 0 5 49 rexit:entry utmp_update exited with code 0 1 49 rexit:entry utmp_update exited with code 0 2 49 rexit:entry utmp_update exited with code 0 3 49 rexit:entry utmp_update exited with code 7 1 67549 :tick-60sec 7 3 0 6
It turns out that every time I closed a terminal, utmp_update exited with error 7. A quick glance at usr/src/cmd/utmp_update/utmp_update.c reveals:
/* * Return codes */ #define NORMAL_EXIT 0 #define BAD_ARGS 1 #define PUTUTXLINE_FAILURE 2 #define FORK_FAILURE 3 #define SETSID_FAILURE 4 #define ALREADY_DEAD 5 #define ENTRY_NOTFOUND 6 #define ILLEGAL_ARGUMENT 7 #define DEVICE_ERROR 8
Aha! It really is an invalid argument. At this point, Rich pointed me to setutxline in libc.so. Sadly, for whatever reason, the probe pid*:libc.so.1:pututxline:entry didn’t work (it didn’t match anything). Rich suggested the following DTrace script:
proc:::exec /strstr(args[0], "utmp") != NULL/ { trace(execname); }
Pretty straightforward — the output told me that it was urxvt causing all this trouble.
Now, I knew to watch out for pututxline in urxvt. I tried to set a probe pid$target::pututxline:entry and use the new print function in DTrace, but due to a user error (read: sometimes I write stupid code) it didn’t work. Rich helped me navigate through mdb to get a print-out of the utx structure. At this point, it was a bit too late in the night and so I went to bed.
The next morning, I tried the print function again and this time I used it right and it printed out the structure:
struct utmpx { char [32] ut_user = [ "" ] char [4] ut_id = [ 'v', 't', '0', '2' ] char [32] ut_line = [ "pts/2" ] pid_t ut_pid = 0x193ec short ut_type = 0x8 struct exit_status ut_exit = { short e_termination = 0 short e_exit = 0 } struct timeval ut_tv = { time_t tv_sec = 0x4eb55d2a suseconds_t tv_usec = 0x18adc } int ut_session = 0 int [5] pad = [ 0, 0, 0, 0, 0 ] short ut_syslen = 0 char [257] ut_host = [ "" ] }
Everything looks right, except that the ut_user field is blank. I wonder if this could be the cause of it. Time to look at the urxvt code! (The ustack() action in a DTrace probe for pututxline:entry will tell you where to look.) Here’s a snippet from rxvt-unicode-9.12/libptytty/src/logging.C:
/* * remove utmp and wtmp entries */ void ptytty_unix::logout () { ... #ifdef HAVE_STRUCT_UTMPX setutxent (); tmputx = getutxid (utx); if (tmputx && tmputx->ut_pid == cmd_pid) pututxline (utx); endutxent (); #endif ... }
Ok, so it gets a utx struct and then it puts a different one. Let’s see how different those two are:
# cat getutxid.d #include <utmpx.h> pid$target::getutxid:return { ustack(); print(*(struct utmpx*)copyin(arg1, sizeof(struct utmpx))); } # dtrace -Cs getutxid.d -p 103403 dtrace: script 'getutxid.d' matched 1 probes dtrace: pid 103403 has exited CPU ID FUNCTION:NAME 4 67555 getutxid:return libc.so.1`getutxid+0xf1 urxvt`_ZN11ptytty_unixD0Ev+0x16 urxvt`_ZN9rxvt_termD1Ev+0x59b urxvt`_ZN9rxvt_term10destroy_cbERN2ev4idleEi+0x70 urxvt`_ZN2ev4baseI7ev_idleNS_4idleEE12method_thunkI9rxvt_termXadL_ZNS5_10destroy_cbERS2_iEEEEvPS1_i+0x27 urxvt`ev_invoke_pending+0x35 urxvt`ev_run+0x520 urxvt`main+0x29b urxvt`_start+0x83 struct utmpx { char [32] ut_user = [ "jeffpc" ] char [4] ut_id = [ 'v', 't', '0', '2' ] char [32] ut_line = [ "pts/2" ] pid_t ut_pid = 0x193ec short ut_type = 0x7 struct exit_status ut_exit = { short e_termination = 0 short e_exit = 0x2 } struct timeval ut_tv = { time_t tv_sec = 0x4eb55d1f suseconds_t tv_usec = 0x18adc } int ut_session = 0 int [5] pad = [ 0, 0, 0, 0x303a0005, 0x302e ] short ut_syslen = 0 char [257] ut_host = [ "" ] }
Ok, it’s more or less the same. It does, however, have a username filled in. I wonder what would happen if I filled in the username. (The urxvt code seems to fill it in only on login updates and it leaves the field empty on logout updates.) Now, I had 3 choices…
- Change the urxvt code to fill in the username on logout updates.
- Set a breakpoint in gdb or mdb and then tweak the structure before it is passed to utmp_update.
- Use DTraces “destructive” option to allow me to modify the process’s memory.
I chose #3.
Here’s the script in all its glory:
#pragma D option destructive #include <utmpx.h> pid$target::getutxid:return { ustack(); print(*(struct utmpx*)copyin(arg1, sizeof(struct utmpx))); } pid$target::pututxline:entry { ustack(); print(*(struct utmpx*)copyin(arg0, sizeof(struct utmpx))); printf("\nFIXING...\n"); copyout("jeffpc\0", (uintptr_t)&((struct utmpx*)arg0)->ut_user[0], 7); print(*(struct utmpx*)copyin(arg0, sizeof(struct utmpx))); } pid$target::pututxline:return { printf("pututxline returned %p", arg1); }
And here’s the output:
# dtrace -Cs foo.d -p 103403 dtrace: script 'foo.d' matched 3 probes dtrace: allowing destructive actions dtrace: pid 103403 has exited CPU ID FUNCTION:NAME 4 67555 getutxid:return libc.so.1`getutxid+0xf1 urxvt`_ZN11ptytty_unixD0Ev+0x16 urxvt`_ZN9rxvt_termD1Ev+0x59b urxvt`_ZN9rxvt_term10destroy_cbERN2ev4idleEi+0x70 urxvt`_ZN2ev4baseI7ev_idleNS_4idleEE12method_thunkI9rxvt_termXadL_ZNS5_10destroy_cbERS2_iEEEEvPS1_i+0x27 urxvt`ev_invoke_pending+0x35 urxvt`ev_run+0x520 urxvt`main+0x29b urxvt`_start+0x83 struct utmpx { char [32] ut_user = [ "jeffpc" ] char [4] ut_id = [ 'v', 't', '0', '2' ] char [32] ut_line = [ "pts/2" ] pid_t ut_pid = 0x193ec short ut_type = 0x7 struct exit_status ut_exit = { short e_termination = 0 short e_exit = 0x2 } struct timeval ut_tv = { time_t tv_sec = 0x4eb55d1f suseconds_t tv_usec = 0x18adc } int ut_session = 0 int [5] pad = [ 0, 0, 0, 0x303a0005, 0x302e ] short ut_syslen = 0 char [257] ut_host = [ "" ] } 4 67556 pututxline:entry libc.so.1`pututxline urxvt`_ZN11ptytty_unix6logoutEv+0x15c urxvt`_ZN11ptytty_unixD0Ev+0x16 urxvt`_ZN9rxvt_termD1Ev+0x59b urxvt`_ZN9rxvt_term10destroy_cbERN2ev4idleEi+0x70 urxvt`_ZN2ev4baseI7ev_idleNS_4idleEE12method_thunkI9rxvt_termXadL_ZNS5_10destroy_cbERS2_iEEEEvPS1_i+0x27 urxvt`ev_invoke_pending+0x35 urxvt`ev_run+0x520 urxvt`main+0x29b urxvt`_start+0x83 struct utmpx { char [32] ut_user = [ "" ] char [4] ut_id = [ 'v', 't', '0', '2' ] char [32] ut_line = [ "pts/2" ] pid_t ut_pid = 0x193ec short ut_type = 0x8 struct exit_status ut_exit = { short e_termination = 0 short e_exit = 0 } struct timeval ut_tv = { time_t tv_sec = 0x4eb55d2a suseconds_t tv_usec = 0x18adc } int ut_session = 0 int [5] pad = [ 0, 0, 0, 0, 0 ] short ut_syslen = 0 char [257] ut_host = [ "" ] } FIXING... struct utmpx { char [32] ut_user = [ "jeffpc" ] char [4] ut_id = [ 'v', 't', '0', '2' ] char [32] ut_line = [ "pts/2" ] pid_t ut_pid = 0x193ec short ut_type = 0x8 struct exit_status ut_exit = { short e_termination = 0 short e_exit = 0 } struct timeval ut_tv = { time_t tv_sec = 0x4eb55d2a suseconds_t tv_usec = 0x18adc } int ut_session = 0 int [5] pad = [ 0, 0, 0, 0, 0 ] short ut_syslen = 0 char [257] ut_host = [ "" ] } 4 67557 pututxline:return pututxline returned fef6ecf0
We can see the getutxid return a reasonable utmpx. Then we see pututxline get a utmpx without a username set. Then there is the fixed up tmpx. Finally, we see that the pututxline returned a non-NULL pointer. (It returns NULL on error — which does indeed happen without the fix-up.)
There you have it, folks. DTrace let me debug an issue without annoying change-compile-install cycles. Now, all I have to do is fix up urxvt in OpenIndiana and possibly, if it applies to other systems, push the fix upstream.