Josef “Jeff” Sipek

Dumping Memory in MDB

It doesn’t take much reading of documentation, other people’s blogs, and other random web search results to learn how to dump a piece of memory in mdb.

In the following examples, I’ll use the address fffffffffbc30a70. This just so happens to be an avl_tree_t on my system. We can use the ::dump command:

> fffffffffbc30a70::dump
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
fffffffffbc30a70:  801dc3fb ffffffff 0087b1fb ffffffff  ................

Or we can use the adb-style /B command:

> fffffffffbc30a70/B
kas+0x50:       80      

We can even specify the amount of data we want to dump. ::dump takes how many bytes to dump, while /B takes how many 1-byte integers to dump (while for example, /X takes how many 4-byte integers to dump):

> fffffffffbc30a70,20::dump
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
fffffffffbc30a70:  801dc3fb ffffffff 0087b1fb ffffffff  ................
fffffffffbc30a80:  20000000 00000000 09000000 00000000   ...............
> fffffffffbc30a70,20/B
kas+0x50:       80      1d      c3      fb      ff      ff      ff      ff      0       87      b1      
                fb      ff      ff      ff      ff      20      0       0       0       0       0       
                0       0       9       0       0       0       0       0       0       0       

Things break down if we want to use a walker and pipe the output to ::dump or /B:

> fffffffffbc30a70::walk avl | ::dump
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
fffffffffbc6d2e0:  00000000 00feffff 0000001e 03000000  ................
> fffffffffbc30a70::walk avl | /B
kpmseg:
kpmseg:         0       0       0       0       0       0       0       0       0       

Even though there are 9 entries in the AVL tree, ::dump dumps only the first one. /B does a bit better and it does print what appears to be the first byte of each. What if we want to dump more than just the first byte? Say, the first 32? ::dump is of no use already. Let’s see what we can make /B do:

> fffffffffbc30a70::walk avl | 20/B
mdb: syntax error near "20"
> fffffffffbc30a70::walk avl | ,20/B
mdb: syntax error near ","

No luck.

Solution

Ok, it’s time for the trick that makes it all work. You have to use the ::eval function. For example:

> fffffffffbc30a70::walk avl | ::eval .,20::dump
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
fffffffffbc6d2e0:  00000000 00feffff 0000001e 03000000  ................
fffffffffbc6d2f0:  00000000 00000000 200ac3fb ffffffff  ........ .......
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
fffffffffbc34960:  00000000 00ffffff 00000017 00000000  ................
fffffffffbc34970:  00000000 00000000 200ac3fb ffffffff  ........ .......
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
fffffffffbc31ce0:  00000017 00ffffff 00000080 00000000  ................
fffffffffbc31cf0:  00000000 00000000 200ac3fb ffffffff  ........ .......
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
fffffffffbc35a80:  00000097 00ffffff 0000a0fc 02000000  ................
fffffffffbc35a90:  00000000 00000000 200ac3fb ffffffff  ........ .......
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
fffffffffbc34880:  0000a0d3 03ffffff 00000004 00000000  ................
fffffffffbc34890:  00000000 00000000 200ac3fb ffffffff  ........ .......
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
fffffffffbc31d60:  0000a0d7 03ffffff 000060e8 fb000000  ..........`.....
fffffffffbc31d70:  00000000 00000000 200ac3fb ffffffff  ........ .......
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
fffffffffbc7f3a0:  000000c0 ffffffff 00b07f3b 00000000  ...........;....
fffffffffbc7f3b0:  00000000 00000000 200ac3fb ffffffff  ........ .......
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
fffffffffbc7de60:  000080fb ffffffff 00105500 00000000  ..........U.....
fffffffffbc7de70:  00000000 00000000 200ac3fb ffffffff  ........ .......
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
fffffffffbc7e000:  000080ff ffffffff 00004000 00000000  ..........@.....
fffffffffbc7e010:  00000000 00000000 200ac3fb ffffffff  ........ .......

Perfect! ::eval makes repetition with /B work as well:

> fffffffffbc30a70::walk avl | ::eval .,8/B
kpmseg:
kpmseg:         0       0       0       0       0       fe      ff      ff
kvalloc:
kvalloc:        0       0       0       0       0       ff      ff      ff
kpseg:
kpseg:          0       0       0       17      0       ff      ff      ff
kzioseg:
kzioseg:        0       0       0       97      0       ff      ff      ff
kmapseg:
kmapseg:        0       0       a0      d3      3       ff      ff      ff
kvseg:
kvseg:          0       0       a0      d7      3       ff      ff      ff
kvseg_core:
kvseg_core:     0       0       0       c0      ff      ff      ff      ff
ktextseg:
ktextseg:       0       0       80      fb      ff      ff      ff      ff
kdebugseg:
kdebugseg:      0       0       80      ff      ff      ff      ff      ff

/nap

There is one more trick I want to share in this post. Suppose you have a mostly useless core file, and you want to dump the stack. Not as hex, but rather as a symbol + offset (if possible). The magic command you want is /nap. ‘/’ for printing, ‘n’ for a newline, ‘a’ for symbol + offset (of the value at “dot”), and ‘p’ for symbol (or address) of “dot”. (Formatting differences aside, ‘p’ prints the pointer—“dot”, and ‘a’ prints the value being pointed to—*“dot”.)

For example:

> fd94e3a8,8/nap
0xfd94e3a8:     
0xfd94e3a8:     0xfd94f5a8      
0xfd94e3ac:     libzfs.so.1`namespace_reload+0x394
0xfd94e3b0:     0xfdd6ce28      
0xfd94e3b4:     0xfdd6a423      
0xfd94e3b8:     0xcc            
0xfd94e3bc:     libzfs.so.1`__func__.16928
0xfd94e3c0:     0xfdd6ce00      
0xfd94e3c4:     0xfdd6ce28      

Since the memory happens to be part of the stack, there are no symbols associated with it and therefore the ‘p’ prints a raw hex value.

So, remember: if you have a core file and you think that you need to dump the stack to scavenge for hopefully useful values, you want to…nap. :)

Debugging with mdb

Recently, Theo Schlossnagle posted two interesting articles about debugging on Illumos using mdb. They are MDB, CTF, DWARF, and other angelic things, and mdb custom dmods.

Rebooter

I briefly mentioned that I was debugging a boot hang. Since the hang does not happen every time I try to boot, it may take a couple of reboots to get the kernel to hang. Doing this manually is tedious. Thankfully it can be scripted. Therefore, I made a simple script and a SMF manifest that runs the script at the end of boot. If the system boots fine, my script reboots it. If the system hangs mid-boot, well my script never executes leaving the system in a hung state. Then, I can break into the kernel debugger (mdb) and investigate.

I’m sharing the two here mostly for my benefit… in case one day in the future I decide that I need my system automatically rebooted over and over again.

The script is pretty simple. Hopefully, 60 seconds is long enough to log in and disable the service if necessary. (In reality, I setup a separate boot environment that’s the default choice in Grub. I can just select my normal boot environment and get back to non-timebomb system.)

#!/bin/sh

sleep 60

reboot -p

The tricky part is of course in the manifest. Not because it is hard, but because XML is … verbose.

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='rebooter'>
	<service name='site/rebooter' type='service' version='1'>
		<dependency name='booted'
		    grouping='require_all'
		    restart_on='none'
		    type='service'>
			<service_fmri
			    value='svc:/milestone/multi-user-server:default'/>
		</dependency>

		<property_group name="startd" type="framework">
			<propval name="duration" type="astring" value="child"/>
			<propval name="ignore_error" type="astring"
				value="core,signal"/>
		</property_group>

		<instance name='system' enabled='true'>
			<exec_method
				type='method'
				name='start'
				exec='/home/jeffpc/illumos/rebooter/script.sh'
				timeout_seconds='0' />

			<exec_method
				type='method'
				name='stop'
				exec=':true'
				timeout_seconds='0' />
		</instance>

		<stability value='Unstable' />
	</service>
</service_bundle>

That’s all, carry on what you were doing. :)

CPU Pause Threads

Recently, I ended up debugging a boot hang. (I’m still working on it, so I don’t have a resolution to it yet.) The hang seems to occur during the mp startup. That is, when the boot CPU tries to online all the other CPUs on the system. As a result, I spent a fair amount of time reading the code and poking around with mdb. Given the effort I put in, I decided to document my understanding of how CPUs get brought online during boot in Illumos. In this post, I’ll talk about the CPU pause threads.

Each CPU has a special thread — the pause thread. It is a very high priority thread that’s supposed to preempt everything on the CPU. If all CPUs are executing this high-priority thread, then we know for fact that nothing can possibly be dereferencing the CPU structures’ (cpu_t) pointers. Why is this useful? Here’s a comment from right above cpu_pause — the function pause threads execute:

/*
 * This routine is called to place the CPUs in a safe place so that
 * one of them can be taken off line or placed on line.  What we are
 * trying to do here is prevent a thread from traversing the list
 * of active CPUs while we are changing it or from getting placed on
 * the run queue of a CPU that has just gone off line.  We do this by
 * creating a thread with the highest possible prio for each CPU and
 * having it call this routine.  The advantage of this method is that
 * we can eliminate all checks for CPU_ACTIVE in the disp routines.
 * This makes disp faster at the expense of making p_online() slower
 * which is a good trade off.
 */

The pause thread is pointed to by the CPU structure’s cpu_pause_thread member. A new CPU does not have a pause thread until after it has been added to the list of existing CPUs. (cpu_pause_alloc does the actual allocation.)

CPU pausing is pretty strange. First of all, let’s call the CPU requesting other CPUs to pause the controlling CPU and all online CPUs that will pause the pausing CPUs. (The controlling CPU does not pause itself.) Second, there are two global structures: (1) a global array called safe_list which contains a 8-bit integer for each possible CPU where each element holds a value ranging from 0 to 4 (PAUSE_*) denoting the state of that CPU’s pause thread, and (2) cpu_pause_info which contains some additional goodies used for synchronization.

Pausing

To pause CPUs, the controlling CPU calls pause_cpus (which uses cpu_pause_start), where it iterates over all the pausing CPUs setting their safe_list entries to PAUSE_IDLE and queueing up (using setbackdq) their pause threads.

Now, just because the pause threads got queued doesn’t mean that they’ll get to execute immediately. That is why the controlling CPU then waits for each of the pause threads to up a semaphore in the cpu_pause_info structure. Once all the pause threads have upped the semaphore, the controlling CPU sets the cp_go flag to let the pause threads know that it’s time for them to go to sleep. Then the controlling CPU waits for each pause thread to signal (via the safe_list) that they have disabled just about all interrupts and that they are spinning (mach_cpu_pause). At this point, pause_cpus knows that all online CPUs are in a safe place.

Starting

Starting the CPUs back up is pretty easy. The controlling CPU just needs to set all the CPU’s safe_list to a PAUSE_IDLE. That will cause the pausing CPUs to break out of their spin-loop. Once out of the spin loop, interrupts are re-enabled and a CPU control relinquished (via swtch). The controlling CPU does some cleanup of its own, but that’s all that is to it.

Synchronization

Why not use a mutex or semaphore for everything? The problem lies in the fact that we are in a really fragile state. We don’t want to lose the CPU because we blocked on a semaphore. That’s why this code uses a custom synchronization primitives.

iSCSI boot

I decided a couple of days ago to try to see if OpenIndiana would still fail to boot from iSCSI like it did about two years ago. This post exists to remind me later what I did. If you find it helpful, great.

First, I got to set up the target. There is a bunch of documentation how to use COMSTAR to export a LU, so I won’t explain. I made a 100 GB LU.

I dug up an older system to act as my test box and disconnected its SATA disk. Booting from the OI USB image was uneventful. Before starting the installer, dropped into a shell and connected to the target (using iscsiadm). Then I installed OI onto the LU. Then, I dropped back into the shell to modify Grub’s menu.lst to use the serial port for both the Grub menu as well as make the kernel direct console output there.

Since the two on-board NICs can’t boot off iSCSI, I ended up using iPXE to boot off iSCSI. First, I made a script file:

#!ipxe

dhcp
sanboot iscsi:172.16.0.1:::0:iqn.2010-08.org.illumos:02:oi-test

Then it was time to grab the source and build it. I did run into a simple problem in a test file, so I patched it trivially.

$ git clone git://git.ipxe.org/ipxe.git
$ cd ipxe
$ cat /tmp/ipxe.patch
diff --git a/src/tests/vsprintf_test.c b/src/tests/vsprintf_test.c
index 11512ec..2231574 100644
--- a/src/tests/vsprintf_test.c
+++ b/src/tests/vsprintf_test.c
@@ -66,7 +66,7 @@ static void vsprintf_test_exec ( void ) {
 	/* Basic format specifiers */
 	snprintf_ok ( 16, "%", "%%" );
 	snprintf_ok ( 16, "ABC", "%c%c%c", 'A', 'B', 'C' );
-	snprintf_ok ( 16, "abc", "%lc%lc%lc", L'a', L'b', L'c' );
+	//snprintf_ok ( 16, "abc", "%lc%lc%lc", L'a', L'b', L'c' );
 	snprintf_ok ( 16, "Hello world", "%s %s", "Hello", "world" );
 	snprintf_ok ( 16, "Goodbye world", "%ls %s", L"Goodbye", "world" );
 	snprintf_ok ( 16, "0x1234abcd", "%p", ( ( void * ) 0x1234abcd ) );
$ patch -p1 < /tmp/ipxe.patch
$ make bin/ipxe.usb EMBED=/tmp/ipxe.script
$ sudo dd if=bin/ipxe.usb of=/dev/rdsk/c8t0d0p0 bs=1M

Now, I had a USB flash drive with iPXE that’d get a DHCP lease and then proceed to boot from my iSCSI target.

Did the system boot? Partially. iPXE did everything right — DHCP, storing the iSCSI information in the Wikipedia article: iBFT, reading from the LU and handing control over to Grub. Grub did the right thing too. Sadly, once within kernel, things didn’t quite work out the way they should.

iBFT

Was the iBFT getting parsed properly? After reading the code for a while and using mdb to examine the state, I found a convenient tunable (read: global int that can be set using the debugger) that will cause the iSCSI boot parameters to be dumped to the console. It is called iscsi_print_bootprop. Setting it to non-zero will produce nice output:

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ unix krtld genunix ]
[0]> iscsi_print_bootprop/W 1
iscsi_print_bootprop:           0               =       0x1
[0]> :c
OpenIndiana Build oi_151a7 64-bit (illumos 13815:61cf2631639d)
SunOS Release 5.11 - Copyright 1983-2010 Oracle and/or its affiliates.
All rights reserved. Use is subject to license terms.
Initiator Name : iqn.2010-04.org.ipxe:00020003-0004-0005-0006-000700080009
Local IP addr  : 172.16.0.179
Local gateway  : 172.16.0.1
Local DHCP     : 0.0.0.0
Local MAC      : 00:02:b3:a8:66:0c
Target Name    : iqn.2010-08.org.illumos:02:oi-test
Target IP      : 172.16.0.1
Target Port    : 3260
Boot LUN       : 0000-0000-0000-0000

nge vs. e1000g

So, the iBFT was getting parsed properly. The only “error” message to indicate that something was wrong was the “Cannot plumb network device 19”. Searching the code reveals that this is in the rootconf function. After more tracing, it became apparent that the kernel was trying to set up the NIC but was failing to find a device with the MAC address iBFT indicated. (19 is ENODEV)

At this point, it dawned on me that the on-board NICs are mere nge devices. I popped in a PCI-X e1000g moved the cable over and rebooted. Things got a lot farther!

unable to connect

Currently, I’m looking at this output.

NOTICE: Configuring iSCSI boot session...
NOTICE: iscsi connection(5) unable to connect to target iqn.2010-08.org.illumos:02:oi-test
Loading smf(5) service descriptions: 171/171
Hostname: oi-test
Configuring devices.
Loading smf(5) service descriptions: 6/6
NOTICE: iscsi connection(12) unable to connect to target iqn.2010-08.org.illumos:02:oi-test

The odd thing is, while these appear SMF is busy loading manifests and tracing the iSCSI traffic to the target shows that the kernel is doing a bunch of reads and writes. I suspect that all the successful I/O was done over one connection and then something happens and we lose the link. This is where I am now.

DTrace: The utmp_update Debugger

For the past 2 years, I am a happy user of rxvt-unicode, aka urxvt. Recently, I noticed that my logs contained rather mysterious error messages:

Nov  3 22:46:03 meili utmp_update[1613]: [ID 845426 user.error] Wrong number of
	arguments or invalid user 

Sometimes, there were a dozen of these. Of course I filed a bug with the Illumos folks. Rich Lowe suggested using DTrace to figure out what is actually going on. It was time to look at the exit codes for utmp_update.

syscall::rexit:entry
/execname=="utmp_update"/
{
	printf("utmp_update exited with code %d", arg0);
	@[arg0] = count();
}

tick-60sec
{
	printa(@);
}

Since utmp is involved, it had something to do with terminals, so I tried to open some terminals and close them. That did it!

# dtrace -s catch-errors.d 
dtrace: script 'catch-errors.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0     49                      rexit:entry utmp_update exited with code 0
  1     49                      rexit:entry utmp_update exited with code 0
  0     49                      rexit:entry utmp_update exited with code 7
  0     49                      rexit:entry utmp_update exited with code 7
  1     49                      rexit:entry utmp_update exited with code 0
  5     49                      rexit:entry utmp_update exited with code 0
  1     49                      rexit:entry utmp_update exited with code 0
  2     49                      rexit:entry utmp_update exited with code 0
  3     49                      rexit:entry utmp_update exited with code 7
  1  67549                      :tick-60sec 
                7                3
                0                6

It turns out that every time I closed a terminal, utmp_update exited with error 7. A quick glance at usr/src/cmd/utmp_update/utmp_update.c reveals:

/*
 * Return codes
 */
#define	NORMAL_EXIT		0
#define	BAD_ARGS		1
#define	PUTUTXLINE_FAILURE	2
#define	FORK_FAILURE		3
#define	SETSID_FAILURE		4
#define	ALREADY_DEAD		5
#define	ENTRY_NOTFOUND		6
#define	ILLEGAL_ARGUMENT	7
#define	DEVICE_ERROR		8

Aha! It really is an invalid argument. At this point, Rich pointed me to setutxline in libc.so. Sadly, for whatever reason, the probe pid*:libc.so.1:pututxline:entry didn’t work (it didn’t match anything). Rich suggested the following DTrace script:

proc:::exec
/strstr(args[0], "utmp") != NULL/
{
	trace(execname);
}

Pretty straightforward — the output told me that it was urxvt causing all this trouble.

Now, I knew to watch out for pututxline in urxvt. I tried to set a probe pid$target::pututxline:entry and use the new print function in DTrace, but due to a user error (read: sometimes I write stupid code) it didn’t work. Rich helped me navigate through mdb to get a print-out of the utx structure. At this point, it was a bit too late in the night and so I went to bed.

The next morning, I tried the print function again and this time I used it right and it printed out the structure:

struct utmpx {
    char [32] ut_user = [ "" ]
    char [4] ut_id = [ 'v', 't', '0', '2' ]
    char [32] ut_line = [ "pts/2" ]
    pid_t ut_pid = 0x193ec
    short ut_type = 0x8
    struct exit_status ut_exit = {
        short e_termination = 0
        short e_exit = 0
    }
    struct timeval ut_tv = {
        time_t tv_sec = 0x4eb55d2a
        suseconds_t tv_usec = 0x18adc
    }
    int ut_session = 0
    int [5] pad = [ 0, 0, 0, 0, 0 ]
    short ut_syslen = 0
    char [257] ut_host = [ "" ]
}

Everything looks right, except that the ut_user field is blank. I wonder if this could be the cause of it. Time to look at the urxvt code! (The ustack() action in a DTrace probe for pututxline:entry will tell you where to look.) Here’s a snippet from rxvt-unicode-9.12/libptytty/src/logging.C:

/*
 * remove utmp and wtmp entries
 */
void
ptytty_unix::logout ()
{
  ...
#ifdef HAVE_STRUCT_UTMPX
  setutxent ();
  tmputx = getutxid (utx);
  if (tmputx && tmputx->ut_pid == cmd_pid)
    pututxline (utx);
  endutxent ();
#endif
  ...
}

Ok, so it gets a utx struct and then it puts a different one. Let’s see how different those two are:

# cat getutxid.d
#include <utmpx.h>

pid$target::getutxid:return
{
	ustack();
	print(*(struct utmpx*)copyin(arg1, sizeof(struct utmpx)));
}
# dtrace -Cs getutxid.d -p 103403
dtrace: script 'getutxid.d' matched 1 probes
dtrace: pid 103403 has exited
CPU     ID                    FUNCTION:NAME
  4  67555                  getutxid:return 
              libc.so.1`getutxid+0xf1
              urxvt`_ZN11ptytty_unixD0Ev+0x16
              urxvt`_ZN9rxvt_termD1Ev+0x59b
              urxvt`_ZN9rxvt_term10destroy_cbERN2ev4idleEi+0x70
              urxvt`_ZN2ev4baseI7ev_idleNS_4idleEE12method_thunkI9rxvt_termXadL_ZNS5_10destroy_cbERS2_iEEEEvPS1_i+0x27
              urxvt`ev_invoke_pending+0x35
              urxvt`ev_run+0x520
              urxvt`main+0x29b
              urxvt`_start+0x83
struct utmpx {
    char [32] ut_user = [ "jeffpc" ]
    char [4] ut_id = [ 'v', 't', '0', '2' ]
    char [32] ut_line = [ "pts/2" ]
    pid_t ut_pid = 0x193ec
    short ut_type = 0x7
    struct exit_status ut_exit = {
        short e_termination = 0
        short e_exit = 0x2
    }
    struct timeval ut_tv = {
        time_t tv_sec = 0x4eb55d1f
        suseconds_t tv_usec = 0x18adc
    }
    int ut_session = 0
    int [5] pad = [ 0, 0, 0, 0x303a0005, 0x302e ]
    short ut_syslen = 0
    char [257] ut_host = [ "" ]
}

Ok, it’s more or less the same. It does, however, have a username filled in. I wonder what would happen if I filled in the username. (The urxvt code seems to fill it in only on login updates and it leaves the field empty on logout updates.) Now, I had 3 choices…

  1. Change the urxvt code to fill in the username on logout updates.
  2. Set a breakpoint in gdb or mdb and then tweak the structure before it is passed to utmp_update.
  3. Use DTraces “destructive” option to allow me to modify the process’s memory.

I chose #3.

Here’s the script in all its glory:

#pragma D option destructive

#include <utmpx.h>

pid$target::getutxid:return
{
	ustack();
	print(*(struct utmpx*)copyin(arg1, sizeof(struct utmpx)));
}

pid$target::pututxline:entry
{
	ustack();
	print(*(struct utmpx*)copyin(arg0, sizeof(struct utmpx)));
	printf("\nFIXING...\n");
	copyout("jeffpc\0", (uintptr_t)&((struct utmpx*)arg0)->ut_user[0], 7);
	print(*(struct utmpx*)copyin(arg0, sizeof(struct utmpx)));
}

pid$target::pututxline:return
{
	printf("pututxline returned %p", arg1);
}

And here’s the output:

# dtrace -Cs foo.d -p 103403
dtrace: script 'foo.d' matched 3 probes
dtrace: allowing destructive actions
dtrace: pid 103403 has exited
CPU     ID                    FUNCTION:NAME
  4  67555                  getutxid:return 
              libc.so.1`getutxid+0xf1
              urxvt`_ZN11ptytty_unixD0Ev+0x16
              urxvt`_ZN9rxvt_termD1Ev+0x59b
              urxvt`_ZN9rxvt_term10destroy_cbERN2ev4idleEi+0x70
              urxvt`_ZN2ev4baseI7ev_idleNS_4idleEE12method_thunkI9rxvt_termXadL_ZNS5_10destroy_cbERS2_iEEEEvPS1_i+0x27
              urxvt`ev_invoke_pending+0x35
              urxvt`ev_run+0x520
              urxvt`main+0x29b
              urxvt`_start+0x83
struct utmpx {
    char [32] ut_user = [ "jeffpc" ]
    char [4] ut_id = [ 'v', 't', '0', '2' ]
    char [32] ut_line = [ "pts/2" ]
    pid_t ut_pid = 0x193ec
    short ut_type = 0x7
    struct exit_status ut_exit = {
        short e_termination = 0
        short e_exit = 0x2
    }
    struct timeval ut_tv = {
        time_t tv_sec = 0x4eb55d1f
        suseconds_t tv_usec = 0x18adc
    }
    int ut_session = 0
    int [5] pad = [ 0, 0, 0, 0x303a0005, 0x302e ]
    short ut_syslen = 0
    char [257] ut_host = [ "" ]
}
  4  67556                 pututxline:entry 
              libc.so.1`pututxline
              urxvt`_ZN11ptytty_unix6logoutEv+0x15c
              urxvt`_ZN11ptytty_unixD0Ev+0x16
              urxvt`_ZN9rxvt_termD1Ev+0x59b
              urxvt`_ZN9rxvt_term10destroy_cbERN2ev4idleEi+0x70
              urxvt`_ZN2ev4baseI7ev_idleNS_4idleEE12method_thunkI9rxvt_termXadL_ZNS5_10destroy_cbERS2_iEEEEvPS1_i+0x27
              urxvt`ev_invoke_pending+0x35
              urxvt`ev_run+0x520
              urxvt`main+0x29b
              urxvt`_start+0x83
struct utmpx {
    char [32] ut_user = [ "" ]
    char [4] ut_id = [ 'v', 't', '0', '2' ]
    char [32] ut_line = [ "pts/2" ]
    pid_t ut_pid = 0x193ec
    short ut_type = 0x8
    struct exit_status ut_exit = {
        short e_termination = 0
        short e_exit = 0
    }
    struct timeval ut_tv = {
        time_t tv_sec = 0x4eb55d2a
        suseconds_t tv_usec = 0x18adc
    }
    int ut_session = 0
    int [5] pad = [ 0, 0, 0, 0, 0 ]
    short ut_syslen = 0
    char [257] ut_host = [ "" ]
}
FIXING...
struct utmpx {
    char [32] ut_user = [ "jeffpc" ]
    char [4] ut_id = [ 'v', 't', '0', '2' ]
    char [32] ut_line = [ "pts/2" ]
    pid_t ut_pid = 0x193ec
    short ut_type = 0x8
    struct exit_status ut_exit = {
        short e_termination = 0
        short e_exit = 0
    }
    struct timeval ut_tv = {
        time_t tv_sec = 0x4eb55d2a
        suseconds_t tv_usec = 0x18adc
    }
    int ut_session = 0
    int [5] pad = [ 0, 0, 0, 0, 0 ]
    short ut_syslen = 0
    char [257] ut_host = [ "" ]
}
  4  67557                pututxline:return pututxline returned fef6ecf0

We can see the getutxid return a reasonable utmpx. Then we see pututxline get a utmpx without a username set. Then there is the fixed up tmpx. Finally, we see that the pututxline returned a non-NULL pointer. (It returns NULL on error — which does indeed happen without the fix-up.)

There you have it, folks. DTrace let me debug an issue without annoying change-compile-install cycles. Now, all I have to do is fix up urxvt in OpenIndiana and possibly, if it applies to other systems, push the fix upstream.

Powered by blahgd