Josef “Jeff” Sipek

Grub Composite Console

In the past, I’ve described how to get a serial console going on Illumos based systems. If you ever used a serial console in Grub (regardless of the OS you ended up booting), you probably know that telling Grub to output to a serial port causes the VGA console to become totally useless — it’s blank.

Well, if you are using Illumos, you are in luck. About 5 months ago, Joyent integrated a “composite console” in Grub. You can read the full description in the bug report/feature request. The short version is: all grub output can be sent to both the VGA console as well as over a serial port.

It is very easy to configure. In your menu.lst, change the terminal to composite. For example, this comes from my test box’s config file (omitting the uninteresting bits):

serial --unit=0 --speed=115200
terminal composite

Note the use of composite instead of serial. That’s all there is to it.

Rebooter

I briefly mentioned that I was debugging a boot hang. Since the hang does not happen every time I try to boot, it may take a couple of reboots to get the kernel to hang. Doing this manually is tedious. Thankfully it can be scripted. Therefore, I made a simple script and a SMF manifest that runs the script at the end of boot. If the system boots fine, my script reboots it. If the system hangs mid-boot, well my script never executes leaving the system in a hung state. Then, I can break into the kernel debugger (mdb) and investigate.

I’m sharing the two here mostly for my benefit… in case one day in the future I decide that I need my system automatically rebooted over and over again.

The script is pretty simple. Hopefully, 60 seconds is long enough to log in and disable the service if necessary. (In reality, I setup a separate boot environment that’s the default choice in Grub. I can just select my normal boot environment and get back to non-timebomb system.)

#!/bin/sh

sleep 60

reboot -p

The tricky part is of course in the manifest. Not because it is hard, but because XML is … verbose.

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='rebooter'>
	<service name='site/rebooter' type='service' version='1'>
		<dependency name='booted'
		    grouping='require_all'
		    restart_on='none'
		    type='service'>
			<service_fmri
			    value='svc:/milestone/multi-user-server:default'/>
		</dependency>

		<property_group name="startd" type="framework">
			<propval name="duration" type="astring" value="child"/>
			<propval name="ignore_error" type="astring"
				value="core,signal"/>
		</property_group>

		<instance name='system' enabled='true'>
			<exec_method
				type='method'
				name='start'
				exec='/home/jeffpc/illumos/rebooter/script.sh'
				timeout_seconds='0' />

			<exec_method
				type='method'
				name='stop'
				exec=':true'
				timeout_seconds='0' />
		</instance>

		<stability value='Unstable' />
	</service>
</service_bundle>

That’s all, carry on what you were doing. :)

iSCSI boot

I decided a couple of days ago to try to see if OpenIndiana would still fail to boot from iSCSI like it did about two years ago. This post exists to remind me later what I did. If you find it helpful, great.

First, I got to set up the target. There is a bunch of documentation how to use COMSTAR to export a LU, so I won’t explain. I made a 100 GB LU.

I dug up an older system to act as my test box and disconnected its SATA disk. Booting from the OI USB image was uneventful. Before starting the installer, dropped into a shell and connected to the target (using iscsiadm). Then I installed OI onto the LU. Then, I dropped back into the shell to modify Grub’s menu.lst to use the serial port for both the Grub menu as well as make the kernel direct console output there.

Since the two on-board NICs can’t boot off iSCSI, I ended up using iPXE to boot off iSCSI. First, I made a script file:

#!ipxe

dhcp
sanboot iscsi:172.16.0.1:::0:iqn.2010-08.org.illumos:02:oi-test

Then it was time to grab the source and build it. I did run into a simple problem in a test file, so I patched it trivially.

$ git clone git://git.ipxe.org/ipxe.git
$ cd ipxe
$ cat /tmp/ipxe.patch
diff --git a/src/tests/vsprintf_test.c b/src/tests/vsprintf_test.c
index 11512ec..2231574 100644
--- a/src/tests/vsprintf_test.c
+++ b/src/tests/vsprintf_test.c
@@ -66,7 +66,7 @@ static void vsprintf_test_exec ( void ) {
 	/* Basic format specifiers */
 	snprintf_ok ( 16, "%", "%%" );
 	snprintf_ok ( 16, "ABC", "%c%c%c", 'A', 'B', 'C' );
-	snprintf_ok ( 16, "abc", "%lc%lc%lc", L'a', L'b', L'c' );
+	//snprintf_ok ( 16, "abc", "%lc%lc%lc", L'a', L'b', L'c' );
 	snprintf_ok ( 16, "Hello world", "%s %s", "Hello", "world" );
 	snprintf_ok ( 16, "Goodbye world", "%ls %s", L"Goodbye", "world" );
 	snprintf_ok ( 16, "0x1234abcd", "%p", ( ( void * ) 0x1234abcd ) );
$ patch -p1 < /tmp/ipxe.patch
$ make bin/ipxe.usb EMBED=/tmp/ipxe.script
$ sudo dd if=bin/ipxe.usb of=/dev/rdsk/c8t0d0p0 bs=1M

Now, I had a USB flash drive with iPXE that’d get a DHCP lease and then proceed to boot from my iSCSI target.

Did the system boot? Partially. iPXE did everything right — DHCP, storing the iSCSI information in the Wikipedia article: iBFT, reading from the LU and handing control over to Grub. Grub did the right thing too. Sadly, once within kernel, things didn’t quite work out the way they should.

iBFT

Was the iBFT getting parsed properly? After reading the code for a while and using mdb to examine the state, I found a convenient tunable (read: global int that can be set using the debugger) that will cause the iSCSI boot parameters to be dumped to the console. It is called iscsi_print_bootprop. Setting it to non-zero will produce nice output:

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ unix krtld genunix ]
[0]> iscsi_print_bootprop/W 1
iscsi_print_bootprop:           0               =       0x1
[0]> :c
OpenIndiana Build oi_151a7 64-bit (illumos 13815:61cf2631639d)
SunOS Release 5.11 - Copyright 1983-2010 Oracle and/or its affiliates.
All rights reserved. Use is subject to license terms.
Initiator Name : iqn.2010-04.org.ipxe:00020003-0004-0005-0006-000700080009
Local IP addr  : 172.16.0.179
Local gateway  : 172.16.0.1
Local DHCP     : 0.0.0.0
Local MAC      : 00:02:b3:a8:66:0c
Target Name    : iqn.2010-08.org.illumos:02:oi-test
Target IP      : 172.16.0.1
Target Port    : 3260
Boot LUN       : 0000-0000-0000-0000

nge vs. e1000g

So, the iBFT was getting parsed properly. The only “error” message to indicate that something was wrong was the “Cannot plumb network device 19”. Searching the code reveals that this is in the rootconf function. After more tracing, it became apparent that the kernel was trying to set up the NIC but was failing to find a device with the MAC address iBFT indicated. (19 is ENODEV)

At this point, it dawned on me that the on-board NICs are mere nge devices. I popped in a PCI-X e1000g moved the cable over and rebooted. Things got a lot farther!

unable to connect

Currently, I’m looking at this output.

NOTICE: Configuring iSCSI boot session...
NOTICE: iscsi connection(5) unable to connect to target iqn.2010-08.org.illumos:02:oi-test
Loading smf(5) service descriptions: 171/171
Hostname: oi-test
Configuring devices.
Loading smf(5) service descriptions: 6/6
NOTICE: iscsi connection(12) unable to connect to target iqn.2010-08.org.illumos:02:oi-test

The odd thing is, while these appear SMF is busy loading manifests and tracing the iSCSI traffic to the target shows that the kernel is doing a bunch of reads and writes. I suspect that all the successful I/O was done over one connection and then something happens and we lose the link. This is where I am now.

Serial Console

Over the past couple of days, I’ve been testing my changes to the crashdump core in Illumos. (Here’s why.) I do most of my development on my laptop — either directly, or I use it to ssh into a dev box. For Illumos development, I use the ssh approach. Often, I end up using my ancient desktop (pre-HyperThreading era 2GHz Pentium 4) as a test machine. It gets pretty annoying to have a physical keyboard and monitor to deal with when the system crashes. The obvious solution is to use a serial console. Sadly, all the “Solaris serial console howtos” leave a lot to be desired. As a result, I am going to document the steps here. I’m connecting from Solaris to Solaris. If you use Linux on one of the boxes, you will have to do it a little differently.

Test Box

First, let’s change the console speed from the default 9600 to a more reasonable 115200. In /etc/ttydefs change the console line to:

console:115200 hupcl opost onlcr:115200::console

Second, we need to tell the kernel to use the serial port as a console. Here, I’m going to assume that you are using the first serial port (i.e., ttya). So, open up your Grub config (/rpool/boot/grub/menu.lst assuming your root pool is rpool) and find the currently active entry.

You’ll see something like this:

title openindiana-8
findroot (pool_rpool,0,a)
bootfs rpool/ROOT/openindiana-8
splashimage /boot/splashimage.xpm
foreground FF0000
background A8A8A8
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
module$ /platform/i86pc/$ISADIR/boot_archive

We need to add two options. One to tell the kernel to use the serial port as a console, and one to tell it the serial config (rate, parity, etc.).

You’ll want to change the kernel$ line to:

kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS,console=ttya,ttya-mode="115200,8,n,1,-" -k

Note that we appended the options with commas to the existing -B. If you do not already have a -B, just add it and the two new options. The -k will make the kernel drop into the debugger when bad things happen. You can omit it if you just want a serial console without the debugger getting loaded.

There’s one last thing left to do. Let’s tell grub to use the same serial port and not use a splash image. This can be done by adding these lines to the top of your menu.lst:

serial --unit=0 --speed=115200
terminal serial

and removing (commenting out) the splashimage line.

So, what happens if you make all these changes and then beadm creates a new BE? The right thing! beadm will copy over all the kernel options so your new BE will just work.

Dev Box

I use OpenIndiana on my dev box. I could have used minicom, but I find minicom to be a huge pain unless you have a modem you want to talk to. I’m told that screen can talk to serial ports as well. I decided to keep things super-simple and configured tip.

First, one edits /etc/remote. I just changed the definition for hardwire to point to the first serial port (/dev/term/a) and use the right speed (115200):

hardwire:\
	:dv=/dev/term/a:br#115200:el=^C^S^Q^U^D:ie=%$:oe=^D:

Then, I can just run a simple command to get the other system:

$ tip hardwire

Powered by blahgd