I decided a couple of days ago to try to see if OpenIndiana would still fail to boot from iSCSI like it did about two years ago. This post exists to remind me later what I did. If you find it helpful, great.
First, I got to set up the target. There is a bunch of documentation how to use COMSTAR to export a LU, so I won’t explain. I made a 100 GB LU.
I dug up an older system to act as my test box and disconnected its SATA disk. Booting from the OI USB image was uneventful. Before starting the installer, dropped into a shell and connected to the target (using iscsiadm). Then I installed OI onto the LU. Then, I dropped back into the shell to modify Grub’s menu.lst to use the serial port for both the Grub menu as well as make the kernel direct console output there.
Since the two on-board NICs can’t boot off iSCSI, I ended up using iPXE to boot off iSCSI. First, I made a script file:
#!ipxe
dhcp
sanboot iscsi:172.16.0.1:::0:iqn.2010-08.org.illumos:02:oi-test
Then it was time to grab the source and build it. I did run into a simple problem in a test file, so I patched it trivially.
$ git clone git://git.ipxe.org/ipxe.git
$ cd ipxe
$ cat /tmp/ipxe.patch
diff --git a/src/tests/vsprintf_test.c b/src/tests/vsprintf_test.c
index 11512ec..2231574 100644
--- a/src/tests/vsprintf_test.c
+++ b/src/tests/vsprintf_test.c
@@ -66,7 +66,7 @@ static void vsprintf_test_exec ( void ) {
/* Basic format specifiers */
snprintf_ok ( 16, "%", "%%" );
snprintf_ok ( 16, "ABC", "%c%c%c", 'A', 'B', 'C' );
- snprintf_ok ( 16, "abc", "%lc%lc%lc", L'a', L'b', L'c' );
+ //snprintf_ok ( 16, "abc", "%lc%lc%lc", L'a', L'b', L'c' );
snprintf_ok ( 16, "Hello world", "%s %s", "Hello", "world" );
snprintf_ok ( 16, "Goodbye world", "%ls %s", L"Goodbye", "world" );
snprintf_ok ( 16, "0x1234abcd", "%p", ( ( void * ) 0x1234abcd ) );
$ patch -p1 < /tmp/ipxe.patch
$ make bin/ipxe.usb EMBED=/tmp/ipxe.script
$ sudo dd if=bin/ipxe.usb of=/dev/rdsk/c8t0d0p0 bs=1M
Now, I had a USB flash drive with iPXE that’d get a DHCP lease and then proceed to boot from my iSCSI target.
Did the system boot? Partially. iPXE did everything right — DHCP, storing the iSCSI information in the iBFT, reading from the LU and handing control over to Grub. Grub did the right thing too. Sadly, once within kernel, things didn’t quite work out the way they should.
iBFT
Was the iBFT getting parsed properly? After reading the code for a while and using mdb to examine the state, I found a convenient tunable (read: global int that can be set using the debugger) that will cause the iSCSI boot parameters to be dumped to the console. It is called iscsi_print_bootprop. Setting it to non-zero will produce nice output:
Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ unix krtld genunix ]
[0]> iscsi_print_bootprop/W 1
iscsi_print_bootprop: 0 = 0x1
[0]> :c
OpenIndiana Build oi_151a7 64-bit (illumos 13815:61cf2631639d)
SunOS Release 5.11 - Copyright 1983-2010 Oracle and/or its affiliates.
All rights reserved. Use is subject to license terms.
Initiator Name : iqn.2010-04.org.ipxe:00020003-0004-0005-0006-000700080009
Local IP addr : 172.16.0.179
Local gateway : 172.16.0.1
Local DHCP : 0.0.0.0
Local MAC : 00:02:b3:a8:66:0c
Target Name : iqn.2010-08.org.illumos:02:oi-test
Target IP : 172.16.0.1
Target Port : 3260
Boot LUN : 0000-0000-0000-0000
nge vs. e1000g
So, the iBFT was getting parsed properly. The only “error” message to indicate that something was wrong was the “Cannot plumb network device 19”. Searching the code reveals that this is in the rootconf function. After more tracing, it became apparent that the kernel was trying to set up the NIC but was failing to find a device with the MAC address iBFT indicated. (19 is ENODEV)
At this point, it dawned on me that the on-board NICs are mere nge devices. I popped in a PCI-X e1000g moved the cable over and rebooted. Things got a lot farther!
unable to connect
Currently, I’m looking at this output.
NOTICE: Configuring iSCSI boot session...
NOTICE: iscsi connection(5) unable to connect to target iqn.2010-08.org.illumos:02:oi-test
Loading smf(5) service descriptions: 171/171
Hostname: oi-test
Configuring devices.
Loading smf(5) service descriptions: 6/6
NOTICE: iscsi connection(12) unable to connect to target iqn.2010-08.org.illumos:02:oi-test
The odd thing is, while these appear SMF is busy loading manifests and tracing the iSCSI traffic to the target shows that the kernel is doing a bunch of reads and writes. I suspect that all the successful I/O was done over one connection and then something happens and we lose the link. This is where I am now.