x2APIC, IOMMU, Illumos
About a week ago, I hinted at a boot hang I was debugging. I’ve made some progress with it, and along the way I found some interesting things about which I’ll blog over the next few days. Today, I’m going to talk about the APIC, xAPIC, and x2APIC and how they’re handled in Illumos.
APIC, xAPIC, x2APIC
I strongly suggest you become at least a little familiar with APIC architecture before reading on. The Wikipedia articles above are a good start.
First things first, we need some definitions. APIC can refer to either the architecture or to a very old (pre-Pentium 4) implementation. Since I’m working with a Sandy Bridge, I’m going to use APIC to refer to the architecture and completely ignore that these chips existed. Everything they do is a subset of xAPIC.
xAPIC is an extension to APIC. xAPIC chips started showing up in NetBurst architecture Intel CPUs (i.e., Pentium 4). xAPIC included some goodies such as upping the limit on the number of CPUs to 256 (from 16).
x2APIC is an extension to xAPIC. x2APIC chips started appearing around the same time Sandy Bridge systems started showing up. It is a major update to how interrupts are handled, but as with many things in the PC industry the x2APIC is fully backwards compatible with xAPICs. x2APIC includes some goodies such as upping the limit on the number of CPUs to roughly 2^32 (APIC IDs grow from 8 to 32 bits).
Regardless of which exact flavor you happen to use, you will find two components: the local APIC and I/O APIC. Each processor gets its own local APIC and I/O buses get I/O APICs. I/O APICs can service more than one device, and in fact many systems have only one I/O APIC.
The xAPIC uses MMIO to program the local and I/O APICs.
x2APIC has two modes of operation. First, there is the xAPIC compatibility mode which makes the x2APIC behave just like an xAPIC. This mode doesn’t give you all the new bells and whistles. Second, there is the new x2APIC mode. In this mode, the APIC is programmed using MSRs.
One interesting fact about x2APIC is that its x2APIC mode requires an iommu (for interrupt remapping). My Sandy Bridge laptop has an Intel iommu as part of the VT-d feature.
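To make the MMIO versus MSR difference a bit more concrete, here is a minimal C sketch of how a local APIC register's xAPIC MMIO offset lines up with its x2APIC MSR number. The constants follow the register layout documented in the Intel SDM (the x2APIC MSR range starts at 0x800, with one MSR per 16-byte MMIO register). The helper name is made up for illustration and is not taken from the Illumos source.

```c
#include <assert.h>
#include <stdint.h>

/* Start of the MSR range used by the local APIC in x2APIC mode. */
#define	X2APIC_MSR_BASE	0x800u

/* A few xAPIC register offsets, relative to the local APIC MMIO base. */
#define	APIC_REG_ID	0x020u	/* local APIC ID */
#define	APIC_REG_TPR	0x080u	/* task priority */
#define	APIC_REG_EOI	0x0B0u	/* end of interrupt */

/*
 * Hypothetical helper: translate an xAPIC MMIO register offset into the
 * MSR number used for the same register in x2APIC mode.  xAPIC registers
 * are spaced 16 bytes apart, so the MSR index is the offset divided by 16.
 */
static uint32_t
xapic_offset_to_x2apic_msr(uint32_t mmio_offset)
{
	/* Registers are 16-byte aligned and live in a single 4 KiB page. */
	assert((mmio_offset & 0xFu) == 0 && mmio_offset < 0x1000u);
	return (X2APIC_MSR_BASE + (mmio_offset >> 4));
}
```

With this mapping, the ID register at offset 0x020 becomes MSR 0x802 and EOI at 0x0B0 becomes MSR 0x80B. The mapping is not perfectly uniform in real hardware: for example, the two 32-bit halves of the ICR collapse into a single 64-bit MSR in x2APIC mode, so treat this as a rule of thumb rather than a complete translation table.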
Illumos /etc/mach
Illumos has two APIC drivers. First, there is pcplusmp which knows how to handle APIC and xAPIC. Second, there is apix which targets x2APIC, but knows how to operate it in both modes. On boot, the kernel consults /etc/mach to get a list of machine-specific modules to try to load. Currently, the default contents (trimmed for display here) are:
    #
    # CAUTION!  The order of modules specified here is very important. If the
    # order is not correct it can result in unexpected system behavior. The
    # loading of modules is in the reverse order specified here (i.e. the last
    # entry is loaded first and the first entry loaded last).
    #
    pcplusmp
    apix
    xpv_psm
Since I’m not running Xen, xpv_psm will fail to load, and apix gets its chance to load.
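The reverse-order rule from the file's comment can be sketched in a few lines of C. This is only an illustration of the ordering, not how the kernel actually parses /etc/mach: the function name, the fixed-size buffers, and the simplistic line handling (no leading whitespace, no trailing comments) are all assumptions made for the example.

```c
#include <string.h>

#define	MAX_MODS	8
#define	MAX_NAME	32

/*
 * Hypothetical helper (not Illumos code): given the text of an
 * /etc/mach-style file, fill `order` with module names in the order they
 * would be tried, i.e. the reverse of the order in which they are listed.
 * Comment lines (starting with '#') and blank lines are skipped.
 * Returns the number of modules found.
 */
static int
mach_load_order(const char *text, char order[MAX_MODS][MAX_NAME])
{
	char listed[MAX_MODS][MAX_NAME];
	int n = 0;

	while (*text != '\0' && n < MAX_MODS) {
		size_t len = strcspn(text, "\n");	/* length of this line */

		if (len > 0 && len < MAX_NAME && text[0] != '#') {
			memcpy(listed[n], text, len);
			listed[n][len] = '\0';
			n++;
		}
		text += len;
		if (*text == '\n')
			text++;
	}

	/* The last listed module is tried first. */
	for (int i = 0; i < n; i++)
		strcpy(order[i], listed[n - 1 - i]);

	return (n);
}
```

Fed the default contents shown earlier, this yields xpv_psm, then apix, then pcplusmp, which matches the sequence described here: xpv_psm gets the first try, and apix is next in line once it fails.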
pcplusmp + apix Code Sharing
The code in these two modules can be summarized with a word: mess. Just following what happens when is enough of an adventure. The code for the two modules lives in four directories: usr/src/uts/i86pc/io, usr/src/uts/i86pc/io/psm, usr/src/uts/i86pc/io/pcplusmp, and usr/src/uts/i86pc/io/apix. But the sharing isn’t as straightforward as one would hope.
Directory | pcplusmp | apix |
i86pc/io | mp_platform_common.c, mp_platform_misc.c, hpet_acpi.c | mp_platform_common.c, hpet_acpi.c |
i86pc/io/psm | psm_common.c | psm_common.c |
i86pc/io/pcplusmp | * | apic_regops.c, apic_common.c, apic_timer.c |
i86pc/io/apix | — | * |
This is of course not clear at all when you look at the code. (Reality is a bit messier because of the i86xpv platform which uses some of the i86pc source.)
apix_probe
When the apix module gets loaded, its probe function (apix_probe) is called. This is the place where the module decides if the hardware is worthy. Specifically, if it finds that the CPU reports x2APIC support via cpuid, it goes on to call the common APIC probe code (apic_probe_common). Unless that fails, the system will use the apix module — even if there is no iommu and therefore the x2APIC needs to operate in xAPIC mode.
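As a rough model of that decision, the C sketch below captures the two inputs that matter: the x2APIC feature bit from CPUID (leaf 1, ECX bit 21, per the Intel SDM) and whether the common probe succeeded. The function and enum names are invented for illustration; the real apix_probe does considerably more. Note that the iommu does not appear anywhere in the inputs.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1 reports x2APIC support in bit 21 of ECX. */
#define	CPUID_1_ECX_X2APIC	(1u << 21)

enum probe_result { PROBE_SUCCESS, PROBE_FAILURE };

/* True if the given CPUID.1:ECX value advertises x2APIC support. */
static bool
cpuid_reports_x2apic(uint32_t leaf1_ecx)
{
	return ((leaf1_ecx & CPUID_1_ECX_X2APIC) != 0);
}

/*
 * Toy stand-in for apix_probe: `common_probe_ok` represents the outcome
 * of the shared probe code (apic_probe_common in the text above).
 */
static enum probe_result
toy_apix_probe(uint32_t leaf1_ecx, bool common_probe_ok)
{
	if (!cpuid_reports_x2apic(leaf1_ecx))
		return (PROBE_FAILURE);	/* the next module gets its turn */

	return (common_probe_ok ? PROBE_SUCCESS : PROBE_FAILURE);
}
```

In this sketch a CPU that advertises x2APIC and passes the common probe always ends up with the toy apix, regardless of what the iommu is doing.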
What mode are you using? Easy, just check the apic_mode global in the kernel:
    # echo apic_mode::whatis | mdb -k
    fffffffffbd0ee4c is apic_mode, in apix's data segment
    # echo apic_mode::print | mdb -k
    0x2
2 (LOCAL_APIC) indicates xAPIC mode, while 3 (LOCAL_X2APIC) indicates x2APIC mode.
Because this part is as clear as mud, I made a table that tells you what module and mode to expect given your hardware, what CPUID says, and the presence and state of the iommu.
APIC hw | CPUID | IOMMU | IOMMU state | Module | apic_mode |
xAPIC | off | — | — | pcplusmp | LOCAL_APIC |
x2APIC | off | — | — | pcplusmp | LOCAL_APIC |
x2APIC | on | absent | — | apix | LOCAL_APIC |
x2APIC | on | present | off | apix | LOCAL_APIC |
x2APIC | on | present | on | apix | LOCAL_X2APIC |
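The same table can be written as a small C function, which makes it easy to see that only two conditions actually matter. This is a sketch of the table and nothing more: the type and function names are hypothetical, the numeric values of LOCAL_APIC and LOCAL_X2APIC are the ones quoted above, and the case where the BIOS has already switched the APIC into x2APIC mode is deliberately ignored.

```c
#include <stdbool.h>

enum apic_module { MOD_PCPLUSMP, MOD_APIX };

/* Values as quoted above: 2 is xAPIC mode, 3 is x2APIC mode. */
enum apic_mode_val { LOCAL_APIC = 2, LOCAL_X2APIC = 3 };

struct apic_outcome {
	enum apic_module	module;
	enum apic_mode_val	mode;
};

/*
 * Hypothetical summary of the table: which module ends up loaded and
 * which apic_mode to expect, given what CPUID reports and the presence
 * and state of the iommu.
 */
static struct apic_outcome
expected_outcome(bool cpuid_x2apic, bool iommu_present, bool iommu_enabled)
{
	struct apic_outcome o = { MOD_PCPLUSMP, LOCAL_APIC };

	if (!cpuid_x2apic)
		return (o);		/* rows 1 and 2 */

	o.module = MOD_APIX;		/* rows 3 through 5 */
	if (iommu_present && iommu_enabled)
		o.mode = LOCAL_X2APIC;	/* row 5 only */

	return (o);
}
```

Reading it this way, the CPUID bit alone picks the module, and the iommu state alone picks the mode once apix is in use.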
Defaults
I’ve never seen apic_mode equal to LOCAL_X2APIC in the wild. This was very puzzling. Yesterday, I discovered why. As I mentioned earlier, in order for the x2APIC to operate in x2APIC mode an iommu is required. Long story short, the default config that Illumos ships disables iommus on boot. Specifically:
    $ cat /platform/i86pc/kernel/drv/rootnex.conf | grep -v '^\(#.*\|\)$'
    immu-enable="false";
In order to get LOCAL_X2APIC mode, you need to set:
    immu-enable="true";
    immu-intrmap-enable="true";
Once you put those into the config file, update your boot archive and reboot. You should be set… except the iommu support in Illumos is… shall we say… poor.
(I should point out that it is possible for the BIOS to enable x2APIC mode before handing control off to the OS. This is pretty rare unless you have a really big x86 system.)
1394
It would seem that the hci1394 driver doesn’t quite know how to deal with an iommu “messing” with its I/Os, and its interrupt service routine shuts down the driver. (On a debug build it throws in an ASSERT(0) for good measure.) I just disabled 1394 in the BIOS since I don’t have any Firewire devices handy and therefore no use for the port at the moment.
immu-enable Details
In case you want to know how iommu initialization affects the apix initialization…
During boot, immu_init gets called to initialize iommus. If the config option (immu-enable) is not true, the function just returns instead of calling immu_subsystems_setup, which calls immu_intrmap_setup, which in turn sets psm_vt_ops to a non-NULL value.
Later on, when apix is loaded and is initializing itself in apix_picinit, it calls apic_intrmap_init. This function does nothing if psm_vt_ops is NULL.
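The hand-off between the two subsystems boils down to one pointer. The toy C model below shows just that dependency. Every name prefixed with toy_ is invented for the example, the ops structure is a placeholder, and the second config option (immu-intrmap-enable) is left out for brevity, so this is not a rendering of the real Illumos functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the interrupt remapping ops vector. */
struct toy_vt_ops { int unused; };

static struct toy_vt_ops toy_intrmap_ops;
static struct toy_vt_ops *toy_psm_vt_ops = NULL;
static bool toy_intrmap_initialized = false;

/* Stands in for immu_init; `immu_enable` models the config option. */
static void
toy_immu_init(bool immu_enable)
{
	if (!immu_enable)
		return;		/* the ops pointer stays NULL */

	/* Models immu_subsystems_setup -> immu_intrmap_setup. */
	toy_psm_vt_ops = &toy_intrmap_ops;
}

/* Stands in for apic_intrmap_init, called later from the apix side. */
static void
toy_apic_intrmap_init(void)
{
	if (toy_psm_vt_ops == NULL)
		return;		/* no iommu setup happened: do nothing */

	toy_intrmap_initialized = true;
}
```

With the option left at false, the first function never publishes the ops pointer, so the second one silently returns and interrupt remapping is never set up.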
The Hang
I might as well tell you a bit about my progress on tracking down the hang. It happens only if I’m using the apix module and I allow deep C states in the idle thread (technically, it could also be an mwait related issue since I cannot disable just mwait without disabling deep C states). It does not matter if the apic_mode is LOCAL_APIC or LOCAL_X2APIC.
Assorted Documentation