Josef “Jeff” Sipek

Working @ Dovecot

It’s been a hectic couple of weeks, and so this post is a bit delayed. Oh well.

A couple of months ago, I decided that it was time for me to move on work-wise. As a result, four weeks ago, I joined Dovecot Oy (a part of Open-Xchange).

As you may have guessed from the name of the company, I get to spend my time making the Dovecot email server code better, more featureful, and otherwise more excellent. It is certainly a significant but fun change—going from kernel hacking on a fairly unknown operating system to hacking on the world’s most popular IMAP server. Not a day goes by where I’m not surprised just how much functionality is in the Dovecot codebase, or when I get to consult an RFC related to some IMAP extension I didn’t even know existed.

So, with this said, you should expect to see some posts related to Dovecot, Dovecot code, and email in general.

Wannalancit Mill

As you may or may not know, Nexenta has a small office in Massachusetts, and so I end up in Wikipedia article: Lowell a couple of times a week. Lowell is a decent size city with a history of textile production. As a result, the city is peppered with old mills, most of which have been converted to office and apartment buildings and a couple serve as museums.

The Wannalancit mill is one of the mills that ended up turning into an office building. (It is connected to the adjacent Suffolk mill so I often forget that technically they are separate buildings. You’ve been warned.)

The thing that makes this mill more interesting is that the National Park Service maintains an operational water turbine in the basement. The turbine turns a large flywheel (I am guessing it is about 5 m in diameter).

The turbine itself is in the “basement” along with other goodies. The basement is not very well lit, but the D750 performed quite well even at ISO 5000–8000.

The turbine is geared to the flywheel.

Finally, here is the turbine (inside the red-brown metal object in the background) and the governor (green machine in the foreground) controlling the amount of water entering the turbine and therefore the amount of energy getting stored in the flywheel.

The park service shows up in the mornings to “turn-on” the turbine.

A close-up of the governor:

The table on the left was used for repairing of the 2 cm thick leather belts. I got the impression that the four section cabinet housed containers of oil used to lubricate various parts around the mill.

There are more than three times as many photos in the full gallery. Enjoy!

Bugs in Time

Recently, I blahgd about GCC optimizing code interestingly. There, I mentioned a couple of bugs I’ve stumbled across. I’m going to talk more about them in this post.

ddi_get_time

It all started when I got assigned a bug at work. “The installer hangs while checking available disks.” That’s the extent of the information I was given along with a test system. It didn’t take long to figure that devfsadm -c disk was waiting on a kernel thread that didn’t seem to be making any progress:

swtch+0x141
cv_timedwait_hires+0xec
cv_reltimedwait+0x51
ibdm`ibdm_ibnex_port_settle_wait+0x5f
ib`ibnex_bus_config+0x1e8
devi_config_common+0xa5
mt_config_thread+0x58
thread_start+8

The function of interest here is ibdm_ibnex_port_settle, but before I talk about it I need to mention that the ibdm kmod stashes a ddi_get_time timestamp of when the HCA attached. Now, ibdm_ibnex_port_settle calls ibdm_get_waittime to get a delay to feed to cv_reltimedwait. The delay is (more or less) calculated as: ddi_get_time() - hca_attach_time. This works fine as long as ddi_get_time continues incrementing at a constant rate (1 sec/sec).

You may already see where this is going. The problem is that ddi_get_time returns a Unix timestamp based on the current time-of-day clock. If the TOD setting changes for whatever reason (daylight saving time adjustments, NTP, etc.), the value returned by ddi_get_time may change non-monotonically. This makes it unsuitable for calculating timeouts and wait times. Converting ibdm_get_waittime to use a monotonic clock source (like gethrtime or ddi_get_lbolt) fixes this bug. (Illumos bug 4777)

Things get a bit worse. While figuring out what ddi_get_time does, I noticed that the man page actively encouraged developers to use it for timeouts. (Illumos bug 4776)

Of course, once I knew about this potential abuse, I had to check that there weren’t similar issues elsewhere in the kernel… and so I got to file bugs for iprb (4778), vhci (4779), COMSTAR iSCSI target (4780), sd (4781), usba (4782), emlxs (4786), ipf (4787), mac (4788), amr (4789), arcmsr (4790), aac (4791), and heci (4792).

I’m fixing all except: amr, arcmsr, aac, and heci.

NANOSEC

While developing the series of fixes mentioned in the previous section, I ran into the fact that NANOSEC was defined as 1000000000. This made it an int — a 32-bit signed integer (on both ILP32 and LP64).

If NANOSEC (defined this way) is used to convert seconds to nanoseconds (by multiplying), the naive approach will fail with quantities larger than 2 seconds. For example (hrtime_t is a 64-bit signed int):

hrtime_t convert(int secs)
{
        return (secs * NANOSEC);
}

Since both secs and NANOSEC are integers, the compiler will compute the product and then sign extend the result to 64-bits. If you look around the Illumos codebase, you’ll see plenty of places that cast or use ULL or LL suffix to make the compiler do the right thing. Why not just change the definition of NANOSEC to include a LL suffix releaving the users of this tedious (and error prone!) duty? Well, now you know what Illumos bug 4809 is about. :)

So, I changed the definition and rebuilt everything. Then, using wsdiff (think: recursive diff that understands how to compare ELF files) I found two places where the before and after binaries differed for non-trivial reasons. (I define a trivial reason as “the compiler decided to use registers differently, but the result is the same”.) Each non-trivial difference implies that there was an expression that changed — it used to be busted!

The first difference was in ZFS (Illumos bug 4810). There, spa_async_tasks_pending miscalculated a timeout making the condition always true.

The second difference was in in.mpathd. 4811). This daemon has a utility function to convert a struct timeval into a hrtime_t. You can read more about it in my previous post.

Before the NANOSEC change, I would have needed casts to fix this. With the change in definition, I don’t have to change a thing! And that’s how a one liner closed three bugs at the same time:

commit b59e2127f21675e88c58a4dd924bc55eeb83c7a6
Author: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Date:   Mon Apr 28 15:53:04 2014 -0400

    4809 NANOSEC should be 'long long' to avoid integer overflow bugs
    4810 spa_async_tasks_pending suffers from an integer overflow bug
    4811 in.mpathd: tv2ns suffers from an integer overflow bug
    Reviewed by: Marcel Telka <marcel.telka@nexenta.com>
    Reviewed by: Dan McDonald <danmcd@omniti.com>
    Approved by: Robert Mustacchi <rm@joyent.com>

Greetings from Nexenta

In case you missed it, back in mid-2011 I discovered Illumos and OpenIndiana. At that point, I already missed hacking on the (Linux) kernel. Based on my blahg posts [1,2], it shouldn’t surprise you that it didn’t take long before I wanted to hack on the Illumos kernel…and so I did.

If you ever contributed to an open source project in your free time while employed full-time, you understand that there’s only so much time you can devote to the open source project and therefore there is only so much you can do.

A couple of months ago, I decided to explore the possibility of working full-time on Illumos. There are only a handful of companies that visibly participate in the Illumos ecosystem, but their use of Illumos is pretty varied (from public clouds to virtualized databases to SAN/NAS appliances). As of this past Tuesday (Monday was a holiday), I’m at Nexenta. At least for now, I’m working remotely (from Ann Arbor) with the fine folks in the Wikipedia article: Lowell office. It feels great to work on open source again.

Powered by blahgd