Josef “Jeff” Sipek

Retiring Guilt

It took me about 3 years to write this post. Partly because I had other things I wanted to work on and partly because I hoped that it wouldn’t be needed. Well, I finally decided that I really need to write this.

In short, I’m officially stopping work on guilt.

Practically speaking, I haven’t touched it (as a developer) in over two years and as a user in about as long. So really, nothing will change.

What is guilt?

I started writing Guilt in fall 2006 because I was working on unionfs and needed to maintain patches on top of the Linux kernel git repository—much like what the mq extension did with Mercurial repositories.

It all started with:

commit 664e5a7d7f8d2c2726f03a239de11fa00127cf84
Author: Josef Sipek <jsipek@thor.fsl.cs.sunysb.edu>
Date:   Mon Nov 6 13:08:30 2006 -0500

    Initial commit

That’s right, 14 years to the day.

Technically, the first few versions were called “gq” (which stood for “git quilt”) until someone pointed out that “GQ” was a well established GTK-based LDAP client.

Artifacts

If anyone wishes to resurrect this project, then by all means go for it. If not, the old content will remain online for as long as I have a web server. :)

Specifically, you can find everything up to and including the last release (v0.37-rc1) at the following locations:

Users

I know that Guilt has served a number of people quite well over the years. It’s been quite stable and mostly feature complete since at least 2008, so I haven’t really been hearing from people short of the occasional patch or an occasional “oh yeah, I use that”.

To those users: I hope the last release works well enough for you until someone starts to maintain Guilt again or you find a different tool that suits your needs.

Email vs. Tool du Jour

TL;DR: Just because email is decades old doesn’t mean that it cannot serve a vital role in modern project management, research, development, and support.

Ultimately, working on a project requires communication—and lots of it. Communication with peers, with managers, with other departments within the company, and even with customers. It is tempting to grab the Tool du Jour and add it to your ever-growing arsenal of tools believing it will make communication easier. Often, it does not.

For example, let’s consider these tools: Jira, Confluence, Slack, Zoom, GitHub/GitLab, phone, and email.

Does your company use these tools or their equivalents? Isn’t it a bit overkill to have 7 different channels of communication? Sure, often one tool is better at a particular mode of communication than the others, but there is significant overlap.

Do you want a video chat? Do you use Slack or Zoom?

Voice chat? Slack, Zoom, or phone?

Do you want to ask a question related to a bug? Do you use Jira, Slack, or just set up a call? Voice or video? Or would email be best?

Do you keep track of your project via high-level Jira issues? Or do you use a set of Confluence pages where you include various semi-autogenerated plots?

Decision fatigue is real. Do you want your (rather expensive) employees to waste their cognitive capacity deciding which tool to use? Or do you want them to make the product better?

It is painful how many times over the past ten years I’ve witnessed conversations that went much like this:

A: Can you answer the question I left in Jira?
B: <B reads Jira question> Oh, that is answered on the ABC123 Confluence page.
A: Ah. Can you make a note of that in Jira? Thanks.

This example involves three communication channels—Jira, Confluence, and some chat system.

This sort of communication fragmentation is really bad. Not only does it waste a lot of time with exchanges like this example, it also makes searching for information essentially impossible. Who in their right mind would search half a dozen tools (with various degrees of search capability) for something? It is simply easier to just ask your coworkers. After all, their time is less valuable to you than your own time and sanity.

So, what can be done to improve things?

Well, if at all possible, do not use tools that have duplicate functionality. If you have to, hopefully you can disable the duplicate functionality. If there is no way to disable it, then you must make it painfully clear where such communication should go. Hopefully this can be done via automated hooks that somehow notify the user. For example, automatically closing issues opened in the wrong bug tracker (e.g., opened in GitHub instead of Jira), or automatically responding to wiki commenters directing them to the proper medium for wiki discussion. Finally, if all else fails, have someone (ideally a manager or team lead, so the notification has some weight to it) manually make sure that anyone who uses the functionality is told not to.

This reduction in the number of tools should also help with responsiveness. It is no secret that the average human can hold only about 7 things in working memory at the same time. How many of those do you want to dedicate to tooling? If I have to remember to check 7 different tools periodically, one of two things happens: either I manage to check them all but accomplish nothing else, or I get things done but only remember about 2 or 3 tools.

That should help with quite a bit of the fragmentation. Now “all” that’s left to do is decide which communication channel is used for what.

I have concluded that there are four major levels of communication:

  1. important, synchronous
  2. important, asynchronous
  3. unimportant, synchronous
  4. unimportant, asynchronous

I’m using the term “synchronous” to mean that you want the back-and-forth latency to be low, and “important” to mean that you must have an answer. Note that “unimportant” does not mean off-topic, but rather lower priority.

Why make the synchronous/asynchronous distinction? For multiple reasons. First of all, being interrupted in the middle of something is costly. It takes a significant amount of time to get back “into the zone” but only a fraction of a minute to get out of it. Would you rather pay your employees to try to work or to actually work? And second, asynchrony makes communication across time zones easier. Not easy, but easier.

So, let me go through the four major levels of communication one by one and share my opinion about what works and why.

important, synchronous
If you want to have a (relatively) rapid back-and-forth, you pretty much have to use an in-person meeting or a voice/video call. A one-to-one (i.e., non-group) chat can also possibly work, but there will be temptation to multi-task. This desire to multi-task implies that the chat isn’t actually that important.
important, asynchronous
When you don’t require having the answers immediately or when it simply isn’t possible to get everyone in the same “room” at the same time for a meeting (in person, voice, or video), email is probably the best communication method. Each person can read it and possibly reply at the most convenient time for them.
unimportant, synchronous
This is the form of communication that includes various chit-chat, sanity checking polls (e.g., “would anyone object if I tried xyz?”), and so on. It lets you quickly get bits of information, but in a way it is unreliable. Not everyone is reading the chat when you say something and when it scrolls off the screen it is as if you never said it. In other words, do not expect anyone to read the group chat messages from when they were away. If you want someone specific or even everyone to see a particular message, it is not an unimportant message. One-to-one chat is a little different since it is more “reliable”, but usually anything substantial that is important will end up with a call instead.
unimportant, asynchronous
Finally, all the things you’d like others to see at some point in the future should be sent as an email. The recipients will read it when they get to it, and since it isn’t important it probably doesn’t even require a reply.

These four levels are, of course, not set in stone. It is possible (and I’d even encourage it) to upgrade or downgrade your communication as needed. For example, it is perfectly reasonable to ask in chat if there are objections or obvious issues with a particular approach, function, or workload. Then, if the responses don’t make it obviously a terrible idea but a more definitive discussion is desired, a similar (but more detailed) version can be sent via email. In essence, upgrading it from “unimportant synchronous” to “important asynchronous”. (Caution: don’t overdo these upgrades/downgrades.)

As you can see, I think that email is a good choice for any asynchronous communication. That’s for good reasons. Everyone has an email address, everyone knows how to use it (at least a little), and the free-form format allows you to use the most appropriate content type to get your point across—be it ASCII art, images, or even Excel spreadsheets. In other words:

Email is ubiquitous.

Email works remarkably well.

Email is extremely flexible.

As a real world example, consider that pretty much every company-wide announcement (important or not) has been made either in a huge meeting or via email. Often the meeting-time announcements are followed up by an email anyway! It’s not a chat message. It’s not a Confluence page. It’s not a Jira issue. It’s an email.

Before I conclude, I’d like to address two slightly more specific cases.

First, what about issue tracking? How does that tie into my email-centric world? Well, you can keep your issue tracker, but in my opinion, the comments feature should not be used. If a ticket needs to be discussed, send an email, set up a conference call, whatever works for you; just don’t use the comments on the issue. If you look at any issue in your issue tracker, the comments will fall into one of two categories: either there are none or there are many, and it is painfully clear that people don’t read them and ask the same questions over and over. Instead of burying progress reports or updates to the understanding of the issue in a comment that nobody will ever read, that information should be used to reword the issue description. The same largely applies to other tools’ comments sections as well.

Second is a concern that people will not read all those emails. I think this is only a problem if there are too many tools and email isn’t viewed as an important one. If (unrealistically) all communication happens through email, then right after communicating with someone, the person is already in the right tool to handle the next communication. If code reviews, support requests, and everything else were to go to the same place, there is nearly zero context switching cost. Even if the person goes to use a different tool (e.g., a text editor or an IDE), when that work is done, they’ll return to their email client. In other words, if you make the email communication channel important, your emails will get read. If you don’t make it important, then you (individually) are better off using a channel the recipient considers important. In an environment with too many tools, each recipient may have a different preference.

Just to make it painfully clear, I am not advocating killing off everything except email. Instead, I’m advocating killing off tools that duplicate functionality, and shifting all asynchronous communication to a single medium. In my experience, the most efficient (and least disruptive) asynchronous communication medium is email. And therefore it should not only be one of the tools that survives the culling, but it also should be the one that is embraced afterwards.

That’s it for today. In the next post, I’ll talk about what I consider the ideal code review workflow.

2020-05-06

OpenMCT — While I’m not a fan of web-based UIs, this is a rather neat “dashboard” framework by NASA.

Wideband spectrum received in JO32KF — Over 5 years of HF spectrum waterfall in Enschede, NL.

10 Most(ly dead) Influential Programming Languages

PACELC theorem — An extension of the CAP theorem.

Learn Rust the Dangerous Way — Finally a Rust tutorial that speaks to people comfortable in C.

Interferometry and Synthesis in Radio Astronomy — An open access book.

Aviation Formulary — Great circle math applied to various aviation problems for those too lazy to derive the formulas themselves.

Papírová platidla Československa 1918-1993, České republiky a Slovenské republiky 1993-2016 — Complete list of all bank notes used in Czechoslovakia, Czech Republic, and Slovak Republic.

NOAA GOES Image Viewer: GOES weather satellite imagery.

Time-based One-time Passwords

Recently I ended up playing with Time-based One-time Passwords (TOTP) as a second factor when authenticating with various services. When I saw an RFC referenced in the references section of the Wikipedia article on TOTP, I looked at it to get a better idea of how complicated the algorithm really is. It turns out that TOTP is very simple: so simple that I couldn’t help but put together a quick and dirty implementation in Python.

TOTP itself is documented in RFC 6238. It is a rather short RFC, but that’s because all it really says is “use HOTP and feed it these values”.
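The values TOTP feeds into HOTP boil down to a single counter: the number of whole X-second steps since T0 (RFC 6238 defaults to X = 30 seconds and T0 = 0, the Unix epoch). A minimal C sketch of that computation; totp_counter is a hypothetical name, and it assumes T ≥ T0 so that unsigned integer division floors correctly:

```c
#include <stdint.h>

/*
 * Hypothetical helper: the TOTP time-step counter from RFC 6238,
 * C = floor((T - T0) / X). Assumes t >= t0, so unsigned integer
 * division is the same as flooring the real quotient.
 */
static uint64_t totp_counter(uint64_t t, uint64_t t0, uint64_t x)
{
	return (t - t0) / x;
}
```

With the defaults, T = 59 gives counter 1 and T = 1111111109 gives 0x23523EC, matching the time-step values listed in RFC 6238’s test vectors.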

HOTP is documented in RFC 4226. This RFC is a bit longer since it has to describe how the counter value gets hashed and how the resulting digest gets mangled into a short decimal number. Reading it, one will learn that HMAC-SHA1 is the basic building block of HOTP.
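The “mangling” step is what RFC 4226 calls dynamic truncation. A minimal C sketch of just that step, leaving HMAC-SHA1 itself out; hotp_truncate and hotp_value are hypothetical names, and the digit count is assumed to be at most 9 so that the modulus fits in 32 bits:

```c
#include <stdint.h>

/*
 * Dynamic truncation (RFC 4226, section 5.3): the low 4 bits of the
 * last byte of the 20-byte HMAC-SHA1 digest select an offset, and the
 * 4 bytes starting there form a big-endian 31-bit integer (the top bit
 * is masked off to avoid signedness issues).
 */
static uint32_t hotp_truncate(const uint8_t digest[20])
{
	unsigned off = digest[19] & 0xf;	/* 0..15, so off + 3 <= 18 */

	return ((uint32_t)(digest[off] & 0x7f) << 24) |
	       ((uint32_t)digest[off + 1] << 16) |
	       ((uint32_t)digest[off + 2] << 8) |
	       (uint32_t)digest[off + 3];
}

/* Reduce the truncated value to `digits` decimal digits (digits <= 9). */
static uint32_t hotp_value(const uint8_t digest[20], unsigned digits)
{
	uint32_t mod = 1;

	while (digits--)
		mod *= 10;

	return hotp_truncate(digest) % mod;
}
```

Fed the example digest from section 5.4 of RFC 4226 (1f 86 98 69 0e 02 ca 16 61 85 50 ef 7f 19 da 8e 94 5b 55 5a), the last byte’s low nibble gives offset 0xa, the selected bytes are 50 ef 7f 19, and the 6-digit result is 872921, which is the value the RFC works out.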

HMAC is documented in RFC 2104.
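Written out, RFC 2104’s construction is short. Here H is SHA-1, B = 64 is its block size in bytes, ipad is the byte 0x36 repeated B times, opad is the byte 0x5c repeated B times, and ‖ is concatenation; this is the same sequence of steps the HMAC function in the code below performs:

```latex
K' =
\begin{cases}
H(K) \text{ zero-padded to } B \text{ bytes} & \text{if } |K| > B \\
K \text{ zero-padded to } B \text{ bytes} & \text{otherwise}
\end{cases}

\mathrm{HMAC}(K, m) = H\bigl((K' \oplus \mathrm{opad}) \,\|\, H\bigl((K' \oplus \mathrm{ipad}) \,\|\, m\bigr)\bigr)
```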

With these three documents (and a working implementation of SHA1), it is possible to implement your own TOTP.

The Key

If you follow those three RFCs, you’ll have a working TOTP. However, that’s not enough to make use of the code. The whole algorithm is predicated on having a pre-shared secret: a key. Typically, the service you are enabling TOTP for will issue you a key, and you have to feed it into the algorithm to start generating passwords. Since showing the user the key in binary is not feasible, some sort of encoding is needed.

I couldn’t find any RFC that documents best practices for sharing the key with the user. After a while, I found a Google Authenticator wiki page describing the format of the key URIs used by Google Authenticator.

It turns out that this is a very common format. It uses a base32 encoding with the padding stripped. (Base32 is documented in RFC 4648.)

The “tricky” part is recreating this padding to make the decoder happy. Since base32 works on 40-bit groups (it converts between 5 raw bytes and 8 base32 characters), we must append = characters until the encoded string’s length is a multiple of 8 characters (a whole number of 40-bit groups).
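A minimal C sketch of that padding step, assuming the input is an unpadded base32 string with no whitespace; base32_pad_len and base32_repad are hypothetical helper names, and the actual decoding is left to whatever base32 decoder you use:

```c
#include <stddef.h>
#include <string.h>

/*
 * Number of '=' characters needed to bring an unpadded base32 string
 * of length len back to a whole number of 8-character (40-bit) groups.
 */
static size_t base32_pad_len(size_t len)
{
	return (8 - len % 8) % 8;
}

/*
 * Hypothetical helper: copy src into dst and append the padding.
 * dst must have room for strlen(src) + 7 + 1 bytes.
 */
static void base32_repad(char *dst, const char *src)
{
	size_t len = strlen(src);
	size_t pad = base32_pad_len(len);

	memcpy(dst, src, len);
	memset(dst + len, '=', pad);
	dst[len + pad] = '\0';
}
```

The unpadded base32 encoding of the 7-byte string "abcdefg" is MFRGGZDFMZTQ (12 characters), so base32_repad turns it into MFRGGZDFMZTQ====. The 16-character key used in the code below needs no padding at all.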

The Code

I tried to avoid implementing HMAC-SHA1, but I couldn’t find it in any of the modules Python ships with. Since it is a simple enough algorithm, I implemented it as well. Sadly, it nearly doubles the size of the code.

Warning: This is proof-of-concept quality code. Do not use it in production.

import struct
import hashlib
import base64
import time

# The pre-shared secret (base32 encoded):
key = "VGMT4NSHA2AWVOR6"

def HMAC(k, data, B=64):
    def H(m):
        return hashlib.sha1(m).digest()

    # keys too long get hashed
    if len(k) > B:
        k = H(k)

    # keys too short get padded with zero bytes
    if len(k) < B:
        k = k + (b"\x00" * (B - len(k)))

    # xor the key with the inner (0x36) and outer (0x5c) pads
    ikey = bytes(x ^ 0x36 for x in k)
    okey = bytes(x ^ 0x5c for x in k)

    return H(okey + H(ikey + data))

def hotp(K, C, DIGITS=6):
    def Truncate(inp):
        # the low 4 bits of the last digest byte select a 4-byte window
        off = inp[19] & 0xf

        x = inp[off:(off+4)]

        # big-endian 31-bit integer (top bit masked off)
        return ((x[0] & 0x7f) << 24) | (x[1] << 16) | (x[2] << 8) | x[3]

    # the counter is hashed as an 8-byte big-endian integer
    return Truncate(HMAC(K, struct.pack(">Q", C))) % (10 ** DIGITS)

def totp(K, T=None, X=30, T0=0, DIGITS=6):
    # read the clock at call time (a time.time() default argument would
    # be evaluated only once, when the function is defined)
    if T is None:
        T = time.time()
    return hotp(K, int(T - T0) // X, DIGITS=DIGITS)

# pad to a whole number of 40-bit (8-character) groups
if len(key) % 8 != 0:
    key = key + ("=" * (8 - (len(key) % 8)))

key = base64.b32decode(key.upper())

print(time.ctime(), time.time())
print("TOTP: %06d" % totp(key))

This code is far from optimal, but I think it nicely demonstrates the simplicity of TOTP.


Rust Pointers for C Programmers

I’ve been eyeing Rust for about a year now. Here and there, I tried to use it to make a silly little program, or to implement some simple function in it to see for myself how ergonomic it really was, and what sort of machine code rustc spit out. But last weekend I found a need for a tool to clean up some preprocessor mess, and so instead of hacking together some combination of shell and Python, I decided to write it in Rust.

From my earlier attempts, I knew that there are a lot of different “pointers” but I found all the descriptions of them lacking or confusing. Specifically, Rust calls itself a systems programming language, yet I found no clear description of how the different pointers map to C—the systems programming language. Eventually, I stumbled across The Periodic Table of Rust Types, which made things a bit clearer, but I still didn’t feel like I truly understood.

During my weekend expedition to Rust land, I think I’ve grokked things enough to write this explanation of how Rust does things. As always, feedback is welcome.

I’ll describe what happens in terms of C. To keep things simple, I will:

  • assume that you are well-versed in C
  • assume that you can read Rust (any intro will teach you enough)
  • not bother with const for the C snippets
  • not talk about mutability

In the following text, I assume that we have some struct T. The actual contents don’t matter. In other words:

struct T {
	/* some members */
};

With that out of the way, let’s dive in!

*const T and *mut T

These are raw pointers. In general, you shouldn’t use them since only unsafe code can dereference them, and the whole point of Rust is to write as much safe code as possible.

Raw pointers are just like what you have in C. If you make a pointer, you end up using sizeof(struct T *) bytes for the pointer. In other words:

struct T *ptr;

&T and &mut T

These are borrowed references. They use the same amount of space as raw pointers and behave the exact same way in the generated machine code. Consider this trivial example:

#[no_mangle]
pub fn raw(p: *mut usize) {
    unsafe {
        *p = 5;
    }

}

#[no_mangle]
pub fn safe(p: &mut usize) {
    *p = 5;
}

A rustc invocation later, we have:

raw()
    raw:     55                 pushq  %rbp
    raw+0x1: 48 89 e5           movq   %rsp,%rbp
    raw+0x4: 48 c7 07 05 00 00  movq   $0x5,(%rdi)
             00 
    raw+0xb: 5d                 popq   %rbp
    raw+0xc: c3                 ret    

safe()
    safe:     55                 pushq  %rbp
    safe+0x1: 48 89 e5           movq   %rsp,%rbp
    safe+0x4: 48 c7 07 05 00 00  movq   $0x5,(%rdi)
              00 
    safe+0xb: 5d                 popq   %rbp
    safe+0xc: c3                 ret    

Note that the two functions are bit-for-bit identical.

The only differences between borrowed references and raw pointers are:

  1. references will never point at bogus addresses (i.e., they are never NULL or uninitialized),
  2. the compiler doesn’t let you do arbitrary pointer arithmetic on references,
  3. the borrow checker will make you question your life choices for a while.

(#3 gets better over time.)

Box<T>

These are owned “pointers”. If you are a C++ programmer, you are already familiar with them: Box<T> is roughly Rust’s std::unique_ptr<T>. Never having truly worked with C++, I had to think about this a bit until it clicked, but it is really easy.

No matter what all the documentation and tutorials out there say, Box<T> is not a pointer but rather a structure containing a pointer to heap allocated memory just big enough to hold T. The heap allocation and freeing is handled automatically. (Allocation is done in the Box::new function, while freeing is done via the Drop trait, but that’s not relevant as far as the memory layout is concerned.) In other words, Box<T> is something like:

struct box_of_T {
	struct T *heap_ptr;
};

Then, when you make a new box you end up putting only what amounts to sizeof(struct T *) on the stack and it magically starts pointing to somewhere on the heap. In other words, the Rust code like this:

let x = Box::new(T { ... });

is roughly equivalent to:

struct box_of_T x;

x.heap_ptr = malloc(sizeof(struct T));
if (!x.heap_ptr)
	oom();

*x.heap_ptr = ...;

&[T] and &mut [T]

These are borrowed slices. This is where things get interesting. Even though it looks like they are just references (which, as stated earlier, translates into a simple C-style pointer), they are much more. These types of references use fat pointers—that is, a combination of a pointer and a length.

struct fat_pointer_to_T {
	struct T *ptr;
	size_t nelem;
};

This is incredibly powerful, since it allows bounds checking at runtime and getting a subset of a slice is essentially free!
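A minimal C sketch of what the fat pointer buys: indexing checks the stored length, and taking a sub-slice only computes a new pointer and length without copying anything. The single int member of struct T and the names slice_get and slice_sub are placeholders for illustration; abort() stands in for Rust’s panic on an out-of-bounds access:

```c
#include <stddef.h>
#include <stdlib.h>

struct T {
	int value;	/* placeholder for "some members" */
};

/* The (pointer, length) pair described above. */
struct fat_pointer_to_T {
	struct T *ptr;
	size_t nelem;
};

/* Bounds-checked element access, roughly what indexing a slice does. */
static struct T *slice_get(struct fat_pointer_to_T s, size_t i)
{
	if (i >= s.nelem)
		abort();	/* Rust would panic here */
	return &s.ptr[i];
}

/* The sub-slice [start, end) is just a new pointer and length. */
static struct fat_pointer_to_T slice_sub(struct fat_pointer_to_T s,
					 size_t start, size_t end)
{
	if (start > end || end > s.nelem)
		abort();
	return (struct fat_pointer_to_T){ s.ptr + start, end - start };
}
```

For a five-element array, slice_sub(s, 1, 4) yields a three-element view starting at the second element; calling slice_get on that view with an index of 3 or more aborts instead of reading past the view.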

&[T; n] and &mut [T; n]

These are borrowed references to arrays. They are different from borrowed slices. Since the length of an array is a compile-time constant (the compiler will yell at you if n is not a constant), all the bounds checking can be performed statically. And therefore there is no need to pass around the length in a fat pointer. So, they are passed around as plain ol’ pointers.

struct T *ptr;

T, [T; n], and [T]

While these aren’t pointers, I thought I’d include them here for completeness.

T

Just like in C, a struct uses as much space as its type requires (i.e., sum of the sizes of its members plus padding).

[T; n]

Just like in C, an array of structs uses n times the size of the struct.

[T]

The simple answer here is that you cannot make a [T]. That actually makes perfect sense when you consider what that type means. It is saying that we have some variable sized slice of memory that we want to access as elements of type T. Since this is variable sized, the compiler cannot possibly reserve space for it at compile time and so we get a compiler error.

The more complicated answer involves the Sized trait, which I’ve skillfully managed to avoid thus far and so you are on your own.

Summary

That was a lot of text, so I decided to compact it and make the following table. In the table, I assume that our T struct is 100 bytes in size. In other words:

/* Rust */
struct T {
    stuff: [u8; 100],
}

/* C */
struct T {
	uint8_t stuff[100];
};

Now, the table in its full glory:

Kind                        Rust                C                        Size on ILP32/LP64 (bytes)
Value                       let x: T;           struct T x;              100/100
Raw pointer                 let x: *const T;    struct T *x;             4/8
                            let x: *mut T;
Reference                   let x: &T;          struct T *x;             4/8
                            let x: &mut T;
Box                         let x: Box<T>;      struct box_of_T x;       4/8
Array of 2                  let x: [T; 2];      struct T x[2];           200/200
Reference to an array of 2  let x: &[T; 2];     struct T *x;             4/8
A slice                     let x: [T];         struct T x[];            unknown at compile time
A reference to a slice      let x: &[T];        struct fat_ptr_to_T x;   8/16

where:

struct box_of_T {
	struct T *heap_ptr;
};

struct fat_ptr_to_T {
	struct T *ptr;
	size_t nelem;
};
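The C column of the table can be sanity-checked with a few sizeof expressions. A sketch using C11 _Static_assert, assuming an ILP32 or LP64 target where size_t is as wide as a pointer; the C standard itself does not promise these sizes, and this checks only the C side, not what rustc does:

```c
#include <stddef.h>
#include <stdint.h>

/* The 100-byte example struct from above. */
struct T {
	uint8_t stuff[100];
};

struct box_of_T {
	struct T *heap_ptr;
};

struct fat_ptr_to_T {
	struct T *ptr;
	size_t nelem;
};

/* These hold on the usual ILP32 and LP64 ABIs. */
_Static_assert(sizeof(struct T) == 100, "value");
_Static_assert(sizeof(struct T[2]) == 200, "array of 2");
_Static_assert(sizeof(struct box_of_T) == sizeof(struct T *), "box");
_Static_assert(sizeof(struct fat_ptr_to_T) == 2 * sizeof(struct T *),
	       "fat pointer");
```

If the file compiles, the sizes match the table for that target: 4/8 bytes for the plain pointers and the box, and 8/16 for the fat pointer.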

A word of caution: I assume that the sizes of the various pointers are actually implementation details and shouldn’t be relied on. (Well, with the exception of raw pointers; without those being fixed, FFI would be unnecessarily complicated.)

I didn’t cover str, &str, String, and Vec<T> since I don’t consider them fundamental types, but rather convenience types built on top of slices, structs, references, and boxes.

Anyway, I hope you found this useful. If you have any feedback (good or bad), let me know.

CBOR vs. JSON vs. libnvpair

My blahg uses nvlists for logging extra information about its operation. Historically, it used Sun libnvpair. That is, it used its data structures as well as the XDR encoding to serialize the data to disk.

A few months ago, I decided to replace libnvpair with my own nvlist implementation, one that is more flexible and better integrated with my code. (It is still a bit of a work-in-progress, but it is looking good.) The code conversion went smoothly, and since then all the new information has been logged in JSON.

Last night, I decided to convert a bunch of the previously accumulated libnvpair data files into the new JSON-based format. After whipping up a quick conversion program, I ran it on the data. The result surprised me—the JSON version was about 55% of the size of the libnvpair encoded input!

This piqued my interest. I re-ran the conversion but with CBOR (RFC 7049) as the output format. The result was even better, with the output being 43% of libnvpair’s encoding.

This made me realize just how inefficient libnvpair is when serialized. At least part of it is because XDR (the way libnvpair serializes data) uses a lot of padding, while both JSON and CBOR use a more compact encoding for many data types (e.g., an unsigned number in CBOR uses 1 byte for the type and 0, 1, 2, 4, or 8 additional bytes based on its magnitude, while libnvpair always encodes a uint64_t as 8 bytes plus 4 bytes for the type).
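To make the size argument concrete, here is a minimal C sketch of how CBOR encodes an unsigned integer (major type 0 in RFC 7049): values below 24 live in the initial byte itself, and larger ones get an initial byte of 24, 25, 26, or 27 followed by 1, 2, 4, or 8 big-endian bytes. The function name is made up for illustration, and this covers only this one data type:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode v as a CBOR unsigned integer (major type 0).
 * Returns the number of bytes written to buf (1, 2, 3, 5, or 9);
 * buf must have room for 9 bytes.
 */
static size_t cbor_encode_uint(uint64_t v, uint8_t *buf)
{
	size_t nbytes;	/* bytes following the initial byte */
	size_t i;

	if (v < 24) {
		buf[0] = (uint8_t)v;	/* value fits in the initial byte */
		return 1;
	} else if (v <= UINT8_MAX) {
		buf[0] = 24;
		nbytes = 1;
	} else if (v <= UINT16_MAX) {
		buf[0] = 25;
		nbytes = 2;
	} else if (v <= UINT32_MAX) {
		buf[0] = 26;
		nbytes = 4;
	} else {
		buf[0] = 27;
		nbytes = 8;
	}

	/* the argument follows in big-endian byte order */
	for (i = 0; i < nbytes; i++)
		buf[1 + i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));

	return 1 + nbytes;
}
```

For example, 500 encodes as the three bytes 19 01 f4 (one of the examples in RFC 7049’s appendix), whereas libnvpair, as described above, spends 12 bytes on any uint64_t.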

Since CBOR is 79% of JSON’s size (and significantly less underspecified compared to the minefield that is JSON), I am hoping to convert everything that makes sense to CBOR. (CBOR being a binary format makes it harder for people to hand-edit it. If hand-editing is desirable, then it makes sense to stick with JSON or other text-based formats.)

The Data & Playing with Compression

The blahg-generated dataset that I converted consisted of 230866 files, each containing an nvlist. The following byte counts are a simple concatenation of the files. (A more complicated format like tar would add a significant enough overhead to make the encoding efficiency comparison flawed.)

Format  Size    % of nvpair
nvpair  471 MB  100%
JSON    257 MB  54.6%
CBOR    203 MB  43.1%

I also took each of the concatenated files and compressed it with gzip, bzip2, and xz. In each case, I used the most aggressive compression by using -9. The percentages in parentheses are comparing the compressed size to the same format’s uncompressed size. The results:

Format  Uncomp.  gzip             bzip2            xz
nvpair  471 MB   37.4 MB (7.9%)   21.0 MB (4.5%)   15.8 MB (3.3%)
JSON    257 MB   28.7 MB (11.1%)  17.9 MB (7.0%)   14.5 MB (5.6%)
CBOR    203 MB   26.8 MB (13.2%)  16.9 MB (8.3%)   13.7 MB (6.7%)

(The compression ratios are likely artificially better than normal since each of the 230k files has the same nvlist keys.)

Since tables like this are hard to digest, I turned the same data into a graph:

CBOR does very well uncompressed. Even after compressing it with a general-purpose compression algorithm, it outperforms JSON compressed with the same algorithm by about 5 to 7%.

I look forward to using CBOR everywhere I can.

Creative xor Use

Last month at work I got to try to optimize a function that takes a number and rounds it up to the next power of 2. The previous implementation used a simple loop. I didn’t dive into obscure bit twiddling, but rather used a helper function that is already in the codebase. Yes, I let the compiler do the heavy lifting of turning easy to understand code into good machine code. The x86 binary that gcc 6.3 produced has an interesting idiom, and that’s why I’m writing this entry.

The new code:

#include <stdint.h>

/* __builtin_clz is a GCC/Clang builtin; it is undefined for 0, hence the check. */
static inline unsigned int bits_required32(uint32_t num)
{
	return num == 0 ? 0 : 32 - __builtin_clz(num);
}

/* Returns x, such that x is the smallest power of 2 >= num. */
uint32_t nearest_power(uint32_t num)
{
	if (num == 0)
		return 1;

	return 1U << bits_required32(num - 1);
}

This is a slightly simplified version of the code, but it demonstrates the optimization quite well.

The nearest_power function disassembles as:

nearest_power()
    nearest_power:      8b 54 24 04        movl   0x4(%esp),%edx
    nearest_power+0x4:  b8 01 00 00 00     movl   $0x1,%eax
    nearest_power+0x9:  85 d2              testl  %edx,%edx
    nearest_power+0xb:  74 14              je     +0x14	<nearest_power+0x21>
    nearest_power+0xd:  83 ea 01           subl   $0x1,%edx
    nearest_power+0x10: 74 0f              je     +0xf	<nearest_power+0x21>
    nearest_power+0x12: 0f bd d2           bsrl   %edx,%edx
    nearest_power+0x15: b9 20 00 00 00     movl   $0x20,%ecx
    nearest_power+0x1a: 83 f2 1f           xorl   $0x1f,%edx
    nearest_power+0x1d: 29 d1              subl   %edx,%ecx
    nearest_power+0x1f: d3 e0              shll   %cl,%eax
    nearest_power+0x21: c3                 ret    

The first 6 instructions load num and deal with it being zero or one (both cases produce the result 1, which is preloaded into %eax). The remaining 6 instructions are where the calculation happens. I’m going to ignore the first half of the function, since the second half is where the interesting things happen.

First, bsr (“bit scan reverse”) finds the index of the most significant set bit in num - 1, and the value 32 gets stashed in a register:

    nearest_power+0x12: 0f bd d2           bsrl   %edx,%edx
    nearest_power+0x15: b9 20 00 00 00     movl   $0x20,%ecx

The bit index (%edx) is in the range 0–31, and for a 32-bit value the number of leading zeros is exactly 31 minus that index.

Here is the really interesting bit:

    nearest_power+0x1a: 83 f2 1f           xorl   $0x1f,%edx

This xors the bit index (i.e., 0–31) with 31. To decipher what this does, I find it easier to consider the top 27 bits and the bottom 5 bits separately.

operand  binary
0x1f     00000000 00000000 00000000 000 11111
edx      00000000 00000000 00000000 000 xxxxx

The xor of the top bits produces 0 since both the constant 31 and the register containing any of the numbers 0–31 have zeros there.

The xor of the bottom bits negates them since the constant has ones there.

When combined, the xor has the same effect as this C expression:

out = (~in) & 0x1f;

This seems very weird and useless, but it is far from it. It turns out that for inputs 0–31 the above expression is the same as:

out = 31 - in;

I think it is really cool that gcc produced this xor instead of a less optimal multi-instruction version.

The remainder of the disassembly subtracts the xor result from 32 (yielding bits_required32(num - 1)) and shifts 1 left by that amount to produce the return value.
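The identity holds for every input in 0–31, and it is what lets the compiler turn bsr’s result into a leading-zero count: x86’s bsr yields the index of the most significant set bit, and 31 minus that index is the number of leading zeros that __builtin_clz returns. A small C sketch; msb_index is a hypothetical portable stand-in for bsr:

```c
#include <stdint.h>

/*
 * Index of the most significant set bit, i.e., what x86's bsr computes.
 * Callers must not pass 0: bsr leaves its result undefined in that case
 * (this loop would simply return 0).
 */
static unsigned msb_index(uint32_t x)
{
	unsigned i = 0;

	while (x >>= 1)
		i++;
	return i;
}

/* Leading zeros derived the way the gcc output does it: bsr result xor 31. */
static unsigned clz_via_xor(uint32_t x)
{
	return msb_index(x) ^ 31;
}
```

For example, clz_via_xor(16) is 4 ^ 31 = 27, the same as __builtin_clz(16).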

Why xor?

I think the reason gcc (and clang, for that matter) produces this sort of xor instruction instead of a subtraction is very simple: on x86, the sub instruction’s destination is also its left-hand operand. That is, on x86 the sub instruction works as:

x -= y;

There is no form that subtracts a register from an immediate, so out = 31 - in cannot be expressed with just one sub; it would need an extra instruction (e.g., copying 31 into a register first). The xor with 31 computes the same value in a single instruction.

Anyway, that’s it for today. I hope you enjoyed this as much as I did.

Exclusive Or Character

A couple of years ago I blogged about the CCS instruction in the Apollo Guidance Computer. Today I want to tell you about the XC instruction from the System/360 ISA.

Many ISAs have some sort of xor instruction. The 360 is no different. It offers several different xor instructions which differ in the type of operands that they operate on. In all cases, the operation they perform could be summarized as (using C syntax):

A ^= B;

That is, one of the operands is used as both a source and a destination.

There are the boring X (reg ^= memory), XR (reg ^= reg), and XI (memory byte ^= immediate). Then there is XC, which is what inspired this post. XC, or Exclusive Or Character, takes two memory locations and a length and performs what appears to be a byte-by-byte xor of the two buffers. (The hardware is smart enough to operate on bigger chunks of memory, but the effect is as if it was done a byte at a time.) In assembly, XC looks like:

XC d1(l,b1),d2(b2)

The d are 12-bit unsigned displacements, while the b specify the registers holding the base addresses. For each of the operands, the actual address is dX plus the value of the bX register. The l is a length field which encodes a length between 1 and 256 (the 8-bit field in the instruction holds the length minus one).

To use more C pseudocode, XC does:

void XC(unsigned char *op1, size_t len, unsigned char *op2)
{
	while (len--) {
		*op1 ^= *op2;
		op1++;
		op2++;
	}
}

(This pseudo code ignores the condition code calculation and exception generation which are not relevant to the discussion.)

This by itself is neat but not very exciting…until you remember that xoring a value with itself zeroes it. Xoring a buffer with itself via XC zeroes out up to 256 bytes of memory in a single instruction. It turns out this idiom is used pretty often in handwritten assembly, and compilers such as gcc even produce such instructions without any special effort on the programmer’s behalf.
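The zeroing trick falls straight out of the byte-at-a-time C model of XC: when both operands are the same buffer, every byte gets xored with itself. A self-contained copy of that model (still ignoring the condition code and exceptions, and taking the real length rather than the encoded length minus one):

```c
#include <stddef.h>

/* Byte-at-a-time model of XC: op1[i] ^= op2[i] for len bytes. */
static void XC(unsigned char *op1, size_t len, unsigned char *op2)
{
	while (len--) {
		*op1 ^= *op2;
		op1++;
		op2++;
	}
}
```

Calling XC(buf, 16, buf) leaves all 16 bytes of buf zero, which is exactly the effect of memset(buf, 0, 16), the equivalence gcc exploits in the HVF example.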

For example, in HVF I have this line:

memset(&psw, 0, sizeof(struct psw));

Which GCC helpfully turns into (struct psw is 16 bytes in size):

xc      160(16,%r15),160(%r15)

When I first saw that line in the disassembly of HVF years ago, it blew my mind. It is elegant, fast (thanks to the hardware operating on bigger chunks behind the scenes), and, once you are used to the idiom, clear about what it does. I hope your mind was as blown as mine. Till next time!

bool bitfield:1

This is the first of hopefully many posts related to interesting pieces of code I’ve stumbled across in the dovecot repository.

Back in 1999, C99 added the _Bool type (spelled bool when you include <stdbool.h>). This is old news. The thing I’ve never seen before is what amounts to:

#include <stdbool.h>

struct foo {
	bool	a:1;
	bool	b:1;
};

Sure, I’ve seen bitfields before—just never with booleans. Since this is C, the obvious thing happens here. The compiler packs the two bool bits into a single byte. In other words, sizeof(struct foo) is 1 (instead of 2 had we not used bitfields).
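A quick way to check the size claim is to compare the struct with a plain-bool twin. Bitfield layout is implementation-defined, so treat the sizes mentioned here as what gcc and clang produce on common ABIs such as x86-64, not as a guarantee. The names foo_bits and foo_plain are just for this sketch.

```c
#include <stdbool.h>

/* Two one-bit bool bitfields: packed into a single byte on common ABIs. */
struct foo_bits {
	bool	a:1;
	bool	b:1;
};

/* The same two flags as plain bools: one byte each, two bytes total. */
struct foo_plain {
	bool	a;
	bool	b;
};

/* Same as the set() function below: only flips on the b bit. */
static void set(struct foo_bits *x)
{
	x->b = true;
}
```

On x86-64, sizeof(struct foo_bits) should come out as 1 and sizeof(struct foo_plain) as 2, and calling set() leaves a untouched.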

The compiler emits pretty compact code as well. For example, suppose we have this simple function:

void set(struct foo *x)
{
	x->b = true;
}

We compile it and disassemble:

$ gcc -c -O2 -Wall -m64 test.c
$ dis -F set test.o
disassembly for test.o

set()
    set:     80 0f 02           orb    $0x2,(%rdi)
    set+0x3: c3                 ret

Had we used non-bitfield booleans, the resulting code would be:

set()
    set:     c6 47 01 01        movb   $0x1,0x1(%rdi)
    set+0x4: c3                 ret

There’s not much of a difference in these simple examples, but in more complicated structures with many boolean flags the structure size difference may be significant.
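To put a rough number on "significant": with eight flags, the bitfield version should still fit in one byte on common ABIs, while the plain-bool version needs eight. Again, this is a sketch with made-up names, and the exact sizes are up to the compiler and ABI.

```c
#include <stdbool.h>

/* Eight one-bit flags: fit in a single byte on common ABIs. */
struct flags_bits {
	bool	f0:1, f1:1, f2:1, f3:1, f4:1, f5:1, f6:1, f7:1;
};

/* The same eight flags as plain bools: one byte each. */
struct flags_plain {
	bool	f0, f1, f2, f3, f4, f5, f6, f7;
};

/* Count how many of the packed flags are set. */
static unsigned count_set(const struct flags_bits *f)
{
	return f->f0 + f->f1 + f->f2 + f->f3 +
	       f->f4 + f->f5 + f->f6 + f->f7;
}
```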

Of course, the usual caveats about bitfields apply (e.g., the bit layout is implementation-defined and depends on the machine’s endianness).

dis(1): support for System/370, System/390, and z/Architecture ELF bins

A few months ago, I came to the conclusion that it would be both fun and educational to add a new disassembler backend to libdisasm, the disassembler library in Illumos. Being a mainframe fan, I decided to implement a System/390 and z/Architecture disassembler (I’ve written one before, for HVF).

At first, I was targeting only the 390 and z/Architecture, but given that the System/370 is an (almost) trivial subset of the 390 (and there is a spec for 370 ELF files!), I ended up including 370 support as well.

It took a while to get the code written (z/Architecture has so many instructions!) and reviewed, but it finally happened… the commit just landed in the repository.

If you get the latest Illumos bits, you’ll be able to disassemble 370, 390, and z/Architecture binaries with style. For example:

$ dis -F strcmp hvf
disassembly for hvf

strcmp()
    strcmp:      a7 19 00 00        lghi    %r1,0
    strcmp+0x4:  a7 f4 00 08        j       0x111aec
    strcmp+0x8:  a7 1b 00 01        aghi    %r1,1
    strcmp+0xc:  b9 02 00 55        ltgr    %r5,%r5
    strcmp+0x10: a7 84 00 17        je      0x111b16
    strcmp+0x14: e3 51 20 00 00 90  llgc    %r5,0(%r1,%r2)
    strcmp+0x1a: e3 41 30 00 00 90  llgc    %r4,0(%r1,%r3)
    strcmp+0x20: 18 05              lr      %r0,%r5
    strcmp+0x22: 1b 04              sr      %r0,%r4
    strcmp+0x24: 18 40              lr      %r4,%r0
    strcmp+0x26: a7 41 00 ff        tmll    %r4,255
    strcmp+0x2a: a7 84 ff ef        je      0x111ae0
    strcmp+0x2e: 18 20              lr      %r2,%r0
    strcmp+0x30: 89 20 00 18        sll     %r2,%r0,24(%r0)
    strcmp+0x34: 8a 20 00 18        sra     %r2,%r0,24(%r0)
    strcmp+0x38: b9 14 00 22        lgfr    %r2,%r2
    strcmp+0x3c: 07 fe              br      %r14
    strcmp+0x3e: a7 28 00 00        lhi     %r2,0
    strcmp+0x42: b9 14 00 22        lgfr    %r2,%r2
    strcmp+0x46: 07 fe              br      %r14

I hope this commit will also help document all the places that need to change when adding support for a new ISA to libdisasm.
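One thing every backend for these ISAs has to get right is the instruction length, which the 360 family encodes in the two high-order bits of the first opcode byte. The following is a standalone sketch of that rule; it is not code from libdisasm, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * On System/360 and its descendants (370, 390, z/Architecture), the two
 * high-order bits of the first opcode byte give the instruction length:
 * 00 -> 2 bytes, 01 or 10 -> 4 bytes, 11 -> 6 bytes.
 */
static unsigned insn_len(uint8_t first_byte)
{
	switch (first_byte >> 6) {
	case 0:
		return 2;
	case 1:
	case 2:
		return 4;
	default:
		return 6;
	}
}

/* Count the whole instructions in buf, stopping at a truncated one. */
static size_t count_insns(const uint8_t *buf, size_t n)
{
	size_t off = 0, count = 0;

	while (off < n) {
		unsigned len = insn_len(buf[off]);

		if (len > n - off)
			break;		/* final instruction is cut off */
		off += len;
		count++;
	}
	return count;
}
```

Feeding count_insns() the first 26 bytes of the strcmp listing above (a7 19 00 00 through e3 51 20 00 00 90) should yield six instructions, matching the first six lines of the dis output.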

Happy disassembling!
