Josef “Jeff” Sipek

OH-LCD

This post is part of a series named “Europe 2017” where I share photos from my adventures in Europe during the summer 2017.

When I attended the Kaivopuisto Air Show in early June last year, I learned about the existence of the Finnish Aviation Museum. It took me a month and a half, but eventually I found a free day to go check it out.

The museum itself is packed with all sorts of aircraft on static display. While they were interesting (and I certainly took plenty of photos of them), they aren’t what this post is about. This post is about Lokki—a retired Wikipedia article: DC-3 (registration OH-LCD) on display outside of the museum.

As luck would have it, the folks from the DC Association were there that day trying to see if they could start up Lokki’s engines—after 12 years of inactivity. After a lot of preparation, they managed to start them!

Without further ado, here are a few photos of Lokki (more photos can be found in the gallery).

Wikipedia article: Aero OY was the original name of Finnair:

One of the mechanics working on the left engine:

One of the people from the DC Association, seeing that I was obviously excited about the plane, asked me if I’d like to climb inside. I said yes, of course.

The inside was pretty bare-bones (which is to be expected of a static display that’s normally closed to public). I took a couple of photos inside, but most weren’t that interesting.

Throttle quadrant (note: most of the instrument panel was removed long ago):

It runs!

The livery is pretty simple—polished aluminum with dark blue lettering and a stripe:

I’m not really sure why they wanted to see if they could start the engines, but I’m happy that it worked out. Radial engines just have a unique roar to them.

Anyway, that’s it about Lokki. Hopefully I’ll get around to post processing the photos from the museum itself soon.

Modern Mercurial - Phases

This post is part of a series named “Modern Mercurial” where I share my realizations about how much Mercurial has advanced since 2005 without me noticing.

Last year, I had a realization that I haven’t been using Mercurial to its full potential. In this post, I’d like to share my thoughts about and usage of Mercurial Phases.

Phases are not a new feature. They made their first appearance back in 2012 as part of Mercurial 2.1, which makes them a little over 6 years old.

What are phases?

While there is a description of phases on the Mercurial wiki, I’ll take a stab at a short intro.

Each commit belongs to one of three phases (public, draft, or secret) which implies a set of allowed operations on the commit. Furthermore, the phase dictates which other phase or phases the commit can transition to.

You can think of the phases as totally ordered (secretdraftpublic) and a commit’s phase can only move in that direction. That is, a secret commit can become either a draft or a public commit, a draft commit can become a public commit, and a public commit is “stuck” being public. (Of course if you really want to, Mercurial allows you to force a commit to any phase via hg phase -f.)

The allowed operations on a commit of a particular phase are pretty self-explanatory:

Public commits are deemed immutable and sharable—meaning that if you try to perform an operation on a commit that would modify it (e.g., hg commit –amend), Mercurial will error out. All read-only operations as well as pushing and pulling are allowed.

Secret commits are mutable and not sharable—meaning that all modifications are allowed, but the commits are not pullable or pushable. In other words, a hg pull will not see secret commits in the remote repository, and a hg push will not push secret commits to the remote repository.

Draft commits are mutable and sharable—a phase between public and secret. Like secret commits, changes to commits are allowed, and like public commits, pushing and pulling is allowed.

Or in tabular form:

Phase Commits Sharing
public immutable allowed
draft mutable allowed
secret mutable prevented

By default, all new commits are automatically marked as draft, and when a draft commit is pushed it becomes public on both ends.

Note that these descriptions ignore the amazing changeset evolution features making their way into current Mercurial since they can blur the “not yet shared” nature of draft commits. (Perhaps I should have titled this post Modern Mercurial (2012 edition) — Phases.)

A note about hg log

Unfortunately, the default hg log output does not display phases at all. I think this is rather unfortunate (but understandable from a backwards compatibility point of view).

Last year, I dedicated a whole post to how I template hg log information including my reasoning for why I display phases the way I do.

How do I use phases?

Now that we have the basic introduction to phases out of the way, let me describe how I mapped them to my workflow.

First of all, I make all new commits start in the secret phase (instead of the default draft) with a quick addition to .hgrc:

[phases]
new-commit = secret

This immediately prevents an accidental hg push from pushing commits that I’m still working on. (Recall that secret commits cannot be pushed.) In at least one repository, this allowed me to regularly have more than 6 heads with various work-in-progress feature ideas without the fear of accidentally messing up a public repository. Before I started using phases, I used separate clones to get similar (but not as thorough) protections.

Now, I work on a commit for a while (keeping it in the secret phase), and when I feel like I’m done, I transition it to the draft phase (via hg phase -d). At that point, I’m basically telling Mercurial (and myself when I later look at hg log) that I’m happy enough with the commit to push it.

Depending on what I’m working on, I may or may not push it immediately after (which would transition the commit to the public phase). Usually, I hold off pushing the commit if it is part of a series, but I haven’t done the last-chance sanity checks of the other commits.

Note: I like to run hg push without specifying a revision to push. I find this natural (and less to type). If I always specified a revision, then phases wouldn’t help me as much.

“Ugly” repos

I have a couple of repositories that I use for managing assorted data like my car’s gasoline utilization. In these repositories, the commits are simple data point additions to a CSV file and the commit messages are repetitive one-liners. (These one-liners create a rather “ugly” commit history.)

In essence, the workflow these repositories see can be summarized as:

$ echo "2018-04-05,12345,17.231," >> data.csv
$ hg commit -m "more gas"
$ hg push

In these repositories, I’ve found that defaulting to the secret phase was rather annoying because every commit was immediately followed by a phase change to allow the push to work. So, for these repos I changed new-commit back to draft.

Edit: I reworded the sentence about Mercurial giving you a way to force a commit to any phase based on feedback on lobste.rs.

Kaivopuisto Air Show 2017

This post is part of a series named “Europe 2017” where I share photos from my adventures in Europe during the summer 2017.

In early June 2017, we attended an air show in Wikipedia article: Kaivopuisto. Unfortunately, we found out about it last minute, and so we missed the beginning which included a Finnair Airbus A350 flyby. Pity.

The show included a number of trainers and combat aircraft performing various maneuvers. Here are the highlights (for more photos visit the gallery).

Wikipedia article: Red Arrows:

A seagull joining in:

Wikipedia article: Finnish Coast Guard’s Wikipedia article: Turva nearby with Wikipedia article: Suomenlinna visible behind it:

Wikipedia article: Eurofighter Typhoon:

Wikipedia article: Saab 35 Draken:

Wikipedia article: Saab Gripen:

During one of the passes, I took a burst of images and then assembled them into a Southwest 737 “Airportrait”-style image.

Finnish Air Force Wikipedia article: F-18 Hornet:

A Finnish aerobatics team Wikipedia article: Midnight Hawks flying Wikipedia article: BAE Systems Hawk:

Even though this post has more photos than I typically share, there are many more in the gallery. So, if you are into airplanes, I suggest you peruse it.

Juhannus 2017

This post is part of a series named “Europe 2017” where I share photos from my adventures in Europe during the summer 2017.

You may have noticed that I was a bit quiet during the last summer. I have a really good reason for it: I spent five months in Helsinki for work. On weekends, Holly and I got to explore, which led me to accumulate approximately 12000 photos. Sadly, I am quite behind on post processing them all, but I will get through them eventually.

This post is about how I spent Wikipedia article: Juhannus last year.

Juhannus is the name of the Finnish summer solstice holiday. It is a time to relax, spend time with friends and family, and enjoy oneself. Every year, a nearby island, Wikipedia article: Seurasaari, has an afternoon and evening with an assortment of traditional events and bonfires.

There is of course a gallery of my photos.

Every year, one couple is selected to have their wedding on Seurasaari during Juhannus. Here is 2017’s lucky couple:

Before about half a dozen bonfires are set ablaze, a number of “can fires” is lit:

The largest bonfire gets lit by the newlyweds—from a boat:

I’m not sure how exactly the big bonfire pile was constructed, but it didn’t take long for it to grow:

So, that was Juhannus on Seurasaari in 2017. It was a nice and relaxing afternoon and evening, and if I happen to be in Helsinki around Juhannus in the future, I’ll likely spend the day on Seurasaari.

I’m going to end this post with a bit of Finnish (from finland.fi) because languages can be fun:

– Kokoo koko kokko kokoon!
– Koko kokkoko?
– Koko kokko.

Meaning:

– Assemble the Midsummer bonfire!
– The whole Midsummer bonfire?
– Yes, the whole Midsummer bonfire.

(I’m told that kokoo is a dialect form of kokoa.)

Rust Pointers for C Programmers

I’ve been eyeing Rust for about a year now. Here and there, I tried to use it to make a silly little program, or to implement some simple function in it to see for myself how ergonomic it really was, and what sort of machine code rustc spit out. But last weekend I found a need for a tool to clean up some preprocessor mess, and so instead of hacking together some combination of shell and Python, I decided to write it in Rust.

From my earlier attempts, I knew that there are a lot of different “pointers” but I found all the descriptions of them lacking or confusing. Specifically, Rust calls itself a systems programming language, yet I found no clear description of how the different pointers map to C—the systems programming language. Eventually, I stumbled across The Periodic Table of Rust Types, which made things a bit clearer, but I still didn’t feel like I truly understood.

During my weekend expedition to Rust land, I think I’ve grokked things enough to write this explanation of how Rust does things. As always, feedback is welcomed.

I’ll describe what happens in terms of C. To keep things simple, I will:

  • assume that you are well-versed in C
  • assume that you can read Rust (any intro will teach you enough)
  • not bother with const for the C snippets
  • not talk about mutability

In the following text, I assume that we have some struct T. The actual contents don’t matter. In other words:

struct T {
	/* some members */
};

With that out of the way, let’s dive in!

*const T and *mut T

These are raw pointers. In general, you shouldn’t use them since only unsafe code can dereference them, and the whole point of Rust is to write as much safe code as possible.

Raw pointers are just like what you have in C. If you make a pointer, you end up using sizeof(struct T *) bytes for the pointer. In other words:

struct T *ptr;

&T and &mut T

These are borrowed references. They use the same amount of space as raw pointers and behave same exact way in the generated machine code. Consider this trivial example:

#[no_mangle]
pub fn raw(p: *mut usize) {
    unsafe {
        *p = 5;
    }

}

#[no_mangle]
pub fn safe(p: &mut usize) {
    *p = 5;
}

A rustc invocation later, we have:

raw()
    raw:     55                 pushq  %rbp
    raw+0x1: 48 89 e5           movq   %rsp,%rbp
    raw+0x4: 48 c7 07 05 00 00  movq   $0x5,(%rdi)
             00 
    raw+0xb: 5d                 popq   %rbp
    raw+0xc: c3                 ret    

safe()
    safe:     55                 pushq  %rbp
    safe+0x1: 48 89 e5           movq   %rsp,%rbp
    safe+0x4: 48 c7 07 05 00 00  movq   $0x5,(%rdi)
              00 
    safe+0xb: 5d                 popq   %rbp
    safe+0xc: c3                 ret    

Note that the two functions are bit-for-bit identical.

The only differences between borrowed references and raw pointers are:

  1. references will never point at bogus addresses (i.e., they are never NULL or uninitialized),
  2. the compiler doesn’t let you do arbitrary pointer arithmetic on references,
  3. the borrow checker will make you question your life choices for a while.

(#3 gets better over time.)

Box<T>

These are owned “pointers”. If you are a C++ programmer, you are already familiar with them. Never having truly worked with C++, I had to think about this a bit until it clicked, but it is really easy.

No matter what all the documentation and tutorials out there say, Box<T> is not a pointer but rather a structure containing a pointer to heap allocated memory just big enough to hold T. The heap allocation and freeing is handled automatically. (Allocation is done in the Box::new function, while freeing is done via the Drop trait, but that’s not relevant as far as the memory layout is concerned.) In other words, Box<T> is something like:

struct box_of_T {
	struct T *heap_ptr;
};

Then, when you make a new box you end up putting only what amounts to sizeof(struct T *) on the stack and it magically starts pointing to somewhere on the heap. In other words, the Rust code like this:

let x = Box::new(T { ... });

is roughly equivalent to:

struct box_of_t x;

x.heap_ptr = malloc(sizeof(struct T));
if (!x.heap_ptr)
	oom();

*x.heap_ptr = ...;

&[T] and &mut [T]

These are borrowed slices. This is where things get interesting. Even though it looks like they are just references (which, as stated earlier, translates into a simple C-style pointer), they are much more. These types of references use fat pointers—that is, a combination of a pointer and a length.

struct fat_pointer_to_T {
	struct T *ptr;
	size_t nelem;
};

This is incredibly powerful, since it allows bounds checking at runtime and getting a subset of a slice is essentially free!

&[T; n] and &mut [T; n]

These are borrowed references to arrays. They are different from borrowed slices. Since the length of an array is a compile-time constant (the compiler will yell at you if n is not a constant), all the bounds checking can be performed statically. And therefore there is no need to pass around the length in a fat pointer. So, they are passed around as plain ol’ pointers.

struct T *ptr;

T, [T; n], and [T]

While these aren’t pointers, I thought I’d include them here for completeness’s sake.

T

Just like in C, a struct uses as much space as its type requires (i.e., sum of the sizes of its members plus padding).

[T; n]

Just like in C, an array of structs uses n times the size of the struct.

[T]

The simple answer here is that you cannot make a [T]. That actually makes perfect sense when you consider what that type means. It is saying that we have some variable sized slice of memory that we want to access as elements of type T. Since this is variable sized, the compiler cannot possibly reserve space for it at compile time and so we get a compiler error.

The more complicated answer involves the Sized trait, which I’ve skillfully managed to avoid thus far and so you are on your own.

Summary

That was a lot of text, so I decided to compact it and make the following table. In the table, I assume that our T struct is 100 bytes in size. In other words:

/* Rust */
struct T {
    stuff: [u8; 100],
}

/* C */
struct T {
	uint8_t stuff[100];
};

Now, the table in its full glory:

Rust C Size on
ILP32/LP64
(bytes)
Value
let x: T;
struct T x;
100/100
Raw pointer
let x: *const T;
let x: *mut T;
struct T *x;
4/8
Reference
let x: &T;
let x: &mut T;
struct T *x;
4/8
Box
let x: Box<T>;
struct box_of_T {
	struct T *heap_ptr;
};

struct box_of_T x;
4/8
Array of 2
let x: [T; 2];
struct T x[2];
200/200
Reference to
an array of 2
let x: &[T; 2];
struct T *x;
4/8
A slice
let x: [T];
struct T x[];
unknown at
compile time
A reference
to a slice
let x: &[T];
struct fat_ptr_to_T {
	struct T *ptr;
	size_t nelem;
};

struct fat_ptr_to_T x;
8/16

A word of caution: I assume that the sizes of the various pointers are actually implementation details and shouldn’t be relied on to be that way. (Well, with the exception of raw pointers - without those being fixed FFI would be unnecessarily complicated.)

I didn’t cover str, &str, String, and Vec<T> since I don’t consider them fundamental types, but rather convenience types built on top of slices, structs, references, and boxes.

Anyway, I hope you found this useful. If you have any feedback (good or bad), let me know.

CBOR vs. JSON vs. libnvpair

My blahg uses nvlists for logging extra information about its operation. Historically, it used Sun libnvpair. That is, it used its data structures as well as the XDR encoding to serialize the data to disk.

A few months ago, I decided to replace libnvpair with my own nvlist implementation—one that was more flexible and better integrated with my code. (It is still a bit of a work-in-progress, but it is looking good.) The code conversion went smoothly, and since then all the new information was logged in JSON.

Last night, I decided to convert a bunch of the previously accumulated libnvpair data files into the new JSON-based format. After whipping up a quick conversion program, I ran it on the data. The result surprised me—the JSON version was about 55% of the size of the libnvpair encoded input!

This piqued my interest. I re-ran the conversion but with CBOR (RFC 7049) as the output format. The result was even better with the output being 45% of libnvpair’s encoding.

This made me realize just how inefficient libnvpair is when serialized. At least part of it is because XDR (the way libnvpair serializes data) uses a lot of padding, while both JSON and CBOR use a more compact encoding for many data types (e.g., an unsigned number in CBOR uses 1 byte for the type and 0, 1, 2, 4, or 8 additional bytes based on its magnitude, while libnvpair always encodes a uint64_t as 8 bytes plus 4 bytes for the type).

Since CBOR is 79% of JSON’s size (and significantly less underspecified compared to the minefield that is JSON), I am hoping to convert everything that makes sense to CBOR. (CBOR being a binary format makes it harder for people to hand-edit it. If hand-editing is desirable, then it makes sense to stick with JSON or other text-based formats.)

The Data & Playing with Compression

The blahg-generated dataset that I converted consisted of 230866 files, each containing an nvlist. The following byte counts are a simple concatenation of the files. (A more complicated format like tar would add a significant enough overhead to make the encoding efficiency comparison flawed.)

Format Size % of nvpair
nvpair 471 MB 100%
JSON 257 MB 54.6%
CBOR 203 MB 45.1%

I also took each of the concatenated files and compressed it with gzip, bzip2, and xz. In each case, I used the most aggressive compression by using -9. The percentages in parentheses are comparing the compressed size to the same format’s uncompressed size. The results:

Format Uncomp. gzip bzip2 xz
nvpair 471 MB 37.4 MB (7.9%) 21.0 MB (4.5%) 15.8 MB (3.3%)
JSON 257 MB 28.7 MB (11.1%) 17.9 MB (7.0%) 14.5 MB (5.6%)
CBOR 203 MB 26.8 MB (13.2%) 16.9 MB (8.3%) 13.7 MB (6.7%)

(The compression ratios are likely artificially better than normal since each of the 230k files has the same nvlist keys.)

Since tables like this are hard to digest, I turned the same data into a graph:

CBOR does very well uncompressed. Even after compressing it with a general purpose compression algorithm, it outperforms JSON with the same algorithm by about 5%.

I look forward to using CBOR everywhere I can.

2017-11-14

Doug Engelbart Institute — Online exhibits, historic videos, texts, archive photos, and stories about Doug Engelbart, the inventor of the mouse, hypertext, and GUIs…all in the 1960s

Flight recorders data inspection by Airbus

Parsing JSON is a Minefield

Completely Painless Programmer’s Guide to XYZ, RGB, ICC, xyY, and TRCs — Brain-hurting amount of information about color profiles, etc.

darktable — A Lightroom-like open source software

World plugs — Info about every electric plug form factor in the world

Google Traffic, iOS edition

Several years ago, I wrote about how Google gets traffic information and how to turn off this location reporting on Android phones. Since then I’ve switched to iPhones. While I normally use the built-in Maps app, I keep Google Maps installed as a fallback—just in case.

I upgraded my phone recently and so I spent some time going through all the apps and making sure they worked and didn’t have more access than necessary. This is when I discovered that the Google Maps app for iOS defaults to collecting location information and sharing it with Google. Given my previous post, this isn’t really surprising.

Turning it off

Anyway, as with the Android post, here are the steps to limit Google’s collection of location information.

First of all, in Settings → Privacy → “Location Services”, I changed Google Maps’s permission to “While Using the App”. If I’m not using the app, then it doesn’t need to know where I am.

Second, in the app itself, go to: Settings → “About, terms & privacy” → “Location data collection”. That’s right, this setting is buried in what appears to be a page with the boring legal notices.

And then turn off the the toggle:

That should do it…at least assuming that Google honors the settings in its own app.

Creative xor Use

Last month at work I got to try to optimize a function that takes a number and rounds it up to the next power of 2. The previous implementation used a simple loop. I didn’t dive into obscure bit twiddling, but rather used a helper function that is already in the codebase. Yes, I let the compiler do the heavy lifting of turning easy to understand code into good machine code. The x86 binary that gcc 6.3 produced has an interesting idiom, and that’s why I’m writing this entry.

The new code:

static inline unsigned int bits_required32(uint32_t num)
{
        return num == 0 ? 0 : 32 - __builtin_clz(num);
}

/* Returns x, such that x is the smallest power of 2 >= num. */
uint32_t nearest_power(uint32_t num)
{
	if (num == 0)
		return 1;

        return 1U << bits_required32(num - 1);
}

This is a slightly simplified version of the code, but it demonstrates the optimization quite well.

The nearest_power function disassembles as:

nearest_power()
    nearest_power:      8b 54 24 04        movl   0x4(%esp),%edx
    nearest_power+0x4:  b8 01 00 00 00     movl   $0x1,%eax
    nearest_power+0x9:  85 d2              testl  %edx,%edx
    nearest_power+0xb:  74 14              je     +0x14	<nearest_power+0x21>
    nearest_power+0xd:  83 ea 01           subl   $0x1,%edx
    nearest_power+0x10: 74 0f              je     +0xf	<nearest_power+0x21>
    nearest_power+0x12: 0f bd d2           bsrl   %edx,%edx
    nearest_power+0x15: b9 20 00 00 00     movl   $0x20,%ecx
    nearest_power+0x1a: 83 f2 1f           xorl   $0x1f,%edx
    nearest_power+0x1d: 29 d1              subl   %edx,%ecx
    nearest_power+0x1f: d3 e0              shll   %cl,%eax
    nearest_power+0x21: c3                 ret    

The first 6 instructions contain the prologue and deal with num being zero or one—both cases produce the result 1. The remaining 6 instructions make up the epilogue and are where the calculation happens. I’m going to ignore the first half of the function, since the second half is where the interesting things happen.

First, we get the number of leading zeros in num - 1 and stash the value 32 in a register:

    nearest_power+0x12: 0f bd d2           bsrl   %edx,%edx
    nearest_power+0x15: b9 20 00 00 00     movl   $0x20,%ecx

The number of leading zeros (%edx) is in the range 0–31.

Here is the really interesting bit:

    nearest_power+0x1a: 83 f2 1f           xorl   $0x1f,%edx

This xors the number of leading zeros (i.e., 0–31) with 31. To decipher what this does, I find it easier to consider the top 27 bits and the bottom 5 bits separately.

operand binary
0x1f 00000000 00000000 00000000 000 11111
edx 00000000 00000000 00000000 000 xxxxx

The xor of the top bits produces 0 since both the constant 31 and the register containing any of the numbers 0–31 have zeros there.

The xor of the bottom bits negates them since the constant has ones there.

When combined, the xor has the same effect as this C expression:

out = (~in) & 0x1f;

This seems very weird and useless, but it is far from it. It turns out that for inputs 0–31 the above expression is the same as:

out = 31 - in;

I think it is really cool that gcc produced this xor instead of a less optimal multi-instruction version.

The remainder of the disassembly just subtracts and shifts to produce the return value.

Why xor?

I think the reason gcc (and clang for that matter) produce this sort of xor instruction instead of a subtraction is very simple: on x86 the sub instruction’s left hand side and the destination must be the same register. That is, on x86 the sub instruction works as:

x -= y;

Since the destination must be a register, it isn’t possible to express out = 31 - in using just one sub.

Anyway, that’s it for today. I hope you enjoyed this as much as I did.

Modern Mercurial - hg log

This post is part of a series named “Modern Mercurial” where I share my realizations about how much Mercurial has advanced since 2005 without me noticing.

As I pointed out recently, I ended up customizing my .hgrc to better suit my needs. In this post, I’m going to talk about my changes to tailor the hg log output to my liking.

There are three issues I have with the default hg log format:

  1. By default, only the first line of the commit message is shown. To see it fully, you need to use verbose mode.
  2. In verbose mode, the touched files are listed as well without a way to hide them.
  3. In verbose mode, the listed files are not listed one per line, but rather as a single line.

If, like me, you prefer the Linux-kernel style commit messages, you likely want to see the whole message when you look at the log (problem #1). Here is, for example, a screenshot of a commit using the default style (normal and verbose mode):

hg log

You can work around not seeing the whole commit message by always using the verbose mode, but that means that you’ll also be assaulted by the list of changed files (problem #2) without a way to hide it. To make the second problem even worse, the file names are listed on a single line, so all but the most trivial of changes create an impossible to read blob of file names (problem #3). For example, even with only a handful of files touched by a commit:

hg log -v

At least, those are my problems with the default format. I’m sure some people like the default just the way it is. Thankfully, Mercurial is sporting a powerful templating engine, so I can override the style whichever way I want.

Demo

Ok, before I dive into the rather simple config file changes, let’s take a look at a screenshot of the result on a test repository:

hg log -G

As you can see, the format of each log entry is similar to that of git log (note that the whole multi-line commit message is displayed, see revision 1), but with extra information. What exactly does it all mean? I think the best way to explain all the various bits of information is to show you an annotated version of the same screenshot:

hg log -G

I’m now going to describe the reasons why the various bits of information are presented the just way they are. If you aren’t interested in this description, skip ahead to the next section where I present the actual configuration changes I made.

Each commit hash (in yellow) is followed by a number of “items” that tell you more about the commit.

First is the phase. The phase name is abbreviated to a single letter (or no letter for the public phase) and color coded. It is the first item because every commit has a phase, the phase is an important bit of information, and the “encoded” phase info is very compact.

The reasoning behind the phase letters and colors is as follows:

public phase (no letter)
Public commits are not interesting since everyone has them, so don’t draw attention to them by omitting a letter.
secret phase (‘S’)
The only interesting thing about secret commits is that they will not be pushed. That means that they cannot be accidentally pushed either. Since this behavior is “boring”, use dark blue to indicate that they are different from public commits, but do not draw too much attention to them.
draft phase (‘D’)
These are the “dangerous” commits. Pushing them will change the remote repository’s state, so draw significantly more attention to these by using red.

I use letters instead of just using a different color for the commit hash for a very simple reason—if colors aren’t rendering properly, I still want to be able to tell the phases apart.

Second comes the named branch. When looking at several commits (e.g., hg log), most of the time any two adjacent commits will be on the same named branch. On top of that, each commit belongs to exactly one named branch. Therefore, even though the named branch name is not a fixed field, it behaves as one. In my experience, it is a good idea to display fixed fields before any variable length fields to make it easier for the eyes to spot any differences. (Yes, technically the way I display the phase information is not fixed width and therefore the named branch will not always start in the same column, but in practice adjacent commits tend to have the same phase as well, so the named branch will always be in a semi-fixed position.) Note that in Mercurial the “default” branch is usually rendered as the empty string, and I follow that convention with my template.

Third comes the list of tags. Each commit can have many tags. This is the first item on the line that can become unreasonably long. At least in the repositories that I deal with, there aren’t very many tags per commit, so I haven’t seen any bad effects.

Fourth and final comes the list of bookmarks. Much like tags, there can be many, but in practice there are very few. Since I deal with tags more often than bookmarks, I put the bookmark information after the tags. The active bookmark is rendered as bold.

The choice of colors for named branches (cyan), tags (green), and bookmarks (magenta) was guided by a simple principle: they should go well with the yellow color of the changeset line, and not draw too much attention but still be visually distinct. Sadly, on a terminal without color support, they will all render the same way. I think this is still workable, since repositories have conventions for branches/tags/bookmarks naming and therefore the user can still guess what type of name it is. (Worst case, the user can consult other hg commands to figure out what exactly is being displayed.)

The checked out commit and the active bookmark being rendered as bold without any additional indication that they are different is also unfortunate. I haven’t found a pleasant way to render this information that would convey the same information on dumb terminals. (Note that there is a class of terminals that support bold fonts but not different colors. Even those will render this info correctly.)

Config

So, how did I achieve this glorious output? It’s not too complicated, but it took me a while to tune things just to my liking.

First, I make a custom style file with two templates—changeset and changeset_verbose:

changeset_common = '{label(ifcontains(rev, revset('parents()'),
      "log.activechangeset",
      "log.changeset"),
      "commit {rev}:{node}")}\
      {label("log.phase_{phase}",
	ifeq(phase, "public",
	  "",
	  " {ifeq(phase,"draft","D","S")}"))}\
      {label("log.branch", ifeq(branch, "default", "", " {branch}"))}\
      {label("log.tag", if(tags, " {tags}"))}\
      {bookmarks % "{ifeq(bookmark, currentbookmark,
	label('log.activebookmark', " {bookmark}"),
	label('log.bookmark', " {bookmark}"))}"}
    {ifeq(parents,"","","{ifeq(p2rev,-1,"Parent: ","Merge: ")}{parents}\n")}\
    Author: {author}
    Date:   {date(date,"%c %z")}\n
    {indent(desc,"    ")}\n'
changeset_files = '{ifeq(files, "", "", "\n {join(files,\"\n \")}\n")}'

changeset_verbose = '{changeset_common}{changeset_files}\n'
changeset = '{changeset_common}\n'

Normally, changeset is used by hg log and other revision set printing commands, while changeset_verbose is used when you provide them with the -v switch. In my template, the only difference between the two is that the verbose version prints the list of files touched by the commit.

Second, in my .hgrc, I define the colors I want to use for the various bits of info:

[color]
log.activebookmark = magenta bold
log.activechangeset = yellow bold
log.bookmark = magenta
log.branch = cyan
log.changeset = yellow
log.phase_draft = red bold
log.phase_secret = blue bold
log.tag = green

Finally, in my .hgrc, I set the default style to point to my style file:

[ui]
style = $HOME/environ/hg/style

That’s all there is to it! Feel free to take the above snippets and tailor them to your liking.

hg log -v vs. hg log –stat sidenote

My first version of the template did not support the verbose mode. I didn’t think this was a big deal, and I simply used hg log –stat instead. This provides the list of files touched by the commit and a visual indication how much they changed. For example, here’s a close up of two commits in the same test repo:

hg log -G –stat

Then one day, I tried to do that on a larger repo with a cold cache. It was very slow. It made sense why—not only did Mercurial need to list all the commits, it also needed to produce the diff of each commit only to do some basic counting for the diffstat.

My solution to the problem was to make verbose mode list all the files touched by the commit by using {files}. This is rather cheap since it requires consulting the manifest instead of calculating the diff for each commit. For example, here are the same two commits as above but in verbose mode:

hg log -G -v

It certainly has less detail, but it is good enough when you want to search the log output for a specific file name.

Powered by blahgd