Josef “Jeff” Sipek

Modern Mercurial - hg log

As I pointed out recently, I ended up customizing my .hgrc to better suit my needs. In this post, I’m going to talk about my changes to tailor the hg log output to my liking.

There are three issues I have with the default hg log format:

  1. By default, only the first line of the commit message is shown. To see it fully, you need to use verbose mode.
  2. In verbose mode, the touched files are listed as well without a way to hide them.
  3. In verbose mode, the files are not listed one per line, but rather all on a single line.

If, like me, you prefer Linux kernel-style commit messages, you likely want to see the whole message when you look at the log (problem #1). Here, for example, is a screenshot of a commit using the default style (normal and verbose mode):

hg log

You can work around not seeing the whole commit message by always using verbose mode, but then you’ll also be assaulted by the list of changed files (problem #2) with no way to hide it. To make matters worse, the file names are all listed on a single line, so all but the most trivial changes produce an impossible-to-read blob of file names (problem #3). For example, even with only a handful of files touched by a commit:

hg log -v

At least, those are my problems with the default format. I’m sure some people like the default just the way it is. Thankfully, Mercurial sports a powerful templating engine, so I can override the style however I want.

Demo

Ok, before I dive into the rather simple config file changes, let’s take a look at a screenshot of the result on a test repository:

hg log -G

As you can see, the format of each log entry is similar to that of git log (note that the whole multi-line commit message is displayed, see revision 1), but with extra information. What exactly does it all mean? I think the best way to explain all the various bits of information is to show you an annotated version of the same screenshot:

hg log -G (annotated)

I’m now going to describe why the various bits of information are presented just the way they are. If you aren’t interested in this description, skip ahead to the next section, where I present the actual configuration changes I made.

Each commit hash (in yellow) is followed by a number of “items” that tell you more about the commit.

First is the phase. The phase name is abbreviated to a single letter (or no letter for the public phase) and color coded. It is the first item because every commit has a phase, the phase is an important bit of information, and the “encoded” phase info is very compact.

The reasoning behind the phase letters and colors is as follows:

public phase (no letter)
Public commits are not interesting since everyone has them, so don’t draw attention to them: omit the letter entirely.
secret phase (‘S’)
The only interesting thing about secret commits is that they will not be pushed. That means that they cannot be accidentally pushed either. Since this behavior is “boring”, use dark blue to indicate that they are different from public commits, but do not draw too much attention to them.
draft phase (‘D’)
These are the “dangerous” commits. Pushing them will change the remote repository’s state, so draw significantly more attention to these by using red.

I use letters instead of just using a different color for the commit hash for a very simple reason—if colors aren’t rendering properly, I still want to be able to tell the phases apart.
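The phase rules above amount to a tiny lookup table. Here is a minimal Python sketch of that mapping (Python only because Mercurial itself is written in it). The names `PHASE_STYLE` and `phase_item` are hypothetical, and this is not how the template engine actually evaluates anything:

```python
from typing import Optional, Tuple

# Hypothetical table mirroring the phase rules above (not Mercurial code).
# Each phase maps to (letter, color label); the labels match the [color] section.
PHASE_STYLE = {
    "public": ("", None),                 # everyone has these: show nothing
    "secret": ("S", "log.phase_secret"),  # never pushed: low-key blue
    "draft": ("D", "log.phase_draft"),    # will be pushed: loud red
}


def phase_item(phase: str) -> Tuple[str, Optional[str]]:
    """Return the text to append after the commit hash and its color label."""
    letter, label = PHASE_STYLE[phase]
    # Like the template, prefix a space so the letter doesn't touch the hash.
    return (" " + letter if letter else "", label)


for name in ("public", "secret", "draft"):
    print(name, repr(phase_item(name)))
```

Because the letter carries the information on its own, the output stays readable even when the color labels are ignored.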

Second comes the named branch. When looking at several commits (e.g., hg log), most of the time any two adjacent commits will be on the same named branch. On top of that, each commit belongs to exactly one named branch. Therefore, even though the named branch name is not a fixed field, it behaves as one. In my experience, it is a good idea to display fixed fields before any variable length fields to make it easier for the eyes to spot any differences. (Yes, technically the way I display the phase information is not fixed width and therefore the named branch will not always start in the same column, but in practice adjacent commits tend to have the same phase as well, so the named branch will always be in a semi-fixed position.) Note that in Mercurial the “default” branch is usually rendered as the empty string, and I follow that convention with my template.

Third comes the list of tags. Each commit can have many tags. This is the first item on the line that can become unreasonably long. At least in the repositories that I deal with, there aren’t very many tags per commit, so I haven’t seen any bad effects.

Fourth and final comes the list of bookmarks. Much like tags, there can be many, but in practice there are very few. Since I deal with tags more often than bookmarks, I put the bookmark information after the tags. The active bookmark is rendered as bold.
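To make the ordering concrete, here is a rough Python sketch of how the branch, tag, and bookmark items could be assembled after the hash and phase. The function name and the (text, label) pairs are illustrative assumptions; the label names come from the [color] section later in this post.

```python
from typing import List, Optional, Tuple

Item = Tuple[str, str]  # (text to print, color label)


def header_items(branch: str, tags: List[str], bookmarks: List[str],
                 active: Optional[str]) -> List[Item]:
    """Hypothetical sketch of the items printed after the hash and phase."""
    items: List[Item] = []
    # 1. Named branch: semi-fixed, and "default" is rendered as nothing.
    if branch != "default":
        items.append((" " + branch, "log.branch"))
    # 2. Tags: a single item holding the space-separated list.
    if tags:
        items.append((" " + " ".join(tags), "log.tag"))
    # 3. Bookmarks: one item each, so the active one can get a bold label.
    for bookmark in bookmarks:
        label = "log.activebookmark" if bookmark == active else "log.bookmark"
        items.append((" " + bookmark, label))
    return items


print(header_items("default", ["tip"], ["feature", "main"], "main"))
```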

The choice of colors for named branches (cyan), tags (green), and bookmarks (magenta) was guided by a simple principle: they should go well with the yellow color of the changeset line, and not draw too much attention but still be visually distinct. Sadly, on a terminal without color support, they will all render the same way. I think this is still workable, since repositories have conventions for branches/tags/bookmarks naming and therefore the user can still guess what type of name it is. (Worst case, the user can consult other hg commands to figure out what exactly is being displayed.)

It is also unfortunate that the checked-out commit and the active bookmark are distinguished only by being rendered in bold. I haven’t found a pleasant way to convey the same information on dumb terminals. (Note that there is a class of terminals that support bold fonts but not different colors. Even those will render this info correctly.)

Config

So, how did I achieve this glorious output? It’s not too complicated, but it took me a while to tune things just to my liking.

First, I made a custom style file with two templates, changeset and changeset_verbose, plus two helper templates (changeset_common and changeset_files) that they are built from:

changeset_common = '{label(ifcontains(rev, revset('parents()'),
      "log.activechangeset",
      "log.changeset"),
      "commit {rev}:{node}")}\
      {label("log.phase_{phase}",
	ifeq(phase, "public",
	  "",
	  " {ifeq(phase,"draft","D","S")}"))}\
      {label("log.branch", ifeq(branch, "default", "", " {branch}"))}\
      {label("log.tag", if(tags, " {tags}"))}\
      {bookmarks % "{ifeq(bookmark, currentbookmark,
	label('log.activebookmark', " {bookmark}"),
	label('log.bookmark', " {bookmark}"))}"}
    {ifeq(parents,"","","{ifeq(p2rev,-1,"Parent: ","Merge: ")}{parents}\n")}\
    Author: {author}
    Date:   {date(date,"%c %z")}\n
    {indent(desc,"    ")}\n'
changeset_files = '{ifeq(files, "", "", "\n {join(files,\"\n \")}\n")}'

changeset_verbose = '{changeset_common}{changeset_files}\n'
changeset = '{changeset_common}\n'

Normally, changeset is used by hg log and other revision set printing commands, while changeset_verbose is used when you provide them with the -v switch. In my template, the only difference between the two is that the verbose version prints the list of files touched by the commit.
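For readers who find the template syntax dense, here is a rough Python rendition of what the rest of changeset_common and changeset_files print below the header line. The Changeset class and render_body function are hypothetical stand-ins for the template keywords ({parents}, {p2rev}, {author}, {date}, {desc}, {files}), and the date is assumed to be preformatted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Changeset:
    """Hypothetical stand-in for the template keywords used above."""
    parents: List[str]  # like {parents}: empty when the parent is simply rev - 1
    p2rev: int          # like {p2rev}: -1 unless this is a merge
    author: str
    date: str           # assumed already formatted as date(date, "%c %z")
    desc: str
    files: List[str] = field(default_factory=list)


def indent(text: str, prefix: str) -> str:
    # Like the template's indent(): prefix every non-empty line.
    return "\n".join(prefix + line if line else line
                     for line in text.splitlines())


def render_body(cs: Changeset, verbose: bool) -> str:
    lines = []
    if cs.parents:
        kind = "Parent: " if cs.p2rev == -1 else "Merge: "
        lines.append(kind + " ".join(cs.parents))
    lines.append("Author: " + cs.author)
    lines.append("Date:   " + cs.date)
    lines.append("")
    lines.append(indent(cs.desc, "    "))
    text = "\n".join(lines) + "\n"
    if verbose and cs.files:
        # changeset_files: a blank line, then one file per line
        text += "\n" + "".join(" " + f + "\n" for f in cs.files)
    return text + "\n"
```

The only difference between the two modes is the `verbose` branch, which matches the changeset vs. changeset_verbose split described above.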

Second, in my .hgrc, I define the colors I want to use for the various bits of info:

[color]
log.activebookmark = magenta bold
log.activechangeset = yellow bold
log.bookmark = magenta
log.branch = cyan
log.changeset = yellow
log.phase_draft = red bold
log.phase_secret = blue bold
log.tag = green
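Roughly speaking, each effect name in the [color] section corresponds to a standard ANSI SGR code, and a labeled piece of text gets wrapped in an escape sequence. The Python sketch below illustrates that idea for the labels above; it is an approximation for illustration, and the exact escape sequences Mercurial emits may differ.

```python
# Standard ANSI SGR parameter codes for the effects used in the [color] section.
SGR = {"bold": 1, "red": 31, "green": 32, "yellow": 33,
       "blue": 34, "magenta": 35, "cyan": 36}

# Copied from the [color] section above.
COLORS = {
    "log.activebookmark": "magenta bold",
    "log.activechangeset": "yellow bold",
    "log.bookmark": "magenta",
    "log.branch": "cyan",
    "log.changeset": "yellow",
    "log.phase_draft": "red bold",
    "log.phase_secret": "blue bold",
    "log.tag": "green",
}


def colorize(text: str, label: str) -> str:
    effects = COLORS.get(label)
    if not effects:
        return text  # label without an entry (e.g. log.phase_public): plain text
    codes = ";".join(str(SGR[name]) for name in effects.split())
    return "\x1b[" + codes + "m" + text + "\x1b[0m"  # 0 resets all attributes


print(colorize(" D", "log.phase_draft") + colorize(" stable", "log.branch"))
```

On a terminal that ignores these escapes, every label collapses to plain text, which is exactly why the phase is also encoded as a letter.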

Finally, in my .hgrc, I set the default style to point to my style file:

[ui]
style = $HOME/environ/hg/style

That’s all there is to it! Feel free to take the above snippets and tailor them to your liking.

hg log -v vs. hg log --stat sidenote

My first version of the template did not support verbose mode. I didn’t think this was a big deal, and I simply used hg log --stat instead. This provides the list of files touched by the commit and a visual indication of how much they changed. For example, here’s a close-up of two commits in the same test repo:

hg log -G --stat

Then one day, I tried to do that on a larger repo with a cold cache. It was very slow, and the reason made sense: not only did Mercurial need to list all the commits, it also needed to produce the diff of each commit only to do some basic counting for the diffstat.

My solution to the problem was to make verbose mode list all the files touched by the commit by using {files}. This is rather cheap since the list of touched files is already recorded in each changeset, so no diff needs to be calculated. For example, here are the same two commits as above but in verbose mode:
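The cost difference is easy to see in a small Python sketch of --stat-style counting. It uses difflib rather than Mercurial’s own diff code, and the diffstat function and its dict-of-contents inputs are assumptions for illustration. The point is that both full versions of every touched file have to be loaded and compared, whereas the verbose template only prints a list of names.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple


def diffstat(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Tuple[int, int]]:
    """Return {path: (lines added, lines removed)} for files that changed."""
    stats = {}
    for path in sorted(old.keys() | new.keys()):
        # Both complete versions are needed just to produce two numbers.
        a = old.get(path, "").splitlines()
        b = new.get(path, "").splitlines()
        added = removed = 0
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
            if tag in ("replace", "delete"):
                removed += i2 - i1
            if tag in ("replace", "insert"):
                added += j2 - j1
        if added or removed:
            stats[path] = (added, removed)
    return stats


print(diffstat({"a.c": "1\n2\n"}, {"a.c": "1\n3\n", "b.c": "x\n"}))
```

On a cold cache, every one of those file revisions has to come off the disk first, which is where the time goes.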

hg log -G -v

It certainly has less detail, but it is good enough when you want to search the log output for a specific file name.

Modern Mercurial

I’ve been using both Git and Mercurial since they were first released in 2005. I’ve messed with the internals of both, but I always had a preference for Mercurial (its user interface is cleaner, its design is well thought-out, and so on). So, it should be no surprise that I felt a bit sad every time I heard that some project chose Git over Mercurial (or worse yet, migrated from Mercurial to Git). At the same time, I could see Git improving release after release—but Mercurial did not seem to. Seem is the operative word here.

A couple of weeks ago, I realized that more and more of my own repositories have been Git based. Not for any particular reason other than that I happened to type git init instead of hg init. After some reflection, I decided that I should convert a number of these repositories from Git to Mercurial. The conversion itself was painless thanks to the most excellent hggit extension, which lets you clone, pull, and push Git repositories with Mercurial. (I just cloned the Git repository with hg clone and then cleaned up some of the mess manually; for example, I don’t need the bookmark corresponding to the one and only branch in the original Git repository.) Then the real fun began.

I resumed the work on my various projects, but now with the brand-new Mercurial repositories. Soon after, I started hitting various quirks with the Mercurial UI. I realized that the workflow I was using wasn’t really aligned with the UI. Undeterred, I looked for solutions. I enabled the pager and color extensions, overrode some of the default colors to be less offensive (and easier to read), and enabled the shelve, rebase, and histedit extensions (along with mq) to let me do some minor history rewriting while I iteratively work on changes. (I learned about and switched to the evolve extension soon after.) With each tweak, the user experience got better and better.

Then it suddenly hit me—before these tweaks, I had been using Mercurial like it’s still 2005!

I think this is a very important observation. Mercurial didn’t seem to be improving because none of the user-visible changes were forced onto the users. Git, on the other hand, started with a dreadful UI so it made sense to enable new features by default to lessen the pain.

One could say that Mercurial took the Unix approach—simple and not exactly friendly by default, but incredibly powerful if you dig in a little. (This extensibility is why Facebook chose Mercurial over Git as a Subversion replacement.)

Now I wonder if some of the projects chose Git over Mercurial at least partially because by default Mercurial has been a bit…spartan.

With my .hgrc changes, I get exactly the information I want in a format that’s even better than what Git provided me. (Mercurial makes so much possible via its templating engine and the revsets language.)

So, what does all this mean for Mercurial? It’s hard to say, but I’m happy to report that a number of good improvements should land in the upcoming 4.2 release scheduled for early May. For example, the pager and color functionality is moving into core and will be on by default.

Finally, I like my current Mercurial environment quite a lot. The hggit extension is making me seriously consider using Mercurial when dealing with Git repositories that I can’t convert.

Making Version Control Systems Go Boom

So, time has come, once again, to talk of many things…of Git and Mercurial. :)

For a fun project which I’ll describe here some other time, I want to version about 2GB of files. Here’s the breakdown:

  • 5x 312MB
  • 3x 100MB
  • 2x 16MB
  • 80 other files all under 5MB each

My first instinct was to use Mercurial, and so I did. It seemed like a good fit because it stores compressed deltas of the files, and since I don’t expect more than ~20MB to change between two consecutive versions, it made sense on an architectural level as well.

The setup

There are a number of computers involved; unless I say otherwise, I’m talking about my laptop.

  • laptop: 3.06GHz P4, 1GB RAM
  • server: Athlon 2000, 1.25GB RAM
  • kernel devel box: 2x 2.8GHz Xeon, 2GB RAM, 4GB swap
  • big box: 4x 1.8GHz Opteron, 64GB RAM

Unfortunately, I can’t use the “big box” much. :( Oh well.

Attempt #1: Mercurial

First, I set up the directory hierarchy with all the files. Virtually all of the data in the 100MB and 312MB files consists of binary zeros, so it came as no surprise that the initial commit created approximately 50MB worth of history. Not bad at all! I ran some commands that changed the files the way I wanted, and committed each time I felt it was a good place to checkpoint. Mercurial’s compressed-delta way of storing history worked really well: the history grew by only 4MB between the initial and the 6th commit.
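The ~50MB initial commit is less surprising once you see how well long runs of zero bytes compress. Both Mercurial revlogs and Git objects are zlib-compressed by default, so Python’s zlib module gives a fair feel for it. The 16 MiB buffer below is a stand-in for the mostly-zero 100MB and 312MB files, not the actual data.

```python
import zlib

size = 16 * 1024 * 1024       # 16 MiB of binary zeros
data = bytes(size)            # bytes(n) is n zero bytes
packed = zlib.compress(data, 9)

# Deflate caps out at roughly 1000:1, so even pure zeros cost ~1KB per MB.
print(f"{size} bytes -> {len(packed)} bytes ({size / len(packed):.0f}:1)")
```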

At this point, I decided that I should make a clone on another computer (yeah, I use distributed version control systems for backups of individual projects :) ). Now, this is where things went crazy. I initiated a clone on my server, and after about two minutes, the hg process on my laptop died with a memory allocation error. That sucks. It was probably because of the protocol, which tries to uncompress everything and recompress it to save bandwidth. Since I was on a LAN, I tried the --uncompressed option, which doesn’t try to be smart and just wastes bandwidth, but I forgot that it needs to be enabled on the server side, so, unbeknownst to me, it still tried to compress the data. It died with a memory error, just as before. Oh well. At this point, I decided to try Git for this project.

Attempt #2: Git

Git uses a different storage scheme; well, it actually has two. Whenever you commit, git stores the full file versions, compressed. I did a quick conversion of the hg repo to git by hand, as there were only 6 commits. I had to use:

hg update -C <rev>

otherwise, hg was trying to be too smart — something that makes you run out of memory. :)

After the conversion, the resulting .git repo was also about 50MB in size. Everything worked just as well. It is possible that the commits took a little bit less time, as committing consists of just compressing the files and storing them on disk. I am not sure which one was faster, and knowing how each works doesn’t help with psychological effects :)
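Git’s commit-time storage really is that simple. Each file version becomes a loose "blob" object: a short header plus the content, named by the SHA-1 of both and zlib-compressed on disk under .git/objects/. Here is a minimal Python sketch of that format; the function name is made up, and real Git also handles trees, commits, and writing the files safely.

```python
import hashlib
import zlib
from typing import Tuple


def loose_blob(content: bytes) -> Tuple[str, str, bytes]:
    """Return (object id, path relative to .git, compressed bytes) for a blob."""
    raw = b"blob %d\x00" % len(content) + content  # header: type, size, NUL
    oid = hashlib.sha1(raw).hexdigest()
    # The first two hex digits pick the subdirectory, the rest is the file name.
    path = "objects/" + oid[:2] + "/" + oid[2:]
    return oid, path, zlib.compress(raw)


oid, path, data = loose_blob(b"")
print(oid, path, len(data))
```

Note that this sketch holds the whole file in memory; with 312MB files, that alone is a sizable chunk of a 1GB machine.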

Anyway, it was time for me to clone the repository, again going from my laptop to the server. I was afraid of this step, because when git transfers data between repositories, it tries to conserve bandwidth by making a packfile: a file containing a number of deltified objects (such as the compressed files stored during commit). It started to create the packfile, but it died with a nice message saying that it ran out of memory. Great! Now what? At that point, I decided to cheat. Since I need a packfile sooner or later, I just rsync’d the whole git repo to the kernel test box I have (a box that has twice the RAM and 4GB of swap), and I tried to clone from that. It got to about 66% done, at which point it was using most of the RAM and far too much swap. After about an hour and twenty minutes, I decided to rsync the repo to the box that has 64GB of RAM. On it, I ran the commands necessary to just create a packfile, without pulling, pushing, or cloning. In about 10 minutes, it was done. Great! I then aborted the clone that had been running for an hour and a half, and cloned from the repo that had the packfile all set up. Everything worked rather nicely :) I moved things back onto my laptop.
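The packfile step is where the memory goes. Inside a pack, an object can be stored as a delta against a similar base object: a list of "copy this range from the base" and "insert these literal bytes" instructions. The Python sketch below shows that idea using difflib to find matching ranges; Git uses its own delta algorithm and binary encoding, and the function names here are hypothetical. Even this toy needs both the base and the target fully in memory, and Git tries several candidate bases per object (controlled by its pack.window setting), which adds up quickly with 312MB files.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# ("copy", offset, length) reuses base bytes; ("insert", data) adds new bytes.
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def make_delta(base: bytes, target: bytes) -> List[Op]:
    ops: List[Op] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:
            # "replace" or "insert": the target has bytes the base lacks.
            ops.append(("insert", target[j1:j2]))
        # A pure "delete" needs no instruction: those base bytes are skipped.
    return ops


def apply_delta(base: bytes, ops: List[Op]) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"hello world, hello git"
target = b"hello brave world, hello git!"
delta = make_delta(base, target)
print(delta)
```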

Additional commits

Now it was time to resume what I was doing before (“the project”). I made some additional changes to the files and made another commit, and then it was time to push the changes. Git wasn’t happy. I wasn’t going to fight it, as I was getting tired, so I just rsync’d the 6 newly created objects to the server.

Recently, there have been some patches on the git mailing list to make git a little smarter about the way it uses multiple pack files. This doesn’t apply to me, at least not yet.

Conclusions

So, here it is. Both of the version control systems I like to use (each one has its area where I wouldn’t want to switch to the other) die on me because my 3-year-old laptop has only 1GB of RAM. Just great. :-/ And please, don’t tell me about Subversion and other non-distributed VCS tools. As far as I know, the other distributed systems consume even more resources.

Do I have a thing for Version Control Systems?

So, for whatever reason, I seem to be working on version control systems far too much. I have a decent amount of code in Mercurial, and I wrote a bunch of wrappers for CVS that I call CDS, which stands for Completely Dumb System (an apt description of CVS). And now I am working on gq (git repo: git://git.kernel.org/pub/scm/linux/kernel/git/jsipek/gq.git), a porcelain (set of wrapper scripts for git) that gives Mercurial Queues-like functionality to git users.

Yep, I think it is official: I have a thing for version control systems. Ever since I became very interested in them (~April 2005), I have learned a lot about them, and I am kind of tempted to give it a go and try writing something of my own. :)

Mercurial 0.9.2

So yesterday, Matt Mackall released Mercurial version 0.9.2, which includes the churn extension...my own creation! Mwhahaha! :)

OLS 2006 - Day 5

The day began with an awesome presentation I gave about Unionfs. :) Shawn was recording it, but after the presentation, he found out that the video turned out to be crap; he has audio only. I’m sure he’ll share it soon. :) I was pleasantly surprised at the number of people who use Unionfs or were interested in it.

The keynote was excellent as always. However, I must say that Greg K-H made it sound like any piece of code will get into the kernel. Yeah, right :) But he did say a few nice things about the status of Linux.

After the keynote, there was the GPG key signing, which I did not attend, although I wanted to. Instead, we went to get some food. The food was good, and we (Dave, Mike Halcrow, Prof. Zadok, and I) talked about a bunch of things ranging from MythTV and terabyte storage servers to the number of ants in Texas. (Apparently, it is a lot of fun to watch termites and fire ants battle to the death. O_o )

We finished eating around 19:45, which was just about the right time to head over to the Black Thorn for the after-event party. Just as last year, it was quite interesting. Pretty much as soon as I got there, I noticed Peter Baudis, aka pasky, the cogito maintainer. We chatted about how git and Mercurial differ (Matt’s talk the day before came in handy :) ). I mentioned I was slowly working on a generic benchmark script that would test a number of popular SCMs, including Mercurial, Subversion, and CVS. He was thrilled about the prospect of knowing exactly where git sucked compared to other SCMs. My guess is that he wants to fix it and make it better: a noble goal, but unnecessary, as Mercurial already exists, and why reinvent the wheel? ;) Seriously, though, I think a lot of people would benefit from knowing exactly where each SCM excels and where each sucks. The nice thing about collaborating with the git people would be that it would make it more apparent that this wouldn’t just be yet another fake test. After some time, a bunch of other Czech people popped up right next to us (Pavel Machek, among others). It was quite interesting. :)

After that, I joined a conversation with some Intel people. As it turns out, one of them is working on the e1000 driver (awesome piece of hardware, by the way; don’t ever buy anything other than it :) ). Some time later, Jens Axboe joined the group briefly. When he said my name seemed familiar, I mentioned how I tried to implement IO priorities, and failed :) Later on, a guy from the University of Toronto joined the group; he had approached me earlier in the day about Unionfs on clusters. We chatted about things ranging from school (the undergraduate program and grad school) to submitting kernel code to lkml. The e1000 guy likewise suggested that we split Unionfs up into a few patches and send it off. During the event a few people still asked me about Unionfs, which felt good :)

Then, I decided that it would be fun to talk to some IRC people. I found John Levon and Seth Arnold. We sat down, and had an interesting conversation about a number of things. Since at least some of these were quite interesting, here’s a list:

  1. How can I deal with VFS and not drink vodka or other hard liquor?
  2. Everybody hates CDE, even people at Sun
  3. Solaris is dead (well, they didn’t say it, but that’s the feeling I got)
  4. Britons have some interesting sports, or at least some of the expected behavior during the sport is interesting, namely:
       1. darts - you are expected to drink as you play
       2. I can’t recall the name - gigantic pool table
       3. cricket - everyone smokes "reefer" (to quote Movement; I just find this name of the substance mildly amusing) because their games sometimes take several days

After that, they kicked everyone out as it was 2:45 already. We (Seth, John, and I) went back to the hotel. There, Prof. Zadok and Chip (who arrived on Friday) were about to get up and head to the airport. :) I just went to bed.

OLS 2006 - Day 4

Friday was kind of interesting. The talks were a little weaker, but there were some interesting ones. For example, Matt Mackall’s Towards a Better SCM: Revlog and Mercurial talk was a nice way to learn how Mercurial stores history. I got tired, so I went back to the hotel and fell asleep for a few hours. I woke up just in time to head over to the conference center for the 20:00 Stackable file systems BOF. That was interesting. A lot of useful people showed up, including Christoph Hellwig, Ted Tso, Val Henson, Steve French (Samba/CIFS guy), Jan Blunck, Eric Van Hensbergen (plan9 fs implementation in Linux), and many more. Topics included limited stack space (4K on i386), cache consistency, locking, and nameidata structure brain damage.

As we planned before, after the BOF we invited everyone over to the hotel for some snacks and drinks. That’s where things got really interesting. I spent a lot of time with Jan Blunck and Eric Van Hensbergen talking about the proper way to do unions. Three people, three different ways to union files :)

After that we had some fun with the stack space issue and Reiserfs (and Hans’s approach to open source).

stack space
There should be a competition: "who can create the largest storage stack without overflowing the stack." For example, unionfs on top of gzipfs on top of versionfs on top of device mapper on top of md on top of scsi over ethernet on top of ppp tunneled through ssh... you get the idea.
Reiserfs
Apparently, Christoph once fixed something trivial in the reiserfs code and sent the patch to Hans, and in return he got a gigantic legal document (I know exactly what he is talking about, as I have submitted a patch once as well). Well, he didn’t like it, so he gave Hans a bunch of options, one of which included a certain sum of money for the copyright to the code. Interestingly enough, Hans accepted. Too bad I didn’t know about this story before; I might have made some money off of my one-line fix to the reiserfs section in the Kconfig file. :)

I have to say, I really want to look at how plan9 works thanks to the conversation I had with Eric Van Hensbergen. It seems to have a lot of really good ideas. Hrm, now that I think about it, I might install it on one of my old boxes. Yeah, I know I am odd.
