Josef “Jeff” Sipek

Retiring Guilt

Filed under:

— JeffPC @ November 6, 2020 18:08

It took me about 3 years to write this post. Partly because I had other things I wanted to work on and partly because I hoped that it wouldn’t be needed. Well, I finally decided that I really need to write this.

In short, I’m officially stopping work on guilt.

Practically speaking, I haven’t touched it (as a developer) in over two years and as a user in about as long. So really, nothing will change.

What is guilt?

I started writing Guilt in fall 2006 because I was working on unionfs and needed to maintain patches on top of the Linux kernel git repository—much like what the mq extension did with Mercurial repositories.

It all started with:

commit 664e5a7d7f8d2c2726f03a239de11fa00127cf84
Author: Josef Sipek <jsipek@thor.fsl.cs.sunysb.edu>
Date:   Mon Nov 6 13:08:30 2006 -0500

    Initial commit

That’s right, 14 years to the day.

Technically, the first few versions were called “gq” (which stood for “git quilt”) until someone pointed out that “GQ” was a well established GTK-based LDAP client.

Artifacts

If anyone wishes to resurrect this project, then by all means go for it. If not, the old content will remain online for as long as I have a web server. :)

Specifically, you can find everything up to and including the last release (v0.37-rc1) at the following locations:

HTML manpages: http://guilt.31bits.net/man/
Source tarballs: http://guilt.31bits.net/src/
Git repository: https://repo.or.cz/guilt.git (mirrored at https://github.com/jeffpc/guilt)

Users

I know that Guilt has served a number of people quite well over the years. It’s been quite stable and mostly feature complete since at least 2008, so I haven’t really been hearing from people short of the occasional patch or an occasional “oh yeah, I use that”.

To those users: I hope the last release works well enough for you until someone starts to maintain Guilt again or you find a different tool that suits your needs.

Comments (1)

My Logbooks

Filed under:

— JeffPC @ August 1, 2018 17:15

Recently a friend asked me about what I use for my pilot logbook. This made me realize that my logging is complicated and that I should probably make a blahg entry about it.

All in all, I have three logbooks to keep track of my flying.

Good ol’ paper logbook

This is the “official” one. In other words, if the FAA wants to see my logbook, that’s what I’ll show them. There’s not much more to say about it.

ForeFlight

This is my casual logbook. A while back I entered everything in, including more accurate counts (full stop vs. touch+go) and better divided up time counts (PIC vs. solo). I use this logbook to answer questions like “How much time do I have?” and “Am I current?”. It is also useful when chatting with people and I want to dig up an entry.

I also use it to keep track of Wikipedia article: Hobbs vs. tach time since I pay based on tach time.

A Repository

This is my custom analysis and archive logbook. In this Mercurial repository, I keep a giant JSON file with every entry with even more detail than what’s in ForeFlight.

Alongside it, I also keep a list of latitude/longitude/altitude information for each airport I’ve been to.

From these two files, I can generate various plots. For example, here is one:

Airports as of 2018-07-30

This is a plot of all the airports I’ve ever landed at—color coded based on the year of my first landing there.

This repository also serves as a backup of my paper logbook (in case my physical logbook burns up, gets water damaged, etc.) and an archive of other flying related data. I accomplish this by keeping scans of the paper logbook, copies of any GPS tracklogs I’ve recorded, and so on in a couple of subdirectories.

Post-flight

At the end of each flight, I add an entry to my ForeFlight logbook. Usually, I have ForeFlight recording a tracklog, so a large part of the entry is auto-generated. The bits of info that I add manually are:

tach time (useful for billing)
time out/off/on/in (I’m trying to figure out how much time I “waste” on the ground to improve my planning accuracy)
landing counts
any remarks I wouldn’t remember later

Then, when I’m home and have time (this can be later that day, or 3 days later), I pull up the ForeFlight entry, improve/edit the remarks, double check that all the counts make sense (if needed I pull up the tracklog to recount the number of landings, etc.), and then write an entry into my paper logbook.

If I filled up a page of the paper logbook, I scan the two pages and drop them into the repository.

Depending on how I feel, I may update the repository logbook JSON file then and there or at some point later (in the past I’ve even waited for a month due to laziness). Usually, visiting a new airport is motivating enough.

Comments (0)

2018-06-05

Filed under:

— JeffPC @ June 5, 2018 15:17

Smart Clock: A New Time — Using three inexpensive wrist watches to achieve 1 second accuracy over an extended period of time.

Repairing the card reader for a 1960s mainframe: cams, relays and a clutch

The 555 Timer IC an Interview with Hans Camenzind—The Designer of the Most Successful Integrated Circuit Ever Developed

High-level Problems with Git and How to Fix Them — A Mercurial developer’s view of Git’s shortcomings.

Mailing lists vs Github

GDL 90 Data Interface Specification — Definition of the serial protocol used by Wikipedia article: UAT receivers to feed the received data to MFDs.

GDL 90 Extended Specification — ForeFlight’s extension to GDL 90.

Comments (0)

Modern Mercurial - Phases

Filed under:

— JeffPC @ April 15, 2018 14:51

This post is part of a series named “Modern Mercurial” where I share my realizations about how much Mercurial has advanced since 2005 without me noticing.

Last year, I had a realization that I haven’t been using Mercurial to its full potential. In this post, I’d like to share my thoughts about and usage of Mercurial Phases.

Phases are not a new feature. They made their first appearance back in 2012 as part of Mercurial 2.1, which makes them a little over 6 years old.

What are phases?

While there is a description of phases on the Mercurial wiki, I’ll take a stab at a short intro.

Each commit belongs to one of three phases (public, draft, or secret) which implies a set of allowed operations on the commit. Furthermore, the phase dictates which other phase or phases the commit can transition to.

You can think of the phases as totally ordered (secret → draft → public) and a commit’s phase can only move in that direction. That is, a secret commit can become either a draft or a public commit, a draft commit can become a public commit, and a public commit is “stuck” being public. (Of course if you really want to, Mercurial allows you to force a commit to any phase via hg phase -f.)

The allowed operations on a commit of a particular phase are pretty self-explanatory:

Public commits are deemed immutable and sharable—meaning that if you try to perform an operation on a commit that would modify it (e.g., hg commit –amend), Mercurial will error out. All read-only operations as well as pushing and pulling are allowed.

Secret commits are mutable and not sharable—meaning that all modifications are allowed, but the commits are not pullable or pushable. In other words, a hg pull will not see secret commits in the remote repository, and a hg push will not push secret commits to the remote repository.

Draft commits are mutable and sharable—a phase between public and secret. Like secret commits, changes to commits are allowed, and like public commits, pushing and pulling is allowed.

Or in tabular form:

Phase	Commits	Sharing
public	immutable	allowed
draft	mutable	allowed
secret	mutable	prevented

By default, all new commits are automatically marked as draft, and when a draft commit is pushed it becomes public on both ends.

Note that these descriptions ignore the amazing changeset evolution features making their way into current Mercurial since they can blur the “not yet shared” nature of draft commits. (Perhaps I should have titled this post Modern Mercurial (2012 edition) — Phases.)

A note about hg log

Unfortunately, the default hg log output does not display phases at all. I think this is rather unfortunate (but understandable from a backwards compatibility point of view).

Last year, I dedicated a whole post to how I template hg log information including my reasoning for why I display phases the way I do.

How do I use phases?

Now that we have the basic introduction to phases out of the way, let me describe how I mapped them to my workflow.

First of all, I make all new commits start in the secret phase (instead of the default draft) with a quick addition to .hgrc:

[phases]
new-commit = secret

This immediately prevents an accidental hg push from pushing commits that I’m still working on. (Recall that secret commits cannot be pushed.) In at least one repository, this allowed me to regularly have more than 6 heads with various work-in-progress feature ideas without the fear of accidentally messing up a public repository. Before I started using phases, I used separate clones to get similar (but not as thorough) protections.

Now, I work on a commit for a while (keeping it in the secret phase), and when I feel like I’m done, I transition it to the draft phase (via hg phase -d). At that point, I’m basically telling Mercurial (and myself when I later look at hg log) that I’m happy enough with the commit to push it.

Depending on what I’m working on, I may or may not push it immediately after (which would transition the commit to the public phase). Usually, I hold off pushing the commit if it is part of a series, but I haven’t done the last-chance sanity checks of the other commits.

Note: I like to run hg push without specifying a revision to push. I find this natural (and less to type). If I always specified a revision, then phases wouldn’t help me as much.

“Ugly” repos

I have a couple of repositories that I use for managing assorted data like my car’s gasoline utilization. In these repositories, the commits are simple data point additions to a CSV file and the commit messages are repetitive one-liners. (These one-liners create a rather “ugly” commit history.)

In essence, the workflow these repositories see can be summarized as:

$ echo "2018-04-05,12345,17.231," >> data.csv
$ hg commit -m "more gas"
$ hg push

In these repositories, I’ve found that defaulting to the secret phase was rather annoying because every commit was immediately followed by a phase change to allow the push to work. So, for these repos I changed new-commit back to draft.

Edit: I reworded the sentence about Mercurial giving you a way to force a commit to any phase based on feedback on lobste.rs.

Comments (0)

Modern Mercurial - hg log

Filed under:

— JeffPC @ July 29, 2017 13:34

This post is part of a series named “Modern Mercurial” where I share my realizations about how much Mercurial has advanced since 2005 without me noticing.

As I pointed out recently, I ended up customizing my .hgrc to better suit my needs. In this post, I’m going to talk about my changes to tailor the hg log output to my liking.

There are three issues I have with the default hg log format:

By default, only the first line of the commit message is shown. To see it fully, you need to use verbose mode.
In verbose mode, the touched files are listed as well without a way to hide them.
In verbose mode, the listed files are not listed one per line, but rather as a single line.

If, like me, you prefer the Linux-kernel style commit messages, you likely want to see the whole message when you look at the log (problem #1). Here is, for example, a screenshot of a commit using the default style (normal and verbose mode):

hg log

You can work around not seeing the whole commit message by always using the verbose mode, but that means that you’ll also be assaulted by the list of changed files (problem #2) without a way to hide it. To make the second problem even worse, the file names are listed on a single line, so all but the most trivial of changes create an impossible to read blob of file names (problem #3). For example, even with only a handful of files touched by a commit:

hg log -v

At least, those are my problems with the default format. I’m sure some people like the default just the way it is. Thankfully, Mercurial is sporting a powerful templating engine, so I can override the style whichever way I want.

Demo

Ok, before I dive into the rather simple config file changes, let’s take a look at a screenshot of the result on a test repository:

hg log -G

As you can see, the format of each log entry is similar to that of git log (note that the whole multi-line commit message is displayed, see revision 1), but with extra information. What exactly does it all mean? I think the best way to explain all the various bits of information is to show you an annotated version of the same screenshot:

hg log -G

I’m now going to describe the reasons why the various bits of information are presented the just way they are. If you aren’t interested in this description, skip ahead to the next section where I present the actual configuration changes I made.

Each commit hash (in yellow) is followed by a number of “items” that tell you more about the commit.

First is the phase. The phase name is abbreviated to a single letter (or no letter for the public phase) and color coded. It is the first item because every commit has a phase, the phase is an important bit of information, and the “encoded” phase info is very compact.

The reasoning behind the phase letters and colors is as follows:

public phase (no letter): Public commits are not interesting since everyone has them, so don’t draw attention to them by omitting a letter.
secret phase (‘S’): The only interesting thing about secret commits is that they will not be pushed. That means that they cannot be accidentally pushed either. Since this behavior is “boring”, use dark blue to indicate that they are different from public commits, but do not draw too much attention to them.
draft phase (‘D’): These are the “dangerous” commits. Pushing them will change the remote repository’s state, so draw significantly more attention to these by using red.

I use letters instead of just using a different color for the commit hash for a very simple reason—if colors aren’t rendering properly, I still want to be able to tell the phases apart.

Second comes the named branch. When looking at several commits (e.g., hg log), most of the time any two adjacent commits will be on the same named branch. On top of that, each commit belongs to exactly one named branch. Therefore, even though the named branch name is not a fixed field, it behaves as one. In my experience, it is a good idea to display fixed fields before any variable length fields to make it easier for the eyes to spot any differences. (Yes, technically the way I display the phase information is not fixed width and therefore the named branch will not always start in the same column, but in practice adjacent commits tend to have the same phase as well, so the named branch will always be in a semi-fixed position.) Note that in Mercurial the “default” branch is usually rendered as the empty string, and I follow that convention with my template.

Third comes the list of tags. Each commit can have many tags. This is the first item on the line that can become unreasonably long. At least in the repositories that I deal with, there aren’t very many tags per commit, so I haven’t seen any bad effects.

Fourth and final comes the list of bookmarks. Much like tags, there can be many, but in practice there are very few. Since I deal with tags more often than bookmarks, I put the bookmark information after the tags. The active bookmark is rendered as bold.

The choice of colors for named branches (cyan), tags (green), and bookmarks (magenta) was guided by a simple principle: they should go well with the yellow color of the changeset line, and not draw too much attention but still be visually distinct. Sadly, on a terminal without color support, they will all render the same way. I think this is still workable, since repositories have conventions for branches/tags/bookmarks naming and therefore the user can still guess what type of name it is. (Worst case, the user can consult other hg commands to figure out what exactly is being displayed.)

The checked out commit and the active bookmark being rendered as bold without any additional indication that they are different is also unfortunate. I haven’t found a pleasant way to render this information that would convey the same information on dumb terminals. (Note that there is a class of terminals that support bold fonts but not different colors. Even those will render this info correctly.)

Config

So, how did I achieve this glorious output? It’s not too complicated, but it took me a while to tune things just to my liking.

First, I make a custom style file with two templates—changeset and changeset_verbose:

changeset_common = '{label(ifcontains(rev, revset('parents()'),
      "log.activechangeset",
      "log.changeset"),
      "commit {rev}:{node}")}\
      {label("log.phase_{phase}",
	ifeq(phase, "public",
	  "",
	  " {ifeq(phase,"draft","D","S")}"))}\
      {label("log.branch", ifeq(branch, "default", "", " {branch}"))}\
      {label("log.tag", if(tags, " {tags}"))}\
      {bookmarks % "{ifeq(bookmark, currentbookmark,
	label('log.activebookmark', " {bookmark}"),
	label('log.bookmark', " {bookmark}"))}"}
    {ifeq(parents,"","","{ifeq(p2rev,-1,"Parent: ","Merge: ")}{parents}\n")}\
    Author: {author}
    Date:   {date(date,"%c %z")}\n
    {indent(desc,"    ")}\n'
changeset_files = '{ifeq(files, "", "", "\n {join(files,\"\n \")}\n")}'

changeset_verbose = '{changeset_common}{changeset_files}\n'
changeset = '{changeset_common}\n'

Normally, changeset is used by hg log and other revision set printing commands, while changeset_verbose is used when you provide them with the -v switch. In my template, the only difference between the two is that the verbose version prints the list of files touched by the commit.

Second, in my .hgrc, I define the colors I want to use for the various bits of info:

[color]
log.activebookmark = magenta bold
log.activechangeset = yellow bold
log.bookmark = magenta
log.branch = cyan
log.changeset = yellow
log.phase_draft = red bold
log.phase_secret = blue bold
log.tag = green

Finally, in my .hgrc, I set the default style to point to my style file:

[ui]
style = $HOME/environ/hg/style

That’s all there is to it! Feel free to take the above snippets and tailor them to your liking.

hg log -v vs. hg log –stat sidenote

My first version of the template did not support the verbose mode. I didn’t think this was a big deal, and I simply used hg log –stat instead. This provides the list of files touched by the commit and a visual indication how much they changed. For example, here’s a close up of two commits in the same test repo:

hg log -G –stat

Then one day, I tried to do that on a larger repo with a cold cache. It was very slow. It made sense why—not only did Mercurial need to list all the commits, it also needed to produce the diff of each commit only to do some basic counting for the diffstat.

My solution to the problem was to make verbose mode list all the files touched by the commit by using {files}. This is rather cheap since it requires consulting the manifest instead of calculating the diff for each commit. For example, here are the same two commits as above but in verbose mode:

hg log -G -v

It certainly has less detail, but it is good enough when you want to search the log output for a specific file name.

Comments (0)

Modern Mercurial

Filed under:

— JeffPC @ April 29, 2017 02:50

This post is part of a series named “Modern Mercurial” where I share my realizations about how much Mercurial has advanced since 2005 without me noticing.

I’ve been using both Git and Mercurial since they were first released in 2005. I’ve messed with the internals of both, but I always had a preference for Mercurial (its user interface is cleaner, its design is well thought-out, and so on). So, it should be no surprise that I felt a bit sad every time I heard that some project chose Git over Mercurial (or worse yet, migrated from Mercurial to Git). At the same time, I could see Git improving release after release—but Mercurial did not seem to. Seem is the operative word here.

A couple of weeks ago, I realized that more and more of my own repositories have been Git based. Not for any particular reason other than that I happened to type git init instead of hg init. After some reflection, I decided that I should convert a number of these repositories from Git to Mercurial. The conversion itself was painless thanks to the most excellent hggit extension that lets you clone, pull, and push Git repositories with Mercurial. (I just cloned the Git repository with a hg clone and then cleaned up some of the mess manually—for example, I don’t need the bookmark corresponding to the one and only branch in the original Git repository.) Then the real fun began.

I resumed the work on my various projects, but now with the brand-new Mercurial repositories. Soon after I started hitting various quirks with the Mercurial UI. I realized that the workflow I was using wasn’t really aligned with the UI. Undeterred, I looked for solutions. I enabled the pager extension, the color extension, overrode some of the default colors to be less offensive (and easier to read), enabled the shelve, rebase, and histedit extensions to (along with mq) let me do some minor history rewriting while I iteratively work on changes. (I learned about and switched to the evolve extension soon after.) With each tweak, the user experience got better and better.

Then it suddenly hit me—before these tweaks, I had been using Mercurial like it’s still 2005!

I think this is a very important observation. Mercurial didn’t seem to be improving because none of the user-visible changes were forced onto the users. Git, on the other hand, started with a dreadful UI so it made sense to enable new features by default to lessen the pain.

One could say that Mercurial took the Unix approach—simple and not exactly friendly by default, but incredibly powerful if you dig in a little. (This extensibility is why Facebook chose Mercurial over Git as a Subversion replacement.)

Now I wonder if some of the projects chose Git over Mercurial at least partially because by default Mercurial has been a bit…spartan.

With my .hgrc changes, I get exactly the information I want in a format that’s even better than what Git provided me. (Mercurial makes so much possible via its templating engine and the revsets language.)

So, what does all this mean for Mercurial? It’s hard to say, but I’m happy to report that there is a number of good improvements that should land in the upcoming 4.2 release scheduled for early May. For example, the pager and color functionality is moving into the core and they will be on by default.

Finally, I like my current Mercurial environment quite a lot. The hggit extension is making me seriously consider using Mercurial when dealing with Git repositories that I can’t convert.

Comments (0)

git filter-branch

Filed under:

— JeffPC @ September 22, 2015 02:31

Recently, I had to rewrite some commits in a git repository. All I wanted to do was set the author and committer names and emails to the correct value for all the commits in a repository. (Have you ever accidentally committed with user@some.host.local as the email address? I have.) It turns out that git has a handy command for that: git filter-branch. Unfortunately, using it is a bit challenging. Here’s what I ended up doing. (In case it isn’t clear, I am documenting what I have done in case I ever need to do it again on another repository.)

The invocation is relatively easy. We want to pass each commit to a script that creates a new commit with the proper name and email. This is done via the –commit-filter argument. Further, we want to rewrite each tag to point to the new commit hash. This is done via the –tag-filter argument. Since we’re not trying to change the contents of the tag, we use cat to simply pass through the tag contents.

$ git filter-branch \
        --commit-filter '/home/jeffpc/src/poc-clean/process.sh "$@"' \
        --tag-name-filter cat \
        -- fmt4 load-all master
Rewrite a95e3603e5ec40e6f229e75425f1969f13c17820 (710/710)
Ref 'refs/heads/fmt4' was rewritten
Ref 'refs/heads/load-all' was rewritten
Ref 'refs/heads/master' was rewritten
v3.0 -> v3.0 (b56481e52236c8bd85e647c30bafad6ac651e3fb -> b53c5b3ae8e18de02e1067bada7a0f05d4bcd230)
v3.1 -> v3.1 (993683bf104f42a74a2c58f2a91aee561573f7cc -> 1a1f4ff657abc8e97879f68a5dc4add664980b71)
v3.2 -> v3.2 (090b3ff1a66fa82d7d8fc99976c42c9495d5a32f -> 60fbeb91b689c65217b5ea17e68983d6aebc0239)
v3.3 -> v3.3 (4fb6d3ac2c5b88e69129cefe92d08decb341e1ae -> dd75fbb92353021c2738da2848111b78d1684405)

Caution: git filter-branch changes the directory while it does all the work so don’t try to use relative paths to specify the script.

The commit filter script is rather simple:

#!/bin/sh

name="Josef 'Jeff' Sipek"
email="jeffpc@josefsipek.net"

export GIT_AUTHOR_NAME="$name"
export GIT_AUTHOR_EMAIL="$email"
export GIT_COMMITTER_NAME="$name"
export GIT_COMMITTER_EMAIL="$email"

exec git commit-tree "$@"

It just sets the right environmental variables to pass the right name and email to git commit-tree, which writes out the commit object.

That’s it! I hope this helps.

Comments (0)

Making Version Control Systems Really Go Boom

Filed under:

— JeffPC @ June 20, 2007 03:43

This is a part 2 of my adventures of making version systems go boom.

As I described before, I need to version some reasonably large files. After trying Mercurial and Git, I decided to go with git as it presented me with fewer problems.

To make matters worse than before, I now need to version 3 files which are about 2.7GB in size each. I tried to git-add the directory, but I got this wonderful message:

$ git-add dir/
The following paths are ignored by one of your .gitignore files:
dir/ (directory)
Use -f if you really want to add them.
$ git-add -f dir/
fatal: dir/: can only add regular files or symbolic links

Wha?

I don’t have any .gitignore files in this repository
Adding a directory like that worked (and still works!) on other directories

Really painful. Time to experiment, but first I run git-status to see what other files I have not committed yet, and I see everything listed except the directory!…So, I moved one of the files to the top directory of the repo, ran git-status — the file did not show up — but tried to add it anyway:

$ git-add file
fatal: pathspec 'file' did not match any files

Ok, this time around, I at least get an error message which I’ve seen before. It is still wrong, but oh well. Thankfully, the program that uses these files has be made in such a way that it can handle filesystems which don’t support files larger than 2GB. I regenerate the file, now I have 2 files, the first one 2GB and the other 667MB. git-status displays both — great! git-add on the smaller file works flawlessly, but…you guessed it! Adding the larger file dies? Which error message?

fatal: Out of memory, malloc failed

Yep, great. My laptop’s 1GB of RAM just isn’t good enough, eh? I’m not quite sure what I’ll do, I’ll probably scp everything over to a box with 2+GB RAM, and commit things there. This really sucks :-/

Update: I asked around on IRC (#git) where I got a few pointers and the code confirms things…it would seem that git-hash-object tries to mmap the entire file. This explains the out of memory error. The other problem is the fact that the file size is stored in an unsigned long variable, which is 32-bits on my laptop. Oh well, so much for files over 4GB. I think, but I’m not sure - I’m too lazy to check — the stat structure may return a signed int which would limit things to 2GB — which is what I see.

Comments (3)

Making Version Control Systems Go Boom

Filed under:

— JeffPC @ May 30, 2007 21:57

So, time has come, once again, to talk of many things…of Git and Mercurial. :)

For a fun project which I’ll describe here some other time, I want to version about 2GB of files. Here’s the breakdown:

5x 312MB
3x 100MB
2x 16MB
80 other files all under 5MB each

My first instinct was to use Mercurial, and so I did. It made sense, because it stores compressed deltas for the files. I don’t expect more than ~20MB to change between two consecutive versions, so it made sense on an architectural level as well.

The setup

There are a number of computers involved, unless I say otherwise, I’m talking about my laptop.

laptop: 3.06GHz P4, 1GB RAM
server: Athlon 2000, 1.25GB RAM
kernel devel box: 2x 2.8GHz Xeon, 2GB RAM, 4GB swap
big box: 4x 1.8GHz Opteron, 64GB RAM

Unfortunately, I can’t use the “big box” much. :( Oh well.

Attempt #1: Mercurial

First, I set up the directory hierarchy with all the files. Virtually all of the data in the 100MB & 312MB files consists of binary zeros, so it came as no surprise that the initial commit created approximatelly 50MB worth of history. Not bad at all! I ran some commands that changed the files the way I wanted, and commited each time I felt it was a good place to checkpoint. Mercurial’s compressed delta way of storing history really worked well, only 4MB increase in history between the initial and the 6th commit.

At this point, I decided that I should make a clone on another computer — yeah, I use distributed version control systems for backups of individual projects. :) Now, this is where things went crazy. I initiated clone on my server, and after about two minutes, the hg process on my laptop died with a memory allocation error. That sucks. It was probably because of the protocol, which tries to uncompress everything, and recompress it to save bandwidth. Since I was on a LAN, I tried to use the –uncompressed option, which doesn’t try to be smart, and just wastes bandwidth, but I forgot that I need to enable it on the server side, and so unknown to me, it still tried to compress the data. It died with a memory error, just as before. Oh well. At this point, I decided to try Git for this project.

Attempt #2: Git

Git uses a different storage scheme, well it actually has two. Whenever you commit, git stores the full file versions — compressed. I did a quick conversion of the hg repo to git — by hand as there were only 6 commits. I had to use:

hg update -C <rev>

otherwise, hg was trying to be too smart — something that makes you run out of memory. :)

After the conversion, the resulting .git repo was also about 50MB in size. Everything worked just as well. It is possible that the commits took little bit less time, as commiting consists of just compressing the files, and storing them on disk. I am not sure which one was faster, and knowing how each works doesn’t help with psychological effects :)

Anyway, it was time for me to clone the repository — again, going from my laptop to the server. I was afraid of this step, because when git transfers data between repositories, it tries to conserve bandwidth by making a packfile — a file containing a number of deltified objects (such as the compressed files stored during commit). It started to create the packfile, but it died with a nice message saying that it ran out of memory. Great! Now what? At that point, I decided to cheat. Since I need a packfile sooner or later, I just rsync’d the whole git repo to the kernel test box I have — a box that has twice the ram, and 4GB of swap, and I tried to clone from that. It got to about 66% done, when it was using most of the ram, and far too much swap. After about an hour and twenty minutes, I decided to rsync the repo to the box that has 64GB ram. On it, I ran the commands necessary to just create a pack file — without pulling/pushing/cloning. In about 10 minutes, it was done. Great! I then aborted the clone that was running for hour and a half, and cloned from the repo that had the packfile all set up. Everything worked rather nicely :) I moved things back onto my laptop.

Additional commits

Now it was time to resume what I was doing before — “the project”…I made some additional changes to the files, and made another commit. And it was time to push the changes. Git wasn’t happy. I wasn’t going to fight as I was getting tired, so I just rsync’d the # 6 newly created objects to the server.

Recently, there have been some patches on the git mailing list to make git little smarter about the way it uses multiple pack files. This doesn’t apply to me - at least not yet.

Conclusions

So, here it is. Both of the version control systems I like to use (each one has it’s area where I wouldn’t want to switch to the other), die on me because my 3 year old laptop has only 1GB of RAM. Just great. :-/ And please, don’t tell me about Subversion, and other non-distributed vcs tools. As far as I know, the other distributed systems consume even more resources.

Comments (2)

Guilt: Taking over the world one repository at a time

Filed under:

— JeffPC @ May 13, 2007 17:31

It is really interesting how sometimes a bit of luck makes things happen. For example, little over 6 months ago, I wrote a few shell scripts, which called gq, to make my life a little easier. I worked on the for about a week, and then I decided I should share with the community. So I tagged the sources as version 0.10, and announced it on the git mailing list. One of the comments I got was about the fact that there is another project (completely unrelated) that had the name gq for a long time. Oh well, it was time for me to rename it. After some procrastination and hacking, new year rolled around, and I decided to release 6th version (v0.15), but this time it wouldn’t be gq anymore — instead I would call it Guilt. My post from January describes how it got the name. As with every version of gq, I announced Guilt v0.15. I could see that Guilt was getting way better, and so I felt even more motivated to hack on it. v0.16 came out. And then a very unexpected thing happened. I got two patches from a guy on the mailing list. Sweet! I applied them, and release v0.17. Shortly thereafter, during the Linux Storage and Filesystem (LSF) workshop in San Jose, I got a patch from Ted Ts’o (of the ext[234] fame). I couldn’t believe it, but it was true. I decided to release v0.19 the next day. At LSF, I met Brandon Philips, and we talked about Guilt. Rather shortly after LSF, he send me an email saying that he’ll try to get Guilt into Debian. :) Well, about a month ago, he succeeded.

As many of you may already know, I stick around a number of channels on OFTC’s IRC network, and it is rather interesting to see people try Guilt, or people talk about Guilt; generally suggesting that someone use it — and people do!

Anyway, I hope I didn’t bore everyone to death with my little tour of history behind Guilt.

Comments (1)