Josef “Jeff” Sipek

Memory Leaks

Alright, I think I’ve had just about enough. Why does Amarok eat up 22% of my RAM (1GB) after 4 days of running (and playing music for maybe 18 hours of those 4 days)? Why does Firefox use up 33% of my RAM in 4 days?

Why is it that when I shut down the app, and restart it, the usage is 4–5 times less?

                     Amarok   Firefox 2
before app restart   225 MB   338 MB
after app restart     58 MB    72 MB

The only reasons I can think of are the applications being buggy, or having really crappy defaults.

Buggy Applications

Dear developers, believe it or not, when you allocate memory, you also have to free it when you are done with it. If you don’t, you are committing a crime against humanity known as a “memory leak”. This memory is unusable, and essentially becomes dead weight the process carries around. Since it is not used, the OS may swap it out, and before long, your swap file/partition becomes full of memory that has been leaked.

Contrary to popular belief, freeing memory is really simple.

For you C++ coders (yes, that includes you Amarok folks), you simply use the delete keyword followed by the pointer to what you want to free (or delete[] if the memory was allocated with new[]). For example,

delete some_pointer;

If you are using C, the free function is your friend. Just call it with a single argument: the pointer to the memory you want to free. For example,

free(some_pointer);

Now, if you are working on a larger project, there might be wrappers around the memory management (malloc/free, new/delete) functions, but whatever the “free this memory” function is called, USE IT.

I can almost hear all the managed languages fans yell: “Just use a language that does garbage collection, and you won’t have to worry about freeing memory.” Well, you are WRONG!

Garbage collectors maintain graphs of memory allocations, and whenever they notice that some piece of memory is unreachable, they mark it as garbage, and free it. Here’s my favorite example for causing leaks in a garbage collected language:

Suppose that you have implemented a class that works as a stack. You implemented it as an array of elements, and an index into the array to mark the top of the stack. Pushing an element is trivial: you just increment the index, and set the reference in the array to the object you want to store. Popping is really easy: you just decrement the index, and you’re done. Right? WRONG! Decrementing the index changes that one integer variable, but the reference in the array is still there, and therefore the object is still reachable as far as the garbage collector is concerned. Sure, next time you push into that slot, the previous reference will get overwritten, and the previous allocation will get freed (assuming that there are no other references). But what if you never push that many elements back onto the stack? What if you experienced some high-load spike? You’ll have a large number of objects incorrectly referenced, tying up memory, and quite possibly making the entire system slower.

How can you solve this? Pretty simple: just reset the reference to some “null” quantity. In Java, that means using the null literal. For example,

some_reference = null;

In Python, None is the proper keyword to use:

some_reference = None
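
To make the stack example above concrete, here is a minimal sketch in Java (the class is simplified, resizing is omitted, and all names are made up for illustration):

class Stack {
    private Object[] elements = new Object[128]; // fixed size to keep the sketch short
    private int top = 0;                         // index of the next free slot

    public void push(Object o) {
        elements[top++] = o;
    }

    // Leaky pop: only the index moves, so elements[top] still references
    // the popped object and the garbage collector cannot reclaim it.
    public Object leakyPop() {
        return elements[--top];
    }

    // Fixed pop: clear the slot so the popped object becomes unreachable.
    public Object pop() {
        Object o = elements[--top];
        elements[top] = null; // let the garbage collector do its job
        return o;
    }
}

The only difference between the two versions of pop is the one line that nulls out the array slot, and that line is exactly what decides whether the popped objects can ever be collected.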

The lesson is, free the memory you allocated when you are done using it.

Crappy Defaults

Many large applications (Firefox included) have many options you can set that affect their behavior. The default options should cover 95% or more of the users (or at least the greatest majority possible). Why such a high number? Well, suppose you settle for making 90% of your users happy out of the box… that means that 1 in 10 people who try your app will not be happy with the defaults. How many will bother checking if there even are knobs they can turn to make it work the way they want? Not all. Some will just try to install another open source app written by someone else that does pretty much the same thing. So, the default options should make as many people happy as possible.

How does this tie into a third of my RAM being used by Firefox? Simple: I do not know if there are any knobs that would “fix” the problem I am seeing. For all I know, someone decided that it was a great idea to be really aggressive about caching web page content in memory — something that’s fine if you have 16GB of RAM, but guess what, most people don’t.

Whatever it is (defaults that don’t make sense or memory leaks), Firefox and Amarok have problems that must get addressed. What is one of the reasons people complain about Microsoft Word? It takes up tons of memory. Well, I don’t feel like throwing over 200 MB of RAM at an application that plays MP3s, displays a playlist, and shows cover art.

And before someone suggests that I use Firefox 3… I realize that it is all super-duper-better-than-ever, but let’s think for a second. When the original Firefox was released, it was hailed as the non-leaky, light-weight Mozilla. Then, things started to get slow again. Firefox 2 was supposed to be the super-fast, non-leaky browser. What happened? What happened to my >300 MB of RAM? Now, Firefox 3 is all the rage…do you see the pattern yet?

I think this brings up a larger issue. It’s no secret that I do some Linux kernel coding from time to time. In the kernel, there are leaks at times, but kernel leaks are effectively non-existent compared to applications like Firefox. Don’t believe me? How come you can have a server run for over a year, and it responds just as well after that year as it did when you booted it? Imagine running Firefox for a year without restarting it. Can you even imagine that? The Linux kernel doesn’t seem to be the only “non-leaky” project (there are leaks, but they are very rare, and probably mostly in the ugliest parts of the kernel — device drivers): Apache performs quite well even after running for a while, so does PostgreSQL, and the list goes on and on.

Why is it that Firefox and other projects seem to have so many problems? The only thing I can think of is the quality control that goes into checking new code before it’s committed. In the kernel community, a patch may get rewritten a dozen times, submitted to mailing lists for review, and get comments not only from people familiar with the subsystem, but also from other developers (and budding developers trying to understand the existing code). It takes a lot of effort to get a piece of code into the kernel, but in the end, that code is well written, well reviewed, and it should benefit most users. Do the Firefox, et al., communities do this? I do not know, but somehow, I suspect that it isn’t the case.

Dumping & restoring XFS volumes

Over the past few years, I’ve been using XFS wherever I could. I never really tried to tweak the mkfs options, and therefore most of my filesystems were quite sub-optimal. Fixing that means re-creating the filesystems, so I managed to get my hands on an external 500GB disk that I decided to use for all this data shuffling…

320GB external firewire disk

This was probably the most offensively made fs. Here’s the old info:

meta-data=/dev/sdb1      isize=512    agcount=17, agsize=4724999 blks
         =               sectsz=512   attr=1
data     =               bsize=4096   blocks=78142042, imaxpct=25
         =               sunit=0      swidth=0 blks, unwritten=1
naming   =version 2      bsize=4096  
log      =internal       bsize=4096   blocks=32768, version=2
         =               sectsz=512   sunit=0 blks, lazy-count=0
realtime =none           extsz=65536  blocks=0, rtextents=0

It had 512 byte inodes (instead of the more sane, and default, 256 byte inodes) because I was playing around with SELinux when I made this filesystem, and the bigger inodes allow more extended attributes to be stored in the inode itself — improving performance a whole lot. When I first made the fs, it had 16 allocation groups, but I grew the filesystem by about 10GB that had previously been used by a FAT32 partition for Windows ↔ Linux data shuffling. On a simple disk (e.g., not a RAID 5), 4 allocation groups is far more logical than the 17 I had before. Another thing I wanted to use is lazy-count. That got introduced in 2.6.23, and improves performance when multiple processes are modifying filesystem metadata (create/unlink/mkdir/rmdir). And last, but not least, I wanted to use version 2 inodes.

The simplest way to change the filesystem to use all of these features is to back up, mkfs, and restore… and that’s what I did.
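
For reference, the mkfs step would have looked something like this (a sketch reconstructed from the options above, not necessarily the exact command I ran):

# mkfs.xfs -i size=256,attr=2 -d agcount=4 -l version=2,lazy-count=1 /dev/sdb1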

This is what the fs is like after the whole process (note that isize, agcount, attr, and lazy-count changed):

meta-data=/dev/sdb1      isize=256    agcount=4, agsize=19535511 blks
         =               sectsz=512   attr=2
data     =               bsize=4096   blocks=78142042, imaxpct=25
         =               sunit=0      swidth=0 blks
naming   =version 2      bsize=4096
log      =internal       bsize=4096   blocks=32768, version=2
         =               sectsz=512   sunit=0 blks, lazy-count=1
realtime =none           extsz=4096   blocks=0, rtextents=0

dumping…

I mkfs.xfs’d the 500GB disk, and mounted it on /mnt/dump. Since I like tinkering with storage, I couldn’t help but start blktrace for both of the disks (the one being dumped, and the one storing the dump).
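
Kicking off the traces went something like this (a sketch; the output prefixes are made up, and the target disk’s device name is an assumption, since only the source shows up as /dev/sdb in the xfs_info output above):

# blktrace -d /dev/sdb -o source-dump &
# blktrace -d /dev/sdc -o target-dump &

Once a trace is complete, seekwatcher turns it into a graph (or, with --movie, into the mpeg renders near the end of this post):

# seekwatcher -t source-dump -o source-dump.png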

Instead of using rsync, tar, or dd, I went with the xfsdump/xfsrestore combo. xfsdump is a lot like tar — it creates a single file with all the data, but unlike tar, it also saves extended attributes, and preserves the hole information for sparse files. So, with blktrace running, it was time to start the dump:

# xfsdump -f /mnt/dump/acomdata_xfs.dump -p 60 -J /mnt/acomdata

The dump took about 9300 seconds (2 hours, 35 mins). Here are the graphs created by seekwatcher (which uses the blktrace traces)…The source disk is the firewire disk being dumped, and the target disk is the one being dumped to.

source disk

The IO here makes sense: xfsdump scans the entire filesystem and backs up every inode, sorted by the inode number (which is a function of the block number). The scattered accesses are because of fragmented files having data all over the place.

target disk

I’m not quite sure why XFS decided to break the dump file into 8 extents. These extents show up nicely as the 8 ascending lines. The horizontal line at ~250GB is the journal being written to. (The seeks/second graph’s y-axis shows that seekwatcher has a bug when there’s very little seeking :) )

…and restoring

After the dump finished, I unmounted the 320GB fs, and ran mkfs on it (lazy-count=1, agcount=4, etc.). Then it was time to mount, start a new blktrace run on the 2 disks, and run xfsrestore — to extract all the files from the dump.

# xfsrestore -f /mnt/dump/acomdata_xfs.dump -p 60 -A -B -J /mnt/acomdata

I used the -A option to NOT restore xattrs as the only xattrs that were on the filesystem were some stray SELinux labels that managed to survive.

The restore took a bit longer…12000 seconds (3 hours, 20 minutes). And here are the traces for the restore:

source disk

Reading the 240GB file that was in 8 extents created an IO trace that’s pretty self-explanatory. The constant writing to the journal was probably because of the inode access time updates. (And again, seekwatcher managed to round the seeks/second y-axis labels.)

target disk

This looks messy, but it actually isn’t bad at all. The 4 horizontal lines that look a lot like journal writes are probably the superblocks being updated to reflect the inode counts (4 allocation groups == 4 sets of superblock + AG structures).

some analysis…

After the restore, I ran some debug tools to see how clean the filesystem ended up being…

…fragmentation

37945 extents used, ideal 37298 == not bad at all

…free space fragmentation

   from      to extents  blocks    pct
      1       1      19      19   0.00
      2       3       1       3   0.00
     64     127       2     150   0.00
    128     255       1     134   0.00
    512    1023       1     584   0.00
   4096    8191       1    4682   0.02
  32768   65535       1   36662   0.19
 131072  262143       1  224301   1.16
 262144  524287       3 1315076   6.79
 524288 1048575       2 1469184   7.59
1048576 2097151       4 6524753  33.71
2097152 4194303       4 9780810  50.53

== pretty much squeaky clean

…per allocation group block usage

/dev/sdb1:
AG     1K-blocks         Used    Available    Use%
  0     78142044     40118136     38023908     51%
  1     78142044     78142040            4     99%
  2     78142044     42565780     35576264     54%
  3     78142036     74316844      3825192     95%
ALL    312568168    235142800     77425368     75%

I’m somewhat surprised that the 2nd and 4th are near full (well, 2nd ag has only 4kB free!), while the 1st and 3rd are only half full. As you can see, the 320GB disk is 75% used.

Bonus features

I decided to render mpeg versions of the IO traces…

source disk (dump) (4MB)
target disk (dump) (2MB)
source disk (restore) (2MB)
target disk (restore) (4.1MB) ← this is the best one of the bunch

Firefox + vim = ?

What would happen if someone took Firefox and combined it with the awesomeness of vim’s UI? Wouldn’t that be slick? Well, wait no longer! Someone has done precisely that!

Vimperator!

By default, all you see is just the tab list, and the status line on the bottom of the window. I’ve enabled the menu bars because I’m not ready to make a complete switch ;)

Of course, since it’s still Firefox, everything renders just the same.

Edit: It’s called Vimperator and it is a Firefox extension.

lguest: The New Kid on the Block

As most of you know, virtualization doesn’t really interest me, so me writing about lguest is rather unusual. For those who don’t know, lguest is Rusty Russell’s way of saying virtualization sucks and I can make it better (don’t quote me on that).

Yesterday, Rusty sent out a 7-patch series (1, 2, 3, 4, 5, 6, 7) that contains most of the documentation for lguest. This is not the normal style of documentation you’ll find in the kernel. Here’s Rusty’s description…

Lguest is an adventure, with you, the reader, as Hero. I can’t think of many 5000-line projects which offer both such capability and glimpses of future potential; it is an exciting time to be delving into the source!

But be warned; this is an arduous journey of several hours or more! And as we know, all true Heroes are driven by a Noble Goal. Thus I offer a Beer (or equivalent) to anyone I meet who has completed this documentation.

So get comfortable and keep your wits about you (both quick and humorous). Along your way to the Noble Goal, you will also gain masterly insight into lguest, and hypervisors and x86 virtualization in general.

There is a very large number of totally hilarious comments. It looks like one doesn’t have to be an x86 expert to get a laugh out of them, but knowing a thing or two about the architecture makes it all the more enjoyable.

I can’t help but include a few excerpts here…

Intel provided a special instruction to clear the TS bit for people too cool to use write_cr0() to do it. This "clts" instruction is faster, because all the vowels have been optimized out.

I’m told there are only two stories in the world worth telling: love and hate. So there used to be a love scene here like this:

Launcher: We could make beautiful I/O together, you and I.
Guest: My, that’s a big disk!

Unfortunately, it was just too raunchy for our otherwise-gentle tale.

Just read the patches. They are really amusing :)

Firefox

You may have heard me say that Firefox sucks. Well, it does, but today I accidentally found something that makes it suck less than I thought.

Go to some page with text, and press the slash key and type something! Very nice. I like my applications to understand slash as "search" (vim, less, etc.). Now, if only it did regular expressions ;)

Guilt - Two Months Later

Quite a bit has changed in Guilt over the past two months. There have been 5 releases, as well as one release candidate (v0.19-rc1). During these releases, 71 files were changed (2210 insertions, 293 deletions) in 98 commits. I also got a number of patches via email:

Brandon Philips: 4
Nur Hussein: 3
Theodore Ts’o: 2
Yasushi SHOJI: 7

Brandon Philips contributed a significant portion of the documentation — currently every command has a man page! Additionally, he is trying to get Guilt picked up by Debian. So if you happen to qualify as a sponsor, do the right thing ;)

Nur Hussein contributed an uninstall script as well as fixed the patchbomb script.

As far as I know, Ted Ts’o is using Guilt to maintain his ext4 tree.

Yasushi contributed a few patches which add several bits of quilt functionality (e.g., guilt-files, and guilt-series -v).

I have also been contacted by Peter Williams, the maintainer of gquilt (a GUI wrapper for quilt and mq, with a very similar name to guilt), asking if I would like to have guilt added as a backend. I checked out gquilt, and it looks really nice. I also looked at what it would take to add the backend, and it doesn’t look like that much effort, but I need to implement a few bits of functionality first. If anyone wants to volunteer, let me or Peter know. :)

Git Quilt or Guilt for short

Here’s another update on my version control system escapades (a follow up to Do I have…).

As several people mentioned during the 0.10 release of gq, the name is already in use by a rather well-established project. So, after some idling and hacking, I decided that it was time to give the scripts a new name, and announce the new version on the git and linux-kernel mailing lists (the announcement). I can’t take credit for the rather clever name; I asked a few people, and the best suggestion was by Dave - Git Quilt, or Guilt for short.

One thing I did not expect was the fact that someone would contribute 2 patches very shortly after I announced it. Here’s the list of changes made between v0.16 and v0.17:

Horst H. von Brand (2):
      Fix up Makefiles
      Run regression on the current version

Josef 'Jeff' Sipek (24):
      A minimalistic makefile
      Contributing doc file
      Added guilt-add
      Added guilt-status
      Expanded the HOWTO
      Added usage strings to all commands
      All arguments to guilt-add are filenames
      More thorough argument checking & display usage string on failure
      Changed status file format to include the hash of the commit
      Fixed guilt-refresh doing an unnecessary and somewhat wrong pop&push
      Fixed up guilt-{delete,pop} not matching the patch name properly
      Fixed guilt-{delete,pop} regexps some more
      Force UTC as timezone for regression tests
      Fixed a bug in guilt-pop introduced by the status file format switch
      Error messages should go to stderr
      Merge branch 'usage'
      Merge branch 'status-file'
      Yet another TODO update
      Added guilt-rm
      Makefile update & cleanup
      pop: Display the name of the patch from the status file, not the series file
      new: Create dir structure for the patch if necessary
      Documentation/TODO: Mark guilt-rm as done
      Guilt v0.17

I haven’t had much time to work on Guilt since then, but I got a rather encouraging email from someone who tried to apply Andrew Morton’s -mm patch series on top of the kernel tree, but failed. The problem is with the way git-apply works. If it applies a patch with an offset, it still returns a non-zero status. This makes guilt think that at least one of the hunks in the patch did not apply at all. As far as I know, there is no way to get the necessary information out of git-apply without either modifying it (which I might as well), or parsing the output for signs of rejection and ignoring the return status completely. I don’t like the latter, but changing git-apply would limit the number of compatible git versions. :-/

Needless to say, patches are welcomed :)

Step 1: Fame

Totally awesome day! I submitted Unionfs to the usual places (linux-kernel, fsdevel, and the key people), then I stayed up all night. In the morning, I got a form for permission to enroll in the graduate version of compilers (I’d much prefer lex & yacc to some made-up Java thing the undergrad course uses). At around 10, I decided to head home and get some sleep. I woke up about 8 hours later, and checked my email. I replied to a lot of comments/questions by Andrew Morton and some other people, and when I finally managed to check the rest of the inbox, I saw:

Jan 08 akpm@osdl.org   ( 236) + unionfs-documentation.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 107) + lookup_one_len_nd-lookup_one_len-with-nameidata-argument.patch added to -mm
Jan 08 akpm@osdl.org   ( 138) + unionfs-branch-management-functionality.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 649) + unionfs-common-file-operations.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 733) + unionfs-copyup-functionality.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 299) + unionfs-dentry-operations.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 313) + unionfs-file-operations.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 319) + unionfs-directory-file-operations.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 326) + unionfs-directory-manipulation-helper-functions.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 995) + unionfs-inode-operations.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 572) + unionfs-lookup-helper-functions.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 743) + unionfs-main-module-functions.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 344) + unionfs-readdir-state.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 501) + unionfs-rename.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 263) + unionfs-privileged-operations-workqueue.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 168) + unionfs-handling-of-stale-inodes.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 228) + unionfs-miscellaneous-helper-functions.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 402) + unionfs-superblock-operations.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 233) + unionfs-helper-macros-inlines.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 552) + unionfs-internal-include-file.patch added to -mm tree
Jan 08 akpm@osdl.org   (  87) + unionfs-include-file.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 218) + unionfs-unlink.patch added to -mm tree
Jan 08 akpm@osdl.org   ( 109) + unionfs-kconfig-and-makefile.patch added to -mm tree

Unionfs is now in -mm!

If you actually look at the next -mm changelog, you only see one patch containing all of Unionfs, as Andrew decided to use the git tree that I set up (gitweb) on kernel.org.

Do I have a thing for Version Control Systems?

So, for whatever reason, I seem to be working on version control systems far too much. I have a decent amount of code in Mercurial, and I wrote a bunch of wrappers for CVS that I call CDS, which stands for Completely Dumb System (an apt description of CVS). And now I am working on gq (git repo: git://git.kernel.org/pub/scm/linux/kernel/git/jsipek/gq.git), which is a porcelain (a set of wrapper scripts for git) that gives Mercurial Queues-like functionality to git users.
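
To give an idea of what “Mercurial Queues-like functionality” means in practice, a typical session looks roughly like this (a sketch; the command names are the later guilt- scripts mentioned in the posts above, and the patch and file names are made up):

$ guilt-new fix-readdir.patch        # start a new patch on top of the stack
$ vim fs/some/file.c                 # hack on the tree
$ guilt-refresh                      # fold the current changes into the patch
$ guilt-pop                          # unapply the topmost patch
$ guilt-push                         # apply it again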

Yep, I think it is official, I have a thing for version control systems. Ever since I became very interested in them (~April 2005), I learned a lot about them, and I am kind of tempted to give it a go and try something of my own. :)

Ping, Pong

Someone I know sent me this link. It is an interesting program that appears to make tunneling traffic through ICMP Echo Request and Response packets easy. I have heard of the idea before, and I was thinking that one day, if I got bored, I’d do it. Well, I guess I don’t have to :)
