Josef “Jeff” Sipek

September 13, 2009

Think!

Filed under: open-source programming programming/kernel rants sysadmin — JeffPC @ 22:02

Alright, it ain't rocket science. When you are trying to decide which filesystem to use, and you see a 7-year-old article which talks about people having problems with the fs on Red Hat 7.x (running 2.4.18 kernels), are you going to assume that nothing has changed? What if all the developers tell you that things have changed? Are you still going to believe the Slashdot article? Grrr... No one is forcing you to use this filesystem, so if you believe a 7-year-old /. article, then go away and don't waste the developers' & others' time.

June 6, 2009

Extracting RPMs and DEBs

Filed under: sysadmin documentation — JeffPC @ 18:14

Every so often I need to extract a .deb package manually. Usually I end up installing Midnight Commander and using it to copy the contents out. This time around, I did some searching, and found a straightforward description of how to do it for .debs and .rpms.

RPM

rpm2cpio mypackage.rpm | cpio -vid

DEB

ar vx mypackage.deb
tar -xzvf data.tar.gz

or

ar p mypackage.deb data.tar.gz | tar zx
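
Newer .deb packages often compress the payload with something other than gzip, so the member may be called data.tar.xz or data.tar.bz2 instead. A minimal sketch that picks the member name from the archive listing rather than hard-coding it; the function name extract_deb and its destination-directory argument are made up for illustration:

```shell
# extract_deb PACKAGE DESTDIR  (hypothetical helper, not a standard tool)
# Unpack the payload of a .deb into DESTDIR without installing it.
extract_deb() {
    pkg=$1
    dest=$2

    # List the archive members and pick the payload, whatever its suffix.
    member=$(ar t "$pkg" | grep '^data\.tar' | head -n 1)
    if [ -z "$member" ]; then
        echo "extract_deb: no data.tar* member in $pkg" >&2
        return 1
    fi

    # Map the suffix to the matching tar decompression flag.
    case $member in
        data.tar)     flag= ;;
        data.tar.gz)  flag=-z ;;
        data.tar.bz2) flag=-j ;;
        data.tar.xz)  flag=-J ;;
        *)
            echo "extract_deb: unsupported payload $member" >&2
            return 1
            ;;
    esac

    mkdir -p "$dest" || return 1
    # ar p streams the member to stdout; tar unpacks it under DESTDIR.
    ar p "$pkg" "$member" | tar -x $flag -f - -C "$dest"
}
```

Something like `extract_deb mypackage.deb ./mypackage-contents` then leaves the package's files under that directory instead of the current one.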

May 29, 2009

QUERY_STRING & mod_rewrite

Filed under: sysadmin — JeffPC @ 23:03

A few months ago, I needed to make some mod_rewrite rules that did things to the QUERY_STRING. After a lot of searching and unsuccessful attempts, I found this document (local mirror). Some experimenting later, I had it all working nicely.

For example, I've got something like:

RewriteCond %{QUERY_STRING}    ^page=([0-9]{1,})$
RewriteRule ^/testsite/$       /testsite/page.cgi?seek=%1       [PT,L]
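
To check which query strings the RewriteCond would accept before touching the server config, the same pattern can be run through sed. This is only a sketch: sed's extended regexes are not the regex engine Apache uses, though for a pattern this simple they should agree, and the function name rewrite_query is invented here.

```shell
# rewrite_query QUERY_STRING  (hypothetical helper)
# Prints the query string the rule would hand to page.cgi, or nothing
# if the condition would not match.
rewrite_query() {
    printf '%s\n' "$1" | sed -nE 's/^page=([0-9]{1,})$/seek=\1/p'
}

rewrite_query 'page=42'        # matches, prints seek=42
rewrite_query 'page='          # no digits: prints nothing
rewrite_query 'page=42&x=1'    # trailing parameter: prints nothing
```

The anchors matter: because of ^ and $, the condition only fires when page=N is the entire query string.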

August 2, 2008

Wordpress sucks, archive.org rocks

Filed under: rants open-source sysadmin — JeffPC @ 16:32

This is a follow-up post to Wordpress sucks from a week ago.

I decided to try to figure out some more category names, and then it hit me...my site is crawled from time to time by The Wayback Machine. So, I searched, and found a copy from November 2007. Not the latest, but new enough to have all the categories I had. A little while later, my blahg is back to its former glory. And there was much rejoicing.

July 26, 2008

Wordpress sucks

Filed under: rants sysadmin — JeffPC @ 19:17

As I mentioned in my previous post, I decided to upgrade my Wordpress install. Every single time I upgraded it, I had no problems whatsoever. So, this time around, I didn't make a backup of the DB tables. Well, that was quite stupid of me. I copy over the new files (2.6 release tarball), and run the upgrade script. Poop! I lost all the category labels and descriptions. Gah! Absolutely not fun. I have 40 categories, and now they don't have any labels. Well, not anymore; I'm trying to figure out which category id was which category (sometimes not as easy as it should be). I got some, I'm not sure about some. If you had links to any categories, they'll still work. If you had links to any posts, they'll still work. Things will just look a bit disorganized if you look at the list of categories on the website, or if you look at which categories a post belongs to.

Please send any and all hate mail to the Wordpress developers for breaking an upgrade from one stable release to another.

April 19, 2008

mbox vs. maildir

Filed under: sysadmin — JeffPC @ 01:15

Over the past two weeks, I decided to try converting some of the mboxes I have for mailing lists to maildirs. Last time I tried to do it, I noticed an unacceptable delay when I started mutt (not when I tried to load the largest maildir). That made me give up.

For fun (or was it profit? ;) ), I decided to try again. And again, I saw that delay. This time around, I ran strace on mutt, and I found out that my custom .muttrc was making mutt believe that every file in the maildir was an mbox. Fixing up my muttrc made the delay go away. Now, I have start-up times that are the same as with a purely mbox setup, but opening up these maildirs takes WAY less time. I'm talking a fraction of a second instead of 5-10 seconds.
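
The difference mutt has to detect is simple on disk: a maildir is a directory holding cur, new, and tmp subdirectories with one file per message, while an mbox is a single file whose messages each start with a "From " line. A rough sketch of that check, not mutt's actual code; the function name mailbox_type is invented:

```shell
# mailbox_type PATH  (hypothetical helper)
# Prints "maildir", "mbox", or "unknown" based on what is on disk.
# Note: an empty file is also a valid (empty) mbox, but this sketch
# reports it as unknown.
mailbox_type() {
    if [ -d "$1/cur" ] && [ -d "$1/new" ] && [ -d "$1/tmp" ]; then
        echo maildir
    elif [ -f "$1" ] && [ "$(head -c 5 "$1")" = "From " ]; then
        echo mbox
    else
        echo unknown
    fi
}
```

Treating each message file inside a maildir as its own mbox means opening and inspecting every one of them, which would explain a start-up delay that grows with the number of messages.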

I'm considering converting all but the spam box to maildir. I really don't need thousands of extra inodes which I'll never use anyway, and I don't mind having one gigantic file full of the spam messages (I keep them because I don't manually check that no good messages got misclassified by spamassassin). I am probably going to tell XFS to reserve some disk space for it, to prevent lots of fragmentation from the constant open-write-close syscall cycle.

February 29, 2008

Dumping & restoring XFS volumes

Over the past few years, I've been using XFS wherever I could. I never really tried to tweak the mkfs options, and therefore most of my filesystems were quite sub-optimal. I managed to get my hands on an external 500GB disk that I decided to use for all this data shuffling...

320GB external firewire disk


This was probably the most offensively made fs. Here's the old info:


meta-data=/dev/sdb1              isize=512    agcount=17, agsize=4724999 blks
         =                       sectsz=512   attr=1
data     =                       bsize=4096   blocks=78142042, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0

It had 512-byte inodes (instead of the more sane, and default, 256-byte inodes) because I was playing around with SELinux when I made this filesystem, and the bigger inodes allow more extended attributes to be stored there - improving performance a whole lot. When I first made the fs, it had 16 allocation groups, but I later grew the filesystem by about 10GB, which had been used by a FAT32 partition that I used for Windows<->Linux data shuffling. On a simple disk (e.g., not a RAID 5), 4 allocation groups is far more logical than the 17 I had before. Another thing I wanted to use is lazy-count. That got introduced in 2.6.23, and improves performance when multiple processes are modifying filesystem metadata (create/unlink/mkdir/rmdir). And last, but not least, I wanted to use version 2 inodes.

The simplest way to change the filesystem to use all these features is to back up, mkfs, and restore...and that's what I did.

This is what the fs is like after the whole process (the changed fields are isize, agcount, agsize, attr, lazy-count, and extsz; unwritten=1 is no longer printed):


meta-data=/dev/sdb1              isize=256    agcount=4, agsize=19535511 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=78142042, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
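
The numbers in the two listings are consistent with the story about the grown filesystem, which a little shell arithmetic can confirm. The only inputs are values printed in the listings plus the original count of 16 allocation groups; the variable names are picked just for this sketch.

```shell
blocks=78142042      # data blocks (same before and after)
bsize=4096           # filesystem block size in bytes
old_agsize=4724999   # agsize from the old filesystem

# Size of the original 16-AG filesystem, and what growing it added.
orig_blocks=$((16 * old_agsize))
grown_blocks=$((blocks - orig_blocks))
grown_mib=$((grown_blocks * bsize / 1048576))
echo "grown by $grown_blocks blocks (~$grown_mib MiB)"

# The added space does not fill a whole AG, hence a partial 17th one.
old_agcount=$(( (blocks + old_agsize - 1) / old_agsize ))
echo "AGs needed at the old agsize: $old_agcount"

# With 4 AGs, the agsize is the block count divided by 4, rounded up.
new_agsize=$(( (blocks + 3) / 4 ))
echo "new agsize: $new_agsize blocks"
```

The grown space comes out just under 10GB, and the computed agsize matches the 19535511 in the new listing. An invocation along the lines of `mkfs.xfs -f -d agcount=4 -i size=256 -l lazy-count=1 /dev/sdb1` would request that geometry, though that exact command line is an assumption and not taken from the original run.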

dumping...

I mkfs.xfs'd the 500GB disk, and mounted it on /mnt/dump. Since I like tinkering with storage, I couldn't help but start blktrace for both of the disks (the one being dumped, and the one storing the dump).

Instead of using rsync, tar, or dd, I went with the xfsdump/xfsrestore combo. xfsdump is a lot like tar - it creates a single file with all the data, but unlike tar, it also saves extended attributes, and preserves the hole information for sparse files. So, with blktrace running, it was time to start the dump:


# xfsdump -f /mnt/dump/acomdata_xfs.dump -p 60 -J /mnt/acomdata

The dump took about 9300 seconds (2 hours, 35 mins). Here are the graphs created by seekwatcher (which uses the blktrace traces)...The source disk is the firewire disk being dumped, and the target disk is the one being dumped to.

source disk

The IO here makes sense: xfsdump scans the entire filesystem and backs up every inode, sorted by inode number (which is a function of the block number). The scattered accesses are because of fragmented files having data all over the place.

target disk

I'm not quite sure why XFS decided to break the dump file into 8 extents. These extents show up nicely as the 8 ascending lines. The horizontal line ~250GB is the journal being written to. (The seeks/second graph's y-axis shows that seekwatcher has a bug when there's very little seeking :) )

...and restoring

After the dump finished, I unmounted the 320GB fs, and ran mkfs on it (lazy-count=1, agcount=4, etc.). Then it was time to mount, start a new blktrace run on the 2 disks, and run xfsrestore - to extract all the files from the dump.


# xfsrestore -f /mnt/dump/acomdata_xfs.dump -p 60 -A -B -J /mnt/acomdata

I used the -A option to NOT restore xattrs as the only xattrs that were on the filesystem were some stray SELinux labels that managed to survive.

The restore took a bit longer...12000 seconds (3 hours, 20 minutes). And here are the traces for the restore:

source disk

Reading the 240GB file that was in 8 extents created an IO trace that's pretty self-explanatory. The constant writing to the journal was probably because of the inode access time updates. (And again, seekwatcher managed to round the seeks/second y-axis labels.)

target disk

This looks messy, but it actually isn't bad at all. The 4 horizontal lines that look a lot like journal writes are probably the superblocks being updated to reflect the inode counts (4 allocation groups == 4 sets of superblock + AG structures).

some analysis...


After the restore, I ran some debug tools to see how clean the filesystem ended up being...

...fragmentation


37945 extents used, ideal 37298 == not bad at all
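
For scale, the gap between the two extent counts works out to under 2%. The sketch below assumes the usual definition of a fragmentation factor, (actual - ideal) / actual, which is the form xfs_db's frag command is understood to report; which tool produced the numbers above is not stated, so treat that as an assumption.

```shell
actual=37945   # extents in use
ideal=37298    # fewest extents the same files could occupy

# Integer math in hundredths of a percent, to avoid needing bc or awk.
frag=$(( (actual - ideal) * 10000 / actual ))
printf 'fragmentation: %d.%02d%%\n' $((frag / 100)) $((frag % 100))
```

That prints a factor of 1.70%, i.e. only 647 extents more than the ideal layout.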

...free space fragmentation



   from      to extents  blocks    pct
      1       1      19      19   0.00
      2       3       1       3   0.00
     64     127       2     150   0.00
    128     255       1     134   0.00
    512    1023       1     584   0.00
   4096    8191       1    4682   0.02
  32768   65535       1   36662   0.19
 131072  262143       1  224301   1.16
 262144  524287       3 1315076   6.79
 524288 1048575       2 1469184   7.59
1048576 2097151       4 6524753  33.71
2097152 4194303       4 9780810  50.53

== pretty much squeaky clean

...per allocation group block usage



/dev/sdb1:
 AG  1K-blocks       Used  Available  Use%
  0   78142044   40118136   38023908   51%
  1   78142044   78142040          4   99%
  2   78142044   42565780   35576264   54%
  3   78142036   74316844    3825192   95%
ALL  312568168  235142800   77425368   75%

I'm somewhat surprised that the 2nd and 4th AGs are nearly full (well, the 2nd AG has only 4kB free!), while the 1st and 3rd are only half full. As you can see, the 320GB disk is 75% used.
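
Putting the used-space figure next to the two run times gives a rough idea of throughput. This is only an estimate: it treats the 235142800 1K-blocks in use after the restore as the amount of data moved in both directions, which ignores metadata overhead and the dump file's own framing. The helper name rate is made up for this sketch.

```shell
used_kb=235142800   # "Used" from the ALL row above, in 1K-blocks
dump_secs=9300
restore_secs=12000

# rate LABEL SECONDS: print the elapsed time and the average KB/s
rate() {
    printf '%s: %dh%02dm, ~%d KB/s\n' "$1" \
        $(($2 / 3600)) $(($2 % 3600 / 60)) $((used_kb / $2))
}

rate dump "$dump_secs"
rate restore "$restore_secs"
```

By this estimate the dump averaged roughly 25 MB/s and the restore roughly 19 MB/s.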

Bonus features


I decided to render mpeg versions of the IO traces...

source disk (dump) (4MB)

target disk (dump) (2MB)

source disk (restore) (2MB)

target disk (restore) (4.1MB) <- this is the best one of the bunch

Powered by a pile of c