I’m the fool. I took the plunge this week and installed a fresh clean copy of Fedora 11, but I made my root partition btrfs instead of something more… well, for lack of better words, stable and mature. This should serve as a little documentation of my findings, in the hopes that others don’t make the hasty call to dedicate their root partition to the new filesystem.
From the dawn of my days with Linux, I have been told to heed warnings that this new system gives. This isn’t Windows any more, the warnings that you receive are no longer benign. My first experience with this was when I had a hard drive that was dying, so I decided to do a fsck. The only problem was that I did a fsck on the root partition – while it was still mounted — read/write. In all fairness, it was my own arrogance that made me add the flag to manually override that protection system. I figured, “what could get worse? The drive is already dying”. I did this with no backups. Three days and 20-something manual superblock hex-edits later, I finally got my data back. The valuable thing from that incident is the experience, though. I now knew how to do manual superblock recreations, which always comes in super handy. I also had learned to heed the warnings of the core developers very well.
Until Saturday. My friend had told me before about how funny the new anaconda argument is: “icantbelieveitsnotbtr”. And now who says these hardcore developers don’t have a sense of humor? Well, I’ve learned a heck of a lot over these last few years, and sometimes I even feel guilty that I don’t give back as much as I should to the open source community. So I figured I’d give their hard work a shot. I’ve heard nothing but glowing reviews about ZFS, and btrfs is supposed to be the up-and-coming big thing in the filesystem world. I figured, at the very least, I could do some benchmarks or something.
So here goes nothing…
I fired up the installer, adding the appropriate flags. I had /dev/sda1 already formatted 50 gig ntfs for Windows. This was to be a dual-booting system. I added a 512-meg /boot, sda2, and a 2-gig swap, sda3. sda4 was to be a pointer to an extended partition table, where sda5 resides, filling up all of the remaining space of 59 gigs. I picked my packages and started the installer. Uh oh, the format of sda5 as btrfs failed. Oh well. Reboot, repeat. Same results, but this time it just died with some non-helpful error code 1. Uhhh. So I tried to boot again. My partitioning table was simply gone. WTF.
I remade the partition table, and figured, “third time’s the charm!”. I selected btrfs as the filesystem for /. If this didn’t work, I was going to give up and use the (now-default) ext4 filesystem. Alas, it worked. I was actually looking forward to using the new filesystem. The install finished normally, and I booted into my new Linux system.
First thing’s first, I ran a full yum upgrade. I noticed something just then. The seek times with btrfs were amazing, read speeds were okay, but the write speeds were flat-out terrible. I’m guessing that’s simply because of all of the metadata btrfs stores, but I honestly have no idea why that is. Oh well, I do far more reads than writes, so this works out well. I then proceeded to use the system for a while. After my used disk space started to get above 25%, I started to get really unexpected results – kernel failures, corrupted filesystem blocks, half-missing extent entries, system bus errors, the whole nine yards.
I did the next logical thing, I strace’d the processes that were dying unexpectedly on me. Worst of all was yum, it would just die with a “Bus Error”. Strace revealed it was hitting massive ENOSPC (no space left) errors trying to open files on the hard drive. df reported the filesystem as exactly 27% full. A google search revealed this as a known bug, but no one else has really had the problem until their filesystem reached over 75% full. So why was mine being the opposite? I theorized that it may have been because of loads and loads of useless metadata across the hard drive. I rebooted (the only thing that fixed the ENOSPC errors) and installed the btrfs-progs package. This package includes userspace utilities needed to create, check, modify and correct any inconsistencies in the btrfs filesystem. I ran a btrfsck, which reported no problems. I read that it may be useful to re-examine my metadata by forcing a balance of it across all drives, even though it only existed on one partition. This is done with the command “btrfs-vol -b /”. Kicker – this operation, after about 5 minutes of heavily accessing the drive, segfaulted. I was left with a half-balanced btrfs system, ENOSPC errors everywhere, and a feeling like this was the beginning of the end.
I didn’t quite want to give up yet, but I also wanted a usable machine. I had read somewhere that kernel 2.6.30 had an insane number of btrfs improvements. On a personal wish, I downloaded the kernel source for 2.6.31 and compiled. Booting into the new kernel was like a breath of fresh air.
SIDE NOTE: kernel > 2.6.30 is awesome. readahead is now merged into the kernel source for MUCH faster booting. There have been a ludicrous amount of improvements, check the changelog on kernel.org for a good list of these changes.
Anyway, the new kernel loaded up just fine. The errors held off for a bit, long enough to make me think they were gone. But then they came back full-force. The balance segfaulted again. The fsck said nothing was wrong. No one had any idea what was going on. I do like to help the open source community, but I simply don’t have time at the present to dig through hundreds of thousands of C code source lines in order to fix a bug this serious. After filing 4 bug reports, submitting probably close to 50 kernel errors, and genuinely giving it my best college try, I was done with btrfs. I rebooted into anaconda and formatted my system as ext4 and went along my way. Yes, I know you can *theoretically* convert a btrfs volume to ext4, but with the structure so far destroyed, I didn’t even bother wasting my time.
Moral? I guess just know what you’re getting into if you decide to give btrfs a try. When the kernel developers say that a feature is experimental, take that literally. There is no warning here, only some sound advice – proceed at your own risk, make plenty of backups, and just generally don’t expect too much of the filesystem at this early stage of it’s development.