Using LVM snapshots for filesystem recovery

This morning, I found myself in Gunnar's position and was looking at an empty /home directory because of the following mount failure:

piper:~# mount /dev/mapper/vg0-home
mount: Unknown error 990
Starting XFS recovery on filesystem: dm-0 (logdev: internal)
Filesystem "dm-0": xfs_inode_recover: Bad inode magic number, dino ptr =
  0xffff810028a6bd00, dino bp = 0xffff81002851d140, ino = 33629
Filesystem "dm-0": XFS internal error xlog_recover_do_inode_trans(1) at
  line 2352 of file fs/xfs/xfs_log_recover.c.  Caller 0xffffffff88307729

I am used to XFS giving me a hard time, but this time at least I didn't have much to worry about: it happened at home, I have plenty of other machines I can use instead, and all data was backed up.

Since this is a machine running Debian unstable, I had the latest xfs_repair available and told it to do its thing, but it refused:

piper:~# xfs_repair /dev/mapper/vg0-home
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

Great. Catch-22.

I decided to have a go at fixing it, but as much as I pretended to grok xfs_db, I did not manage to get the filesystem into a consistent state. I thus fired off an email to the mailing list (which still had not arrived there four hours later) and started pv to dump the raw data to another machine for later use or debugging.
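
Nothing fancy about that dump; roughly what I ran, with a made-up target host and file name:

# stream the raw device to another machine, with a progress meter
pv /dev/mapper/vg0-home | ssh tethys 'cat > /srv/piper-home.img'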

Watching the progress meter of pv from time to time, it occurred to me that my /home is in fact on LVM. I put it on LVM way back when LVM was new and cool and I wanted to gather experience with it. I've never actually done anything with it, but today it saved my little bits of remaining sanity.

Why? Because LVM can take snapshots of volumes. It took me a while to put the pieces together, but in the end, a snapshot would at least enable me to run a test round of xfs_repair -L and assess the damage; if it wasn't all that bad, I could simply run the repair on the actual volume afterwards.

With the encouragement and knowledge of Christian Aichinger and Daniel J. Priem on IRC, I hence proceeded to give it a try. First I had to extend the volume group to make room for the snapshot data, and since I had no disks handy, I just used a loop device based on a sparse file:

# create a ~5 GiB sparse file to back the snapshot
dd if=/dev/zero of=/srv/loop bs=1 count=1 seek=5368709120
# attach it to the first free loop device (/dev/loop0 here)
losetup -f /srv/loop
# make it a physical volume and add it to the volume group
pvcreate /dev/loop0
vgextend vg0 /dev/loop0
# take the snapshot, with 5G of room for copied blocks
lvcreate -s -n home-snapshot /dev/vg0/home -L 5G

(Thanks to Ian Nicholls for spotting the error: I previously had pvcreate in place of that last lvcreate.)

Now /dev/vg0/home-snapshot was a copy-on-write snapshot of the original volume, and I could do whatever I wanted with it, as long as the blocks I modified did not exceed the 5G allotted for the copy-on-write data. Using mount and xfs_repair, I verified that the problem also showed up on the snapshot.
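
Keeping an eye on how much of that allotment the repair experiments eat up is a matter of running lvs; the Data% column shows how full the snapshot's copy-on-write area is:

# Data% on the home-snapshot line shows how much of the 5G COW area is used
lvs vg0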

Thus, I ran xfs_repair -L on the snapshot and was finally able to mount it and inspect the lost+found directory. Fortunately, it wasn't all that full, nor was there anything important in it, so I sighed, removed the snapshot (lvremove), and unleashed xfs_repair -L on the original volume, which is now mounted under /home again.
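
Condensed into commands, the whole dance was roughly this; /mnt stands in for whatever scratch mount point you use:

# destroy the log and repair the snapshot; the real volume stays untouched
xfs_repair -L /dev/vg0/home-snapshot
mount /dev/vg0/home-snapshot /mnt
ls /mnt/lost+found
# damage deemed acceptable: drop the snapshot and repair the real thing
umount /mnt
lvremove /dev/vg0/home-snapshot
xfs_repair -L /dev/vg0/home
mount /dev/mapper/vg0-home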

Finally, remove the loop device from the volume group (vgreduce). If you are like me and forgot to do so before rebooting and now find your LVM volumes not starting anymore, look into the --removemissing option.
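
The teardown, for completeness; the lvremove above had to come first, since the snapshot's copy-on-write area lived on the loop device:

vgreduce vg0 /dev/loop0    # take the loop device out of the group
pvremove /dev/loop0        # wipe its physical volume label
losetup -d /dev/loop0
rm /srv/loop
# if you rebooted before this and the volume group no longer activates:
vgreduce --removemissing vg0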

I like this method. I knew the filesystem was corrupted, but I did not know to what extent. Had the corruption been worse, I could have waited for the XFS guys to help me out. But seeing how little was actually lost, I could decide to simply purge the journal and accept the damage.

If this had happened on a more important machine (i.e. one without recent backups), I would have proceeded a bit differently: once the snapshot was mounted, I'd have run xfsdump on it to save the data. I would also have allotted more space for the snapshot and not removed it until the parent volume was repaired and mounted. It's unlikely that xfs_repair will produce different results when run on a volume and on its snapshot, but better safe than sorry.
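
That rescue path would have looked something like this; the labels and target file are, of course, made up:

# mount the repaired snapshot read-only and take a full (level 0) dump
mount -o ro /dev/vg0/home-snapshot /mnt
xfsdump -l 0 -L home-rescue -M file0 -f /srv/home.xfsdump /mnt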

All this said, I just want to point out that the above method is filesystem-independent and hence would work for any filesystem out there. I've never been a big fan of LVM, especially without RAID to build on (striping across physical media multiplies your chance of data loss), but this method is a good reason for using LVM regardless: use a volume group instead of your physical disk, and logical volumes instead of partitions, but don't span disks (unless you have to). Then, if stuff goes bad, you can hook in another disk, extend the volume group with it, and work on snapshots.
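
For the record, setting a disk up that way takes just a handful of commands; the device name and sizes are purely illustrative:

pvcreate /dev/sda2            # one physical volume on the local disk
vgcreate vg0 /dev/sda2        # a volume group on just this one disk
lvcreate -n home -L 100G vg0  # logical volumes instead of partitions
mkfs.xfs /dev/vg0/home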