Update: I rewrote the second sentence of the next paragraph (up until and including the footnote) to get things right. Thanks to Russell Cattelan for providing the info.
This must be the most misunderstood feature of XFS.
What happens is that
XFS logs all metadata changes to
the journal, except for the inode size, which gets flushed
to disk immediately for performance reasons [*]_. At this point,
the file will actually be a sparse file, which is nothing more than
a file whose metadata lists it as being of a size different
from what it currently is (I realise that "sparse" does not really apply
when the file is "overfull", i.e. when the metadata lists it as
smaller than it really is, but I am lacking a good word for that).
The disk extents get allocated only when the data actually hits the
disk (this is XFS's famous delayed allocation
mechanism). If the power fails before the data was flushed to disk
and the journal entry cleared,
XFS will serve zeroes,
rather than the potentially random or sensitive data that is
actually on disk. This is a good thing.
.. [*] Since almost every write changes
   the file size, it would be a massive performance hit if every
   size change was logged. However, XFS
   violates its own journaling rules by doing this.
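To make the "sparse file" idea concrete, here is a minimal sketch in Python that creates a file whose metadata claims a size for which no data was ever written, then compares the apparent size with what is actually allocated. The helper names `make_sparse_file` and `describe` are made up for illustration, and the sketch assumes a Unix-like system where `st_blocks` is reported in 512-byte units. It shows a sparse file in general, not what XFS does internally.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file whose metadata reports `size` bytes without writing data."""
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file; no data blocks are written


def describe(path):
    """Return (apparent size in bytes, bytes reported as allocated)."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, as on Linux.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.bin")
        make_sparse_file(path, 1 << 20)
        apparent, allocated = describe(path)
        print(f"apparent={apparent} allocated={allocated}")
        with open(path, "rb") as f:
            print(f.read(16))  # the unwritten region reads back as zero bytes
```

How many bytes show up as allocated depends on the filesystem, so the sketch only prints that number. The part that matters here is that reading the unwritten region returns zeroes.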
You can run into more or less the same problem with any
journaling filesystem; the others just don't serve zeroes. Instead,
they give you the data that's physically on the medium. Imagine the
situation when the corrupt
/etc/motd suddenly becomes
a window to your previous
/etc/shadow contents... I
really prefer how
XFS handles that. Sometimes
you do get the old data back with the other filesystems, but this
is because the filesystems may reuse the blocks of the old file. So
it's a trade-off, and your choice between security and, uh,
The only way to protect against this is to use "physical-block
journaling" (as opposed to "logical journaling"), which only
ext3 offers as far as I know (option
data=journal), at a massive performance loss. See the
mailing list post by Theodore Ts'o for more info. Thanks to
Alceste Scalas for sharing it with me.
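To see whether a given mount point actually runs with data=journal, one could inspect the mount table. The following is a rough sketch that parses text in the /proc/mounts format (device, mount point, type, comma-separated options). The function names and the sample line are hypothetical, and it assumes that the kernel lists the data mode among the options, which may not hold for every kernel version or filesystem.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of options for `mount_point`, or None if not mounted."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match: later mounts shadow earlier ones.
            found = set(fields[3].split(","))
    return found


def uses_full_data_journaling(mounts_text, mount_point):
    """True if the mount point is listed with the data=journal option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "data=journal" in options


if __name__ == "__main__":
    # Hypothetical sample line, not taken from a real system.
    sample = "/dev/sda1 /mnt/zeit ext3 rw,data=journal 0 0\n"
    print(uses_full_data_journaling(sample, "/mnt/zeit"))
```

On a live system the first argument would be the contents of /proc/mounts.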
PS: Thanks to Wessel Dankers and Eric Sandeen for their help on
Update: Again thanks to Russell Cattelan, I just received word that code has been added to the 2.6.17 kernel which will flush an inode at close if it was truncated at some previous point. This should take care of many of the zeroed-file issues.
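Independently of what the kernel does at close, an application can reduce its own exposure by never truncating a file it still needs. A common pattern, sketched here in Python under the assumption of a POSIX system, is to write the new contents to a temporary file, fsync it, and then rename it over the old one. The function name `replace_file_durably` is made up for this example, and this is not something the XFS developers prescribed in the discussion above.

```python
import os
import tempfile


def replace_file_durably(path, data):
    """Write `data` to a temporary file, fsync it, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to write the data out
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Also fsync the directory so that the rename itself is written out.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

After a crash, a reader should then find either the complete old file or the complete new one, provided the storage stack honours the flush requests.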
Update: ReiserFS also supports
data=journal. Thanks to Hans Kratz for pointing this
out. He also adds some more information, parts of which I've written
One of the major problems of all journalled filesystems, especially on notebooks and desktop machines, is the harddisk writeback cache (enabled by default), which can lead to filesystem corruption. With the writeback cache, some writes are not immediately written to disk. If a crash happens, some of the data may not have hit the disk. The journalling filesystems, however, need to be able to rely on certain writes having hit the disk to ensure filesystem consistency.
There are two solutions: enable write barrier support in the filesystem (available for
ext3 since around 2.6.8, mount option:
barrier=1, and probably available for the other filesystems as well by now) or disable the write cache altogether with
XFS supports barriers as well, but unfortunately
not yet on non-physical media, like
ext3 seems to work okay on
Enable your barriers today!
Update: Martin Steigerwald pointed out that
ext3 can't do barriers on non-physical devices
root@shambala:~ -> mount -o barrier=1 /dev/shambala/ext3 /mnt/zeit
root@shambala:~ -> touch /mnt/zeit/barriertestfile
root@shambala:~ -> umount /mnt/zeit

Feb 28 20:31:46 shambala kernel: kjournald starting. Commit interval 5 seconds
Feb 28 20:31:46 shambala kernel: EXT3 FS on dm-0, internal journal
Feb 28 20:31:46 shambala kernel: EXT3-fs: mounted filesystem with ordered data mode.
Feb 28 20:32:02 shambala kernel: JBD: barrier-based sync failed on dm-0 - disabling barriers
The barriers are disabled right after the write access initiated
The device mapper code (which gets used for software RAID from kernel 2.6.23 onwards) even makes that explicit.
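Since the kernel drops barriers silently apart from that one log line, it can be worth scanning the kernel log for it. Here is a small sketch in Python that looks for the JBD message quoted in the log above and returns the affected devices. The function name is made up, and the pattern only matches the exact wording shown in this post; other kernel versions or filesystems may phrase it differently.

```python
import re

# Matches the message from the log excerpt above, e.g.
# "JBD: barrier-based sync failed on dm-0 - disabling barriers"
BARRIER_FAILURE = re.compile(
    r"JBD: barrier-based sync failed on (\S+) - disabling barriers"
)


def devices_with_disabled_barriers(log_text):
    """Return the sorted device names for which barriers were disabled."""
    return sorted(set(BARRIER_FAILURE.findall(log_text)))


if __name__ == "__main__":
    line = (
        "Feb 28 20:32:02 shambala kernel: JBD: barrier-based sync "
        "failed on dm-0 - disabling barriers"
    )
    print(devices_with_disabled_barriers(line))
```

The input would typically be the output of dmesg or the contents of the system's kernel log file.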