#250 jfs destroys data

bug
open
kernel (207)
8
2006-09-01
2006-09-01
Nico Schottelius
No

I sometimes have the problem that my system
suddenly freezes, either because of a kernel panic or because I
lose power (it's a notebook). That is not
the problem I am reporting ;-)

Most of the time, some files are 'broken' after an fsck:

- .viminfo is always a candidate for it
- .git lost some objects
- /etc/pango/pango.modules

I am now wondering

a) why there is sometimes garbage in my files

b) whether jfs should be able to avoid such
situations

I thought that if the system crashes while some
applications have open buffers, the original files
are still there and are not broken?

I am currently uploading all broken files to

http://home.schottelius.org/~nico/linux/fs/jfs/broken-files/b

Perhaps someone can use them to debug what's happening here.

And no, this is not a hardware problem; this also
happens on other systems.

I am using Linux 2.6.x, x >= 16

I would be happy about any feedback.

Sincerely

Nico

Discussion

    • milestone: --> bug
    • priority: 5 --> 8
     
  • David Kleikamp
    2006-09-01

    • labels: --> kernel
    • assigned_to: nobody --> shaggyk
     
  • David Kleikamp
    2006-09-01

    Logged In: YES
    user_id=422440

    Is there any metadata corruption? That is to say, does fsck
    report any problems?

    If the only corruption is in the file data, this is probably
    just a limitation of jfs: it only ensures that the file system
    metadata is kept consistent after a crash or power loss. It is
    similar to ext3's data=writeback mode (man 8 mount).

    For instance, when vim updates .viminfo, it creates a new
    file .viminfo.tmp, writes to it, does NOT fsync the data,
    and renames it over the old .viminfo. The metadata changes
    are all recorded in the journal, and may even be written to
    disk before the new file data makes it to disk. If the
    system crashes between the time the journal is written to
    disk and the time the actual data makes it to disk, you will see
    the wrong data in the file after rebooting.

    I'm going to consider supporting a data=ordered mode for
    jfs that would ensure that the data is written before the
    metadata changes are committed to disk (maybe even make it
    the default). I can't promise how quickly I can get it done
    though.

     
  • Logged In: YES
    user_id=1588406

    fsck reported problems the last time, but normally it simply
    replays the journal.

    I think data=ordered could really be a good solution. So far
    I have had only minor data loss, so it is not a big problem.

    Still, it's really annoying to know that there may be real
    data loss. What I am wondering is why /var/lib/dpkg/available
    is also a good candidate for getting screwed.

    So if I understood you correctly, if I have a program that
    copies file a (let's say a 100 MiB OpenOffice document)
    to a $tmp location, changes it, and issues a move back to the
    original location, the move may be recorded, but the new
    data may be missing?

    So that means there is simply garbage in the file I
    worked on?

     
  • David Kleikamp
    2006-09-01

    Logged In: YES
    user_id=422440

    Yes, that could happen. Some applications play it safe by
    calling fsync() against the new file before renaming it back
    to the original name. I'm kind of surprised that vim does not.
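
The "play it safe" sequence described above can be sketched as follows. This is only an illustration, not code from vim or jfs: the helper name atomic_write and the ".tmp-" prefix are made up for the example, and it assumes a POSIX-like system where fsync() is available.

```python
import contextlib
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`, flushing the new file before the rename.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the rename
    # stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to write the data out
        # Only now rename the new file over the original name.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temporary file and leave `path` alone.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

Because the data is flushed before the rename is issued, a crash should leave either the old contents or the complete new contents under the original name. Note that mkstemp creates the file with owner-only permissions, so a real tool would probably copy the mode of the original file, and this sketch does not fsync the containing directory.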

     
  • Logged In: YES
    user_id=1588406

    OK, thanks for the information. I would be happy if
    you left the bug open until it's implemented in jfs, so that I
    get notified as soon as that happens. If you need someone
    for testing, simply drop me a line via e-mail.

     
  • David Kleikamp
    2006-09-01

    Logged In: YES
    user_id=422440

    Sure. I plan on keeping the bug open. Don't hesitate to nag
    me if I don't respond within a couple weeks. :-)

     
  • Solra Bizna
    2008-02-12

    Logged In: YES
    user_id=810548
    Originator: NO

    As a developer, I always thought atomically replacing a file was as simple as open-write-close-rename. I just found out this wasn't the case when a router prototype I'm working on ate most of its own configuration on power loss. The language in question doesn't have fsync, and it's not a portable function besides; maybe there needs to be a change in the semantics of rename? (like automatically flushing the file being renamed, for instance?)
    -:sigma.SB

     
  • David Kleikamp
    2008-02-12

    Logged In: YES
    user_id=422440
    Originator: NO

    fsync() isn't portable?

    Implementing data=ordered would make jfs behave, but if you're concerned about portability, the application will need to call fsync or something equivalent in order to guarantee the data is committed to storage.

     
  • Solra Bizna
    2008-02-12

    Logged In: YES
    user_id=810548
    Originator: NO

    > fsync() isn't portable?
    It is only "portable" if you define "portable" as "present on most, if not all, POSIX-like operating systems." With the current state of things, it's not possible (for instance) to make a pure standard C program that atomically replaces data.

    Besides, it looks like I'm far from the only developer to make the assumption. It seems to me that a lot of applications are broken by it, too. Is there any downside to adding an implicit fsync to the rename operation? I haven't done any testing, but I'd think it'd be a better performance compromise than data=ordered, while maintaining data integrity in more cases than pure data=writeback. (I'm talking about adding this to the kernel as a file-system independent thing, not a JFS-specific hack; I'm aware this isn't the place to send this kind of thing to, I'm just wondering whether there's anything obviously wrong with the idea before I take it somewhere relevant.)
    -:sigma.SB
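
Until something like that exists in the kernel, the proposed "flush on rename" behaviour can be approximated in user space. The following is a rough sketch under stated assumptions: rename_with_flush is a hypothetical name, and it assumes a system such as Linux where fsync() is accepted on a descriptor opened read-only.

```python
import os


def rename_with_flush(src: str, dst: str) -> None:
    """Flush `src` to storage, then rename it over `dst`.

    Hypothetical user-space stand-in for a rename() that implicitly
    syncs the file being renamed.
    """
    # Assumption: fsync() works on a read-only descriptor (true on Linux).
    fd = os.open(src, os.O_RDONLY)
    try:
        os.fsync(fd)  # flush the file's data before the rename is issued
    finally:
        os.close(fd)
    os.replace(src, dst)
```

An application that already does open-write-close-rename could call this in place of the plain rename, at the cost of waiting for the flush on every replacement.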

     
  • Logged In: NO

    I wouldn't be opposed to a change to rename() to sync the file's data, but I'm not sure what other opinions might be out there. It may be interesting to propose the idea to linux-fsdevel@vger.kernel.org.

     
  • Logged In: YES
    user_id=1588406
    Originator: YES

    Hey guys, this is still open. When will you fix it?