From: Mimi Z. <zo...@li...> - 2016-02-18 20:25:49
Hi Ted,

On Thu, 2016-02-18 at 15:41 +0100, Patrick Ohly wrote:
> On Thu, 2016-02-18 at 07:32 -0500, Mimi Zohar wrote:
> > On Mon, 2016-02-15 at 11:27 +0100, Patrick Ohly wrote:
> > > On Tue, 2015-09-22 at 20:10 +0200, Patrick Ohly wrote:
> > > > On Tue, 2015-09-22 at 09:04 -0400, Mimi Zohar wrote:
> > > > > On Tue, 2015-09-22 at 14:58 +0200, Patrick Ohly wrote:
> > > > > > Would it be possible to intercept the in-kernel implementation of
> > > > > > fdatasync() and trigger a hash update *and* a flushing of the xattr?
> > > > >
> > > > > That sounds like a good compromise, but would it be enough to resolve
> > > > > the problem?
> > > >
> > > > I suspect that there's still a time window where a file content change
> > > > has hit the disk while the corresponding xattr change has not, for
> > > > example between writes and the fdatasync(). It would be much better, but
> > > > probably not good enough.
> > > >
> > > > Primarily I wanted to raise the problem here and get opinions. My
> > > > conclusion is that writing databases probably should be done where it is
> > > > not in the IMA policy, even if the risk was reduced by also updating the
> > > > hash on fdatasync().
> > >
> > > Let me come back to this.
> > >
> > > I noticed that even for simple files, like /etc/machine-id, it is very
> > > easy to corrupt the system. /etc/machine-id contains a 32 byte unique
> > > ID, written by systemd in machine_id_commit() [1]. That code does a
> > > normal close(), i.e. no explicit fdatasync() and no fsync().
> > >
> > > [1] https://github.com/systemd/systemd/blob/master/src/core/machine-id-setup.c
> > >
> > > When using on-device hashing and ext4, the new file and its data are
> > > stored by ext4 right away (at least in the journal), but the
> > > corresponding security.ima remains cached in memory for surprisingly
> > > long periods of time (minutes or longer - haven't tried to determine
> > > that more precisely).
> > >
> > > Power off during that time and the system becomes unusable due to the
> > > "permission denied" on a core system file.
> > >
> > > Any recommendations for addressing the problem, not just
> > > for /etc/machine-id, but also in general?
> >
> > The first question is whether the xattr not being saved/restored is
> > limited to those stored in the extra block (i_file_acl) or in the inode
> > as well. Defining EXT4_XATTR_DEBUG in fs/ext4/xattr.c will show where
> > the xattrs are being stored. I assume they are being stored in the
> > extra block.
>
> While testing this, I noticed that my earlier comment about xattr not
> getting flushed is wrong: it is the other way around. File *content*
> isn't getting flushed to disk, whereas the modified xattr is.
>
> That also explains why adding fsync() helps: it forces the content to
> disk, so content and xattr match after a powerloss.
>
> But the underlying problem remains the same: with IMA, it is much more
> important that meta data and file content remain in sync than it is
> without, and that's currently not working well.
>
> > > For machine-id, I can add an explicit "sync" shell command invocation
> > > after writing the file, but that's not a general solution.
> > >
> > > Implementing the enhanced fdatasync() mentioned before and relying on
> > > programs to call fdatasync() would help somewhat, but not all programs
> > > call it. Would calling fsync() in systemd have helped?
> > >
> > > ext4 mount options also don't look promising. commit=nrsec flushes data
> > > after 5 seconds by default, but does not seem to include xattrs.
> > >
> > > Journaling is already using data=ordered, so meta data should be as safe
> > > as it can be, and yet it still doesn't include the modified xattr.
>
> Given these settings, it is surprising that the data does not get
> flushed to disk even after minutes.
>
> Could this be a result of running under qemu? I kill the qemu process
> instead of properly shutting down the virtual machine. No, that's not it
> either. Even when resetting with "system_reset", I get the same failure.
>
> "data=journal" helps. It avoids the problem completely and might be the
> best solution despite the impact ("disables delayed allocation and
> O_DIRECT").
>
> If anyone has suggestions for debugging the long delay until data gets
> flushed, then I am open for suggestions. I'm still seeing that even with
> "data=journal", it's only that the effect is less drastic.

Any suggestions on how to force the file data to be flushed to disk as
soon as the security xattrs are written?

Thanks!

Mimi
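
For reference, the application-side workaround mentioned in the quoted text (calling fsync() before close()) can be sketched roughly as follows. This is only an illustration, not what systemd does: the helper name write_file_durably is made up, and the extra fsync() on the parent directory is an assumption about what a careful writer might add. It narrows the window in which file content lags behind the xattr, but it does not make the two updates atomic.

```python
import os


def write_file_durably(path: str, data: bytes) -> None:
    """Write data to path, then fsync the file and its parent directory.

    Hypothetical helper for illustration; not taken from systemd or IMA.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Force the file content to stable storage before close().
        os.fsync(fd)
    finally:
        os.close(fd)

    # Assumption: also flush the directory entry, so that a newly
    # created file survives a power loss.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would use it in place of a plain write-and-close, e.g. write_file_durably("/tmp/machine-id-test", b"...") with a scratch path. Programs that never call fsync() are not helped by this, which is why the question above is about a kernel-side or mount-option solution.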