From: Konstantin K. <khl...@op...> - 2012-04-03 05:06:23
Matt Helsley wrote:
> On Sat, Mar 31, 2012 at 01:29:29PM +0400, Konstantin Khlebnikov wrote:
>> Currently the kernel sets mm->exe_file during sys_execve() and then tracks
>> the number of vmas with the VM_EXECUTABLE flag in mm->num_exe_file_vmas; as
>> soon as this counter drops to zero, the kernel resets mm->exe_file to NULL.
>> It also resets mm->exe_file at the last mmput(), when mm->mm_users drops to
>> zero.
>>
>> A vma with the VM_EXECUTABLE flag appears after mapping a file with
>> MAP_EXECUTABLE. Such vmas can appear only at sys_execve() or after vma
>> splitting, because sys_mmap ignores this flag. Usually the binfmt module sets
>> mm->exe_file and mmaps some executable vmas backed by this file; they hold
>> mm->exe_file while the task is running.
>>
>> Comment from v2.6.25-6245-g925d1c4 ("procfs task exe symlink"),
>> where all this stuff was introduced:
>>
>>> The kernel implements readlink of /proc/pid/exe by getting the file from
>>> the first executable VMA. Then the path to the file is reconstructed and
>>> reported as the result.
>>>
>>> Because of the VMA walk the code is slightly different on nommu systems.
>>> This patch avoids separate /proc/pid/exe code on nommu systems. Instead of
>>> walking the VMAs to find the first executable file-backed VMA we store a
>>> reference to the exec'd file in the mm_struct.
>>>
>>> That reference would prevent the filesystem holding the executable file
>>> from being unmounted even after unmapping the VMAs. So we track the number
>>> of VM_EXECUTABLE VMAs and drop the new reference when the last one is
>>> unmapped. This avoids pinning the mounted filesystem.
>>
>> So this logic is hooked into every file mmap/munmap and every vma split/merge
>> just to handle a hypothetical case: an mm that has already unmapped all its
>> executable files, but is still alive, pinning a filesystem and keeping it
>> from being unmounted. Does anyone know of a real-world example? An mm can be
>> borrowed by swapoff or by some get_task_mm() user, but that is not a big
>> problem.
>>
>> Thus, we can remove all this stuff together with the VM_EXECUTABLE flag and
>> keep mm->exe_file alive until the final mmput().
>>
>> After that we can access current->mm->exe_file without any locks
>> (after checking current->mm and mm->exe_file for NULL).
>>
>> Some code in security and oprofile still uses VM_EXECUTABLE to retrieve the
>> task's executable file; after this patch it will use mm->exe_file directly.
>> In tomoyo and audit, mm is always current->mm; oprofile uses get_task_mm().
>
> Perhaps I'm missing something, but it seems like you ought to split
> this into two patches. The first could fix up the cell, tile, etc. arch
> code to use the exe_file reference rather than walk the VMAs. Then the
> second patch could remove the unusual logic used to allow userspace to unpin
> the mount, and we could continue to discuss that separately. It would
> also make the git log somewhat cleaner, I think...

OK, I'll resend this as an independent patch set; I need to bring back the
mm->mmap_sem locking anyway.