From: Stuart M. <Stu...@st...> - 2000-06-21 23:12:35
Folks,

I've been battling a problem for a couple of days now which looks like a 'generic' kernel problem, but I'd like to run it past you first. Note this is on an SH4, so to some extent we have to regard the cache as virtual.

As far as I can tell, the exact problem I have is this:

- A page is allocated and mapped into a process's address space (I think it's part of the stack) and is written to. The written data ends up in the cache as normal.
- The process then exits, and all its pages are freed.
- Later another process executes a fork, so the kernel needs to allocate a page for the child process's pgd, and it picks the page which was previously part of the first process's stack.
- The pgd for the child process is copied from the parent into the newly allocated page.
- As part of duplicating the memory map for the child, flush_cache_mm is called, which results in the entire cache being flushed, including the data which was cached right at the beginning of this sequence. This overwrites the pgd with garbage and later causes errors to be reported.

Note that the page is effectively mapped at two virtual addresses in this sequence, once in user space and once in kernel space. By chance these addresses are such that a cache synonym occurs (i.e. the same physical address maps to different cache lines).

Now, as far as I can see this is a generic problem. When a process terminates, the data in the cache is not invalidated, so on any system with a virtual cache there is a chance that the page will be reused before the cache is flushed. Looking at the code, this appears to be a problem only when a process exits. Everywhere else I've looked, when a page is unmapped from a user address space (or even from a kernel address through vmalloc), that address range is flushed first.

So, I've put together the small patch below, which solves the specific problem I am seeing (there is a lot of context so you can see what is happening, but only two changes). I'm not even sure the flush_tlb_range is needed (certainly not on an SH, and probably not ever), but it does no harm (except to performance!).

Before I make a fool of myself on the kernel mailing list, I'd appreciate any comments you can give.

Stuart

Index: mm/mmap.c
===================================================================
RCS file: /cvsroot/linuxsh/kernel/mm/mmap.c,v
retrieving revision 1.1.1.3
diff -U30 -r1.1.1.3 mmap.c
--- mm/mmap.c	2000/04/29 15:46:04	1.1.1.3
+++ mm/mmap.c	2000/06/21 22:45:50
@@ -832,61 +832,63 @@
 	avl_insert(vma, &mm->mmap_avl);
 }
 
 /* Release all mmaps. */
 void exit_mmap(struct mm_struct * mm)
 {
 	struct vm_area_struct * mpnt;
 
 	release_segments(mm);
 	mpnt = mm->mmap;
 	vmlist_modify_lock(mm);
 	mm->mmap = mm->mmap_avl = mm->mmap_cache = NULL;
 	vmlist_modify_unlock(mm);
 	mm->rss = 0;
 	mm->total_vm = 0;
 	mm->locked_vm = 0;
 	while (mpnt) {
 		struct vm_area_struct * next = mpnt->vm_next;
 		unsigned long start = mpnt->vm_start;
 		unsigned long end = mpnt->vm_end;
 		unsigned long size = end - start;
 
 		if (mpnt->vm_ops) {
 			if (mpnt->vm_ops->unmap)
 				mpnt->vm_ops->unmap(mpnt, start, size);
 			if (mpnt->vm_ops->close)
 				mpnt->vm_ops->close(mpnt);
 		}
 		mm->map_count--;
 		remove_shared_vm_struct(mpnt);
+		flush_cache_range(mm, start, end);
 		zap_page_range(mm, start, size);
+		flush_tlb_range(mm, start, end);
 		if (mpnt->vm_file)
 			fput(mpnt->vm_file);
 		kmem_cache_free(vm_area_cachep, mpnt);
 		mpnt = next;
 	}
 
 	/* This is just debugging */
 	if (mm->map_count)
 		printk("exit_mmap: map count is %d\n", mm->map_count);
 
 	clear_page_tables(mm, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
 }
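Purely as an illustration of the synonym condition described above (not part of the patch), the small userspace sketch below shows when two virtual mappings of the same physical page index different cache lines. It assumes a direct-mapped, virtually indexed 16KB cache with 32-byte lines and 4KB pages, and the two addresses are made up for the example; the real SH4 geometry and the actual addresses involved may well differ.

#include <stdio.h>

/* Assumed cache geometry, for illustration only. */
#define CACHE_SZ (16 * 1024)   /* direct-mapped, virtually indexed */
#define LINE_SZ  32
#define PAGE_SZ  (4 * 1024)

/* Which cache line a virtual address indexes on a virtually indexed cache. */
static unsigned long cache_line(unsigned long vaddr)
{
	return (vaddr % CACHE_SZ) / LINE_SZ;
}

int main(void)
{
	/* Two hypothetical virtual mappings of the same physical page:
	 * the old user-space stack page and the new kernel-space pgd page.
	 * They share the low PAGE_SZ offset bits; only the index bits
	 * between PAGE_SZ and CACHE_SZ can differ between the two.
	 */
	unsigned long user_vaddr   = 0x7fff2000UL;
	unsigned long kernel_vaddr = 0x8c341000UL;

	if (cache_line(user_vaddr) != cache_line(kernel_vaddr))
		printf("synonym: same physical data can sit in two different cache lines\n");
	else
		printf("no synonym: both mappings index the same cache lines\n");

	return 0;
}

With these made-up addresses the index bits above the 4KB page offset differ (0x2000 vs 0x1000 within the 16KB cache), so the same physical page can occupy two different cache lines at once. That is the aliasing situation the added flush_cache_range is meant to clean out of the cache before the page is freed and reused.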