From: Stuart M. <Stu...@st...> - 2000-06-21 23:12:35
Folks,

I've been battling a problem for a couple of days now which looks like a 'generic' kernel problem, but I'd like to run it past you first. Note this is on an SH4, so to some extent we have to regard the cache as virtual.

As far as I can tell, the exact problem I have is this:

- A page is allocated and mapped into a process's address space (I think it's part of the stack) and is written to. The written data ends up in the cache as normal.
- The process then exits, and all its pages are freed.
- Later another process executes a fork, so the kernel needs to allocate a page for the child process's pgd, and it picks the page which was previously part of the first process's stack.
- The pgd for the child process is copied from the parent into the newly allocated page.
- As part of duplicating the memory map for the child, flush_cache_mm is called, which results in the entire cache being flushed, including the data which was cached right at the beginning of this sequence. This overwrites the pgd with garbage and later causes errors to be reported.

Note that the page is effectively mapped at two virtual addresses in this sequence, once in user space and once in kernel space. By chance these addresses are such that a cache synonym occurs (i.e. the same physical address maps to different cache lines).

Now, as far as I can see this is a generic problem. When a process terminates, the data in the cache is not invalidated, so on any system with a virtual cache there is a chance that the page will be reused before the cache is flushed. Looking at the code, this appears to be a problem only when a process exits. Everywhere else I've looked, when a page is unmapped from a user address space (or even from a kernel address through vmalloc), that address range is flushed first.

So, I've put together the small patch below, which solves the specific problem I am seeing (there is a lot of context so you can see what is happening, but only two changes). I'm not even sure the flush_tlb_range is needed (certainly not on an SH, and probably not ever), but it does no harm (except to performance!).

Before I make a fool of myself on the kernel mailing list, I'd appreciate any comments you can give.

Stuart

Index: mm/mmap.c
===================================================================
RCS file: /cvsroot/linuxsh/kernel/mm/mmap.c,v
retrieving revision 1.1.1.3
diff -U30 -r1.1.1.3 mmap.c
--- mm/mmap.c	2000/04/29 15:46:04	1.1.1.3
+++ mm/mmap.c	2000/06/21 22:45:50
@@ -832,61 +832,63 @@
 	avl_insert(vma, &mm->mmap_avl);
 }
 
 /* Release all mmaps. */
 void exit_mmap(struct mm_struct * mm)
 {
 	struct vm_area_struct * mpnt;
 
 	release_segments(mm);
 	mpnt = mm->mmap;
 	vmlist_modify_lock(mm);
 	mm->mmap = mm->mmap_avl = mm->mmap_cache = NULL;
 	vmlist_modify_unlock(mm);
 	mm->rss = 0;
 	mm->total_vm = 0;
 	mm->locked_vm = 0;
 	while (mpnt) {
 		struct vm_area_struct * next = mpnt->vm_next;
 		unsigned long start = mpnt->vm_start;
 		unsigned long end = mpnt->vm_end;
 		unsigned long size = end - start;
 
 		if (mpnt->vm_ops) {
 			if (mpnt->vm_ops->unmap)
 				mpnt->vm_ops->unmap(mpnt, start, size);
 			if (mpnt->vm_ops->close)
 				mpnt->vm_ops->close(mpnt);
 		}
 		mm->map_count--;
 		remove_shared_vm_struct(mpnt);
+		flush_cache_range(mm, start, end);
 		zap_page_range(mm, start, size);
+		flush_tlb_range(mm, start, end);
 		if (mpnt->vm_file)
 			fput(mpnt->vm_file);
 		kmem_cache_free(vm_area_cachep, mpnt);
 		mpnt = next;
 	}
 
 	/* This is just debugging */
 	if (mm->map_count)
 		printk("exit_mmap: map count is %d\n", mm->map_count);
 
 	clear_page_tables(mm, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
 }
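Purely as an illustration of the synonym condition described above (not part of the patch), the small userspace sketch below shows when two virtual mappings of the same physical page index different cache lines. It assumes a direct-mapped, virtually indexed 16KB cache with 32-byte lines and 4KB pages, and the two addresses are made up for the example; the real SH4 geometry and the actual addresses involved may well differ.

#include <stdio.h>

/* Assumed cache geometry, for illustration only. */
#define CACHE_SZ (16 * 1024)   /* direct-mapped, virtually indexed */
#define LINE_SZ  32
#define PAGE_SZ  (4 * 1024)

/* Which cache line a virtual address indexes on a virtually indexed cache. */
static unsigned long cache_line(unsigned long vaddr)
{
	return (vaddr % CACHE_SZ) / LINE_SZ;
}

int main(void)
{
	/* Two hypothetical virtual mappings of the same physical page:
	 * the old user-space stack page and the new kernel-space pgd page.
	 * They share the low PAGE_SZ offset bits; only the index bits
	 * between PAGE_SZ and CACHE_SZ can differ between the two.
	 */
	unsigned long user_vaddr   = 0x7fff2000UL;
	unsigned long kernel_vaddr = 0x8c341000UL;

	if (cache_line(user_vaddr) != cache_line(kernel_vaddr))
		printf("synonym: same physical data can sit in two different cache lines\n");
	else
		printf("no synonym: both mappings index the same cache lines\n");

	return 0;
}

With these made-up addresses the index bits above the 4KB page offset differ (0x2000 vs 0x1000 within the 16KB cache), so the same physical page can occupy two different cache lines at once. That is the aliasing situation the added flush_cache_range is meant to clean out of the cache before the page is freed and reused.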