From: Stuart M. <Stu...@st...> - 2000-06-21 23:12:35
Folks

I've been battling against a problem for a couple of days now, which looks like a 'generic' kernel problem, but I'd like to run it past you first. Note this is on an SH4, so to some extent we have to regard the cache as virtual.

As far as I can tell, the exact problem I have is this:

 - a page is allocated and mapped into a process's address space (I think it's part of the stack) and is written to. The written data ends up in the cache as normal.
 - the process then exits, and all the pages are freed.
 - later another process executes a fork, so the kernel needs to allocate a page for the child process's pgd, and picks the page which was previously part of the first process's stack.
 - the pgd for the child process is copied from the parent to the child into the newly allocated page.
 - as part of duplicating the memory map for the child, flush_cache_mm is called, which results in the entire cache being flushed, including the data which was cached right at the beginning of this description. This overwrites the pgd with garbage, and later causes errors to be reported.

Note that the page is effectively mapped at two virtual addresses in this sequence, once in user space and once in kernel space. By chance these addresses are such that a cache synonym occurs (i.e. a given physical address maps to different cache lines).

Now, as far as I can see this is a generic problem. When a process terminates, the data in the cache is not invalidated. For any system with a virtual cache, there is a chance that the page will be reused before the cache is flushed.

Looking at the code, this appears to only be a problem when a process exits. Every other place I've looked, when a page is unmapped from a user address space (or even a kernel address through vmalloc), that address range is first flushed.

So, I've put together this small patch which solves the specific problem I am seeing (there is a lot of context so you can see what is happening, but only two changes). I'm not even sure the flush_tlb_range is needed, certainly not on an SH, and probably not ever, but it doesn't do any harm (except to performance!).

Before I make a fool of myself on the kernel mailing list I'd appreciate any comments you can give.

Stuart

Index: mm/mmap.c
===================================================================
RCS file: /cvsroot/linuxsh/kernel/mm/mmap.c,v
retrieving revision 1.1.1.3
diff -U30 -r1.1.1.3 mmap.c
--- mm/mmap.c	2000/04/29 15:46:04	1.1.1.3
+++ mm/mmap.c	2000/06/21 22:45:50
@@ -832,61 +832,63 @@
 	avl_insert(vma, &mm->mmap_avl);
 }
 
 /* Release all mmaps. */
 void exit_mmap(struct mm_struct * mm)
 {
 	struct vm_area_struct * mpnt;
 
 	release_segments(mm);
 	mpnt = mm->mmap;
 	vmlist_modify_lock(mm);
 	mm->mmap = mm->mmap_avl = mm->mmap_cache = NULL;
 	vmlist_modify_unlock(mm);
 	mm->rss = 0;
 	mm->total_vm = 0;
 	mm->locked_vm = 0;
 	while (mpnt) {
 		struct vm_area_struct * next = mpnt->vm_next;
 		unsigned long start = mpnt->vm_start;
 		unsigned long end = mpnt->vm_end;
 		unsigned long size = end - start;
 		if (mpnt->vm_ops) {
 			if (mpnt->vm_ops->unmap)
 				mpnt->vm_ops->unmap(mpnt, start, size);
 			if (mpnt->vm_ops->close)
 				mpnt->vm_ops->close(mpnt);
 		}
 		mm->map_count--;
 		remove_shared_vm_struct(mpnt);
+		flush_cache_range(mm, start, end);
 		zap_page_range(mm, start, size);
+		flush_tlb_range(mm, start, end);
 		if (mpnt->vm_file)
 			fput(mpnt->vm_file);
 		kmem_cache_free(vm_area_cachep, mpnt);
 		mpnt = next;
 	}
 
 	/* This is just debugging */
 	if (mm->map_count)
 		printk("exit_mmap: map count is %d\n", mm->map_count);
 
 	clear_page_tables(mm, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
 }
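As an aside, the synonym condition itself is easy to state. The following is just an illustrative user-space sketch of it; the 16KB-per-way / 4KB-page figures (and the example addresses) are my assumptions for illustration, not something taken from the manual:

/*
 * Two virtual mappings of the same physical page land in different
 * cache lines when they differ in the index bits above the page
 * offset.  Geometry below is assumed: 16KB indexed per way, 4KB pages.
 */
#include <stdio.h>

#define PAGE_SIZE      4096UL
#define CACHE_WAY_SIZE 16384UL                          /* assumed */
#define ALIAS_MASK     (CACHE_WAY_SIZE - PAGE_SIZE)     /* 0x3000 here */

static int is_cache_synonym(unsigned long vaddr1, unsigned long vaddr2)
{
	/* Same physical page assumed; they alias if the index bits differ. */
	return ((vaddr1 ^ vaddr2) & ALIAS_MASK) != 0;
}

int main(void)
{
	/* Made-up example: a user stack page and the kernel mapping of the
	 * same frame. */
	unsigned long user_va   = 0x7ffff000UL;
	unsigned long kernel_va = 0x8c012000UL;

	printf("synonym: %s\n",
	       is_cache_synonym(user_va, kernel_va) ? "yes" : "no");
	return 0;
}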
From: NIIBE Y. <gn...@ch...> - 2000-06-23 02:29:38
Stuart Menefy wrote:
> I've been battling against a problem for a couple of days now, which looks
> like a 'generic' kernel problem, but I'd like to run it past you first.

I know the battle is quite annoying...

> Note this is on an SH4, so to some extent we have to regard the cache
> as virtual.

More specifically, SH-4 has a virtually tagged and physically indexed cache system. I think the MIPS R4000 has the same system.

Generally speaking, I don't think we need a flush at the time of exit. We can postpone the flush to the time the physical memory is reused (and if the memory is simply overwritten, we don't need to flush it at all).

> As far as I can tell, the exact problem I have is this:
> - a page is allocated and mapped into a process's address space
>   (I think it's part of the stack) and is written to. The written
>   data ends up in the cache as normal.
> - the process then exits, and all the pages are freed.
> - later another process executes a fork, so the kernel needs to
>   allocate a page for the child process's pgd, and picks the page
>   which was previously part of the first process's stack.
> - the pgd for the child process is copied from the parent to the child
>   into the newly allocated page.
> - as part of duplicating the memory map for the child, flush_cache_mm
>   is called, which results in the entire cache being flushed, including
>   the data which was cached right at the beginning of this description.
>   This overwrites the pgd with garbage, and later causes errors to be
>   reported.

I think that this corruption of the pgd can't happen in this sequence. The page allocated for the PGD is initialized (overwritten) by get_pgd_slow.

If corruption of memory occurs, the sequence would be like this:

 (1) There's a cache entry, say for the virtual address VIRT, backed by physical address PHYS.
 (2) (For some reason (a bug)) the flush doesn't occur where it should.
 (3) The page at PHYS is allocated for a different purpose.
 (4) Some memory access occurs which causes the entry for VIRT to be evicted. The dirty line is written back and corrupts the data at PHYS. Totally unrelated data is written back to physical memory.

Your patch may change the behaviour, as it flushes the cache at the time of the exit system call. However, I suspect there's a real bug in some other place.
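To illustrate the postpone-the-flush idea, here is a rough sketch only. sh4_flush_phys_page() is a hypothetical helper which does not exist in the tree; it would write back and invalidate any cache lines holding data for the given kernel-mapped page before the frame is handed out again:

/*
 * Sketch of "flush when the physical page is reused" instead of
 * "flush at exit".  sh4_flush_phys_page() is invented for illustration.
 */
static inline unsigned long get_user_page_flushed(int gfp_mask)
{
	unsigned long page = __get_free_page(gfp_mask);

	if (page)
		/* Kill any stale lines left behind by a previous user
		 * mapping of this frame before handing it out again. */
		sh4_flush_phys_page(page);
	return page;
}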
From: NIIBE Y. <gn...@ch...> - 2000-06-23 03:11:26
Sorry Stuart. Your analysis is right. I'm always wrong. :-)

NIIBE Yutaka wrote:
> [...]
>
> I think that this corruption of the pgd can't happen in this sequence.
> The page allocated for the PGD is initialized (overwritten) by
> get_pgd_slow.

Sorry, I misunderstood. The corruption will occur when the page was previously mapped into user space and there's a cache alias. I checked the code: the page allocation path does not include any cache flush. Hence, your patch is the right solution.
From: Stuart M. <Stu...@st...> - 2000-06-23 18:26:48
Niibe-san

I got both your mails about the cache problem. Thanks for confirming my thoughts; I'll post something on the kernel mailing list about this. However your first mail raised some interesting questions.

> > Note this is on an SH4, so to some extent we have to regard the cache
> > as virtual.
>
> More specifically, SH-4 has a virtually tagged and physically indexed
> cache system. I think the MIPS R4000 has the same system.

This was what I thought, until I started looking at it yesterday. You're right, it has the same synonym problem, due to cache size vs. page size issues. However there is one important difference: the hardware detects the synonym, and raises an exception when one would be created. This has advantages and disadvantages:

 - the advantage is that, although there is a performance penalty, there should be no risk of errors caused by synonyms
 - however it does prevent legitimate synonyms, for example when all the mappings are read only.

SH4 doesn't have this detection capability, so we have to make sure we catch any problems early.

The other point at which synonyms can occur is when a page is explicitly mapped at multiple addresses in user space. So I had a go at writing a simple test case for this, and then fixing the problem. Attached are my test code and a potential fix. This is taken almost unchanged from the Sparc sun4c code, which has a completely virtual cache. It's horrible, but I don't see any alternative, unless we can restrict the addresses at which shared pages can be mapped.

This also got me thinking about another potential problem. In update_mmu_cache, the new TLB entry is copied into the TLB using the LDTLB instruction. This uses the URC bits of the MMUCR register to select the TLB entry to replace, so in effect the TLB entry replaced is selected at random. If however there is already a TLB entry for this page, this could cause two TLB entries to refer to the same page, and cause a 'Multiple Hit Exception' when the page is next accessed. I guess the fact that we are not seeing these means that it is not a problem, but I've not convinced myself that it can never occur.

If we have to, it would be pretty easy to fix: effectively perform a flush_tlb_page before loading the new entry, but this is messy. Any thoughts?

> Generally speaking, I don't think we need a flush at the time of exit.
> We can postpone the flush to the time the physical memory is reused
> (and if the memory is simply overwritten, we don't need to flush it at all).

Agreed. However I came across an email from Linus (admittedly written way back in 1995) where he said that he preferred flushing when a page was freed, as this is likely to be less time critical than when it is allocated. This appears to be the way the kernel implements things.
Stuart
------------------------------------------------------------------------------
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

#define SIZE (1024)

int main(void)
{
	volatile int *p1, *p2;
	int fd;
	int i;
	int count;

	printf("Test running\n");

	fd = open("/tmp/xx", O_RDWR | O_CREAT, 0666);
	if (fd < 0) {
		perror("Failed to open file");
		exit(1);
	}

	for (i = 0; i < SIZE; i++) {
		int dummy = 0;
		write(fd, &dummy, sizeof(int));
	}

	p1 = mmap(NULL, SIZE * sizeof(int), PROT_READ | PROT_WRITE,
		  MAP_SHARED, fd, 0);
	if (p1 == MAP_FAILED) {
		perror("First mmap failed");
		exit(1);
	}

	p2 = mmap((void *)(p1 + SIZE), SIZE * sizeof(int),
		  PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
	if (p2 == MAP_FAILED) {
		perror("Second mmap failed");
		exit(1);
	}

	/* access the second mapping here, so that the TLB is set up, this
	 * way we don't take a fault on the first access, which might cause
	 * a flush. */
	p2[0] = 1;

	printf("Running test\n");

	for (i = 0; i < SIZE; i++) {
		p1[i] = i;
	}

	count = 0;
	for (i = 0; i < SIZE; i++) {
		int j;
		j = p2[i];
		if (j != i) {
			printf("Diff at 0x%x (read 0x%x)\n", i, j);
			count++;
		}
	}
	printf("Differences: %d\n", count);

	printf("Finished\n");
	return 0;
}
------------------------------------------------------------------------------
Index: arch/sh/mm/fault.c
===================================================================
RCS file: /cvsroot/linuxsh/kernel/arch/sh/mm/fault.c,v
retrieving revision 1.11
diff -u -r1.11 fault.c
--- arch/sh/mm/fault.c	2000/05/18 09:34:18	1.11
+++ arch/sh/mm/fault.c	2000/06/23 17:58:48
@@ -268,6 +268,82 @@
 		goto no_context;
 }
 
+#ifdef __SH4__
+/* There are really two cases of aliases to watch out for, and these
+ * are:
+ *
+ * 1) A user's page which can be aliased with the kernel's virtual
+ *    mapping of the physical page.
+ *
+ * 2) Multiple user mappings of the same inode/anonymous object
+ *    such that two copies of the same data for the same phys page
+ *    can live (writable) in the cache at the same time.
+ *
+ * We handle number 1 by flushing the kernel copy of the page always
+ * after COW page operations.
+ */
+static void synonym_fixup(struct vm_area_struct *vma, unsigned long address, pte_t pte)
+{
+	pgd_t *pgdp;
+	pte_t *ptep;
+
+	if (vma->vm_file) {
+		struct address_space *mapping;
+		unsigned long offset = (address & PAGE_MASK) - vma->vm_start;
+		struct vm_area_struct *vmaring;
+		int alias_found = 0;
+
+		mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
+		spin_lock(&mapping->i_shared_lock);
+		vmaring = mapping->i_mmap;
+		do {
+			unsigned long vaddr = vmaring->vm_start + offset;
+			unsigned long start;
+
+			/* Do not mistake ourselves as another mapping. */
+			if (vmaring == vma)
+				continue;
+
+			printk("Found a shared mapping\n");
+			if ((vaddr ^ address) & 0x3000) {
+				printk("Found a synonym problem\n");
+				alias_found++;
+				start = vmaring->vm_start;
+				/* If this is already in the cache flush it,
+				 * and remove any existing TLB entries. */
+				while (start < vmaring->vm_end) {
+					pgdp = pgd_offset(vmaring->vm_mm, start);
+					if (!pgdp)
+						goto next;
+					ptep = pte_offset((pmd_t *) pgdp, start);
+					if (!ptep)
+						goto next;
+
+					if (pte_val(*ptep) & _PAGE_PRESENT) {
+						printk("..and the page is present\n");
+						flush_cache_page(vmaring, start);
+						*ptep = __pte(pte_val(*ptep) &
+							      (~ _PAGE_CACHABLE));
+						flush_tlb_page(vmaring, start);
+					}
+				next:
+					start += PAGE_SIZE;
+				}
+			}
+		} while ((vmaring = vmaring->vm_next_share) != NULL);
+		spin_unlock(&mapping->i_shared_lock);
+
+		if (alias_found && (pte_val(pte) & _PAGE_CACHABLE)) {
+			printk("updating PTE\n");
+			pgdp = pgd_offset(vma->vm_mm, address);
+			ptep = pte_offset((pmd_t *) pgdp, address);
+			*ptep = __pte(pte_val(*ptep) & (~ _PAGE_CACHABLE));
+			pte = *ptep;
+		}
+	}
+}
+#endif
+
 void update_mmu_cache(struct vm_area_struct * vma,
 		      unsigned long address, pte_t pte)
 {
@@ -276,6 +352,11 @@
 	unsigned long pteaddr;
 
 	save_and_cli(flags);
+
+#ifdef __SH4__
+	if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED))
+		synonym_fixup(vma, address, pte);
+#endif
 
 	/* Set PTEH register */
 	if (vma) {
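For completeness, the "restrict the addresses at which shared pages can be mapped" alternative would look something like the sketch below. This is only the address arithmetic, not wired into the mmap path; the 0x3000 mask is the same one the patch uses, and assumes a 16KB virtually indexed cache with 4KB pages. The name sh4_colour_align is just for illustration.

/*
 * Sketch: choose the user virtual address so that its synonym bits
 * match those of the file offset, so every shared mapping of a given
 * page uses the same cache "colour" and synonyms never arise.
 */
#define SH4_PAGE_SIZE    0x1000UL
#define SH4_ALIAS_MASK   0x3000UL                           /* synonym bits */
#define SH4_ALIAS_WINDOW (SH4_ALIAS_MASK + SH4_PAGE_SIZE)   /* 0x4000 */

/* Return the first address >= addr whose colour matches the file offset. */
static unsigned long sh4_colour_align(unsigned long addr,
				      unsigned long byte_offset)
{
	unsigned long colour = byte_offset & SH4_ALIAS_MASK;

	/* Round up to a 16KB window, then step to the wanted colour. */
	addr = (addr + SH4_ALIAS_WINDOW - 1) & ~(SH4_ALIAS_WINDOW - 1);
	return addr + colour;
}

With something like this in the address selection for shared mappings, (vaddr ^ address) & 0x3000 would always be zero and synonym_fixup would have nothing left to do.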
From: NIIBE Y. <gn...@ch...> - 2000-06-24 02:00:20
Stuart Menefy wrote:
> You're right, it has the same synonym problem, due to cache size vs.
> page size issues. However there is one important difference: the
> hardware detects the synonym, and raises an exception when one would be
> created.

Interesting point. Yes, we can find similarities between MIPS and SuperH, but there are differences we should watch out for.

> This has advantages and disadvantages:
> - the advantage is that, although there is a performance penalty,
>   there should be no risk of errors caused by synonyms
> - however it does prevent legitimate synonyms, for example when all
>   the mappings are read only.
>
> SH4 doesn't have this detection capability, so we have to make sure
> we catch any problems early.

Yes.

> The other point at which synonyms can occur is when a page is explicitly
> mapped at multiple addresses in user space. So I had a go at writing a
> simple test case for this, and then fixing the problem. Attached are my
> test code and a potential fix. This is taken almost unchanged from
> the Sparc sun4c code, which has a completely virtual cache. It's horrible,
> but I don't see any alternative, unless we can restrict the addresses
> at which shared pages can be mapped.

OK. The point is this: when there's a shared mapping which causes an alias (synonym), we fix up the PTE(s) so that the data is never held in the cache. Specifically, on the TLB fault for such a page (WRITE && SHARED), we fix up all the PTEs to mark the pages uncacheable.

> void update_mmu_cache(struct vm_area_struct * vma,
> 		      unsigned long address, pte_t pte)
> {
> @@ -276,6 +352,11 @@
> 	unsigned long pteaddr;
>
> 	save_and_cli(flags);
> +
> +#ifdef __SH4__
> +	if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED))
> +		synonym_fixup(vma, address, pte);
> +#endif
>
> 	/* Set PTEH register */
> 	if (vma) {

This part, agreed.

There's a point which I can't understand.

> diff -u -r1.11 fault.c
> --- arch/sh/mm/fault.c	2000/05/18 09:34:18	1.11
> +++ arch/sh/mm/fault.c	2000/06/23 17:58:48
> @@ -268,6 +268,82 @@
> 		goto no_context;
> }
>
> +#ifdef __SH4__
> +/* There are really two cases of aliases to watch out for, and these
> + * are:
> + *
> + * 1) A user's page which can be aliased with the kernel's virtual
> + *    mapping of the physical page.
> + *
> + * 2) Multiple user mappings of the same inode/anonymous object
> + *    such that two copies of the same data for the same phys page
> + *    can live (writable) in the cache at the same time.
> + *
> + * We handle number 1 by flushing the kernel copy of the page always
> + * after COW page operations.
> + */
> +static void synonym_fixup(struct vm_area_struct *vma, unsigned long address, pte_t pte)
> +{
> +	pgd_t *pgdp;
> +	pte_t *ptep;
> +
> +	if (vma->vm_file) {
> +		struct address_space *mapping;
> +		unsigned long offset = (address & PAGE_MASK) - vma->vm_start;
> +		struct vm_area_struct *vmaring;
> +		int alias_found = 0;
> +
> +		mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
> +		spin_lock(&mapping->i_shared_lock);
> +		vmaring = mapping->i_mmap;

Here, we loop over the mapping ring.

> +		do {
> +			unsigned long vaddr = vmaring->vm_start + offset;
> +			unsigned long start;
> +
> +			/* Do not mistake ourselves as another mapping. */
> +			if (vmaring == vma)
> +				continue;
> +
> +			printk("Found a shared mapping\n");

Yes, we found the shared mapping.

> +			if ((vaddr ^ address) & 0x3000) {
> +				printk("Found a synonym problem\n");
> +				alias_found++;

Yup, we found the synonym (in SuperH terms; in general, an "alias").

> +				start = vmaring->vm_start;
> +				/* If this is already in the cache flush it,
> +				 * and remove any existing TLB entries. */
> +				while (start < vmaring->vm_end) {

But why do we loop from the start of the area (and to the end) to fix up the PTEs? Isn't fixing up the one relevant page enough? Even if we fixed the PTEs of the counterpart pages of the area, another TLB fault on a different page will come here again.

> +					pgdp = pgd_offset(vmaring->vm_mm, start);
> +					if (!pgdp)
> +						goto next;
> +					ptep = pte_offset((pmd_t *) pgdp, start);
> +					if (!ptep)
> +						goto next;
> +
> +					if (pte_val(*ptep) & _PAGE_PRESENT) {
> +						printk("..and the page is present\n");
> +						flush_cache_page(vmaring, start);
> +						*ptep = __pte(pte_val(*ptep) &
> +							      (~ _PAGE_CACHABLE));
> +						flush_tlb_page(vmaring, start);
> +					}
> +				next:
> +					start += PAGE_SIZE;
> +				}

In my understanding, this loop is questionable.

> +			}
> +		} while ((vmaring = vmaring->vm_next_share) != NULL);
> +		spin_unlock(&mapping->i_shared_lock);
> +
> +		if (alias_found && (pte_val(pte) & _PAGE_CACHABLE)) {
> +			printk("updating PTE\n");
> +			pgdp = pgd_offset(vma->vm_mm, address);
> +			ptep = pte_offset((pmd_t *) pgdp, address);
> +			*ptep = __pte(pte_val(*ptep) & (~ _PAGE_CACHABLE));
> +			pte = *ptep;
> +		}
> +	}
> +}
> +#endif
> +
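To make the question concrete, here is a rough, untested sketch of what I mean by fixing up only the one relevant page, reusing the helpers already used in synonym_fixup. The name fixup_one_alias is just for illustration; the caller would pass vaddr = vmaring->vm_start + offset, i.e. the counterpart address computed at the top of the do/while loop.

/*
 * Untested sketch: fix up only the page in the other mapping that
 * actually aliases "address", rather than every page in vmaring.
 */
static void fixup_one_alias(struct vm_area_struct *vmaring,
			    unsigned long vaddr)	/* counterpart address */
{
	pgd_t *pgdp = pgd_offset(vmaring->vm_mm, vaddr);
	pte_t *ptep;

	if (!pgdp)
		return;
	ptep = pte_offset((pmd_t *) pgdp, vaddr);
	if (!ptep)
		return;

	if (pte_val(*ptep) & _PAGE_PRESENT) {
		/* Write back any cached copy, make it uncacheable, and
		 * drop the stale translation. */
		flush_cache_page(vmaring, vaddr);
		*ptep = __pte(pte_val(*ptep) & ~_PAGE_CACHABLE);
		flush_tlb_page(vmaring, vaddr);
	}
}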
From: NIIBE Y. <gn...@ch...> - 2000-06-24 02:05:15
Stuart Menefy wrote:
> This also got me thinking about another potential problem. In
> update_mmu_cache, the new TLB entry is copied into the TLB using the
> LDTLB instruction. This uses the URC bits of the MMUCR register to
> select the TLB entry to replace, so in effect the TLB entry replaced is
> selected at random. If however there is already a TLB entry for this
> page, this could cause two TLB entries to refer to the same page,
> and cause a 'Multiple Hit Exception' when the page is next accessed. I
> guess the fact that we are not seeing these means that it is not a
> problem, but I've not convinced myself that it can never occur.
>
> If we have to, it would be pretty easy to fix: effectively perform
> a flush_tlb_page before loading the new entry, but this is messy.

Is there any case where update_mmu_cache is called with a valid entry in the TLB? As far as I know, there's no such case. At least, flush_tlb_page is called within mm/memory.c:establish_pte, before update_mmu_cache.
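For reference, this is roughly what that mm/memory.c helper looks like, reproduced from memory, so please check the actual tree before relying on it; the point is that the old TLB entry is shot down before update_mmu_cache ever runs:

/* From mm/memory.c (approximate, from memory) */
static inline void establish_pte(struct vm_area_struct * vma,
				 unsigned long address,
				 pte_t *page_table, pte_t entry)
{
	set_pte(page_table, entry);
	flush_tlb_page(vma, address);		/* old entry gone first */
	update_mmu_cache(vma, address, entry);	/* then load the new one */
}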
From: kaz K. <kk...@rr...> - 2000-06-24 03:00:18
Stuart Menefy <Stu...@st...> wrote:
> This was what I thought, until I started looking at it yesterday.
> You're right, it has the same synonym problem, due to cache size vs.
> page size issues. However there is one important difference: the
> hardware detects the synonym, and raises an exception when one would be
> created.

If this exception means the MIPS virtual coherency exception, it cannot detect the cache alias on chips with only a primary cache, like the R4000PC. Perhaps SH-4 corresponds to such a chip, though I don't know about the newer MIPS chips.

kaz
From: Stuart M. <Stu...@st...> - 2000-06-26 16:44:20
Niibe-san

That'll teach me to copy code without fully understanding it! On further investigation I think it is needed, but I'd be delighted to be proved wrong.

As I see it, there are two reasons for doing this loop:

 - to flush the cache and TLB of any aliases which may already have been read before the page was shared. I was hoping the generic code would already have done this, but after a few experiments, it doesn't appear to be doing so. I guess at the point the mmap() is performed the PTEs are not modified, so no flush is done there.
 - to mark the PTEs of any pages which have already been allocated as uncacheable. You're right: at the moment, because the TLB entry will have been flushed, we will always come here again as part of the TLB miss exception, and so we don't need to modify the PTEs here. At some point in the future we will probably want to try and optimise the TLB miss code for the common case of the PTE already being set up, at which time this code or something similar may be required.

If this is right, then the loop is only required when the sharing is first set up, so it should be optimisable, but I think it is required.

Stuart

On Jun 24, 10:56am, gn...@ch... wrote:
> Subject: Re: [linuxsh-dev] Caches question
>
> [...]
>
> > +				start = vmaring->vm_start;
> > +				/* If this is already in the cache flush it,
> > +				 * and remove any existing TLB entries. */
> > +				while (start < vmaring->vm_end) {
>
> But why do we loop from the start of the area (and to the end) to fix up
> the PTEs? Isn't fixing up the one relevant page enough? Even if we fixed
> the PTEs of the counterpart pages of the area, another TLB fault on a
> different page will come here again.
>
> [...]
>
> In my understanding, this loop is questionable.
>
> [...]
-- End of excerpt from gn...@ch...
From: NIIBE Y. <gn...@ch...> - 2000-06-30 07:44:16
Stuart Menefy wrote:
> That'll teach me to copy code without fully understanding it! On further
> investigation I think it is needed, but I'd be delighted to be proved
> wrong.

Well, I'm really confused by this code. I think that the code sometimes works, but is totally wrong. Let's clarify things and discuss:

 (1) When do we need to fix up?
 (2) How can we find the corresponding mapping?
 (3) How can we fix it?

Let's discuss them one by one.

(1) When do we need to fix up?

When there is a shared mapping which causes an alias (synonym). We handle it during TLB miss handling, finding the page with flags VM_WRITE and VM_SHARED. In this case, there is a possibility that the page is shared among different virtual mappings (which may or may not be in the same process) and the page causes aliases (same physical memory but different cache lines).

(2) How can we find the corresponding mapping which shares the physical page?

# Here is my point of confusion. I don't think the sun4c implementation
# handles the cases correctly.

Suppose we have a virtual memory mapping (say, VMA) of the process. In this case we scan VMAs through the "vm_next_share" link. We need to walk the link only if VMA->vm_file != 0 (is this correct? what about anonymous pages?). Following the "vm_next_share" link, we scan through all the mappings of the file, and find the corresponding mapping which shares the same physical memory. We need to check vm_pgoff to find the matching mapping.

(3) How can we fix it?

There are some alternatives:

 (a) Flush the cache for the page, and any TLB entry which may point to that physical page. Then set the VM_WRITE|VM_SHARED flags on the matching mapping even if they were not set, so that using that mapping will cause a TLB miss and come back to this fixup routine. No PTE handling is required.

 (b) Make each PTE uncacheable so that we don't cache any data.

Comments? Thoughts?
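For point (2), the check I have in mind is something like the sketch below. It only shows the vm_pgoff arithmetic; the name shared_page_address is made up for illustration, and it assumes vm_pgoff is counted in pages:

/*
 * Sketch only: does "other" map the same file page as (vma, address),
 * and if so, at which user virtual address?  Returns 0 if it doesn't
 * map that page at all.
 */
static unsigned long shared_page_address(struct vm_area_struct *vma,
					 unsigned long address,
					 struct vm_area_struct *other)
{
	/* File page index of the faulting address. */
	unsigned long pgoff = vma->vm_pgoff +
		((address - vma->vm_start) >> PAGE_SHIFT);
	unsigned long vaddr;

	/* Is that page inside the other mapping's file window? */
	if (pgoff < other->vm_pgoff)
		return 0;
	vaddr = other->vm_start + ((pgoff - other->vm_pgoff) << PAGE_SHIFT);
	if (vaddr >= other->vm_end)
		return 0;

	return vaddr;	/* caller then tests (vaddr ^ address) & 0x3000 */
}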
From: Stuart M. <Stu...@st...> - 2000-06-26 16:47:22
On Jun 24, 11:01am, gn...@ch... wrote:
> Subject: Re: [linuxsh-dev] Caches question
>
> Stuart Menefy wrote:
> > This also got me thinking about another potential problem. In
> > update_mmu_cache, the new TLB entry is copied into the TLB using the
> > LDTLB instruction. This uses the URC bits of the MMUCR register to
> > select the TLB entry to replace, so in effect the TLB entry replaced is
> > selected at random. If however there is already a TLB entry for this
> > page, this could cause two TLB entries to refer to the same page,
> > and cause a 'Multiple Hit Exception' when the page is next accessed. I
> > guess the fact that we are not seeing these means that it is not a
> > problem, but I've not convinced myself that it can never occur.
> >
> > If we have to, it would be pretty easy to fix: effectively perform
> > a flush_tlb_page before loading the new entry, but this is messy.
>
> Is there any case where update_mmu_cache is called with a valid entry in
> the TLB? As far as I know, there's no such case.

That's what I was hoping, I just wasn't sure. If that's the case, good!

Stuart