From: Rodrigo S. de C. <rc...@im...> - 2001-12-12 14:09:12
|
Hi, [Jeff, I talked to you yesterday about this problem] In my code, I use a virtual swap address, which is nothing but an ordinary swap entry, but its type is MAX_SWAPFILES. This address is assigned to a pte in try_to_swap_out(), like an usual swap entry and that's _all_ (there some other stuff wrt to these ptes, but I removed). It should work as usual swap addresses do work, right? But it does not, at least under UML (on a real machine it does). The problem I noticed is that, when setting a pte to a swap entry, the NEWPAGE bit is turned on to tell fix_range() that this address should be unmapped. It turns out that, for swap adresses that are shared by many ptes, the address is not unmapped and the NEWPAGE bit is not turned off soon enough before any pte faults. Then, in fault handlers (handle_pte_fault() and do_swap_page()), the pte has a wrong value, so I can't find the page in swap cache. For example, if the swap cache page is set to 0x3307c and the wrong pte is set to 0x3307e (due to newpage bit), I won't be able to find it. In _switch_to, everytime we swith to a new task, you flush its tlb (flush_tlb_all()), so there shouldn't have this problem. An important question before bother you more: do you think that it's UML-related or it's unlikely? I added a printk to fix_range when fixing my virtual swap entries and the problem was gone. Will it be a race condition? I am not doing anything but setting the pte in try_to_swap_out and handling faults in do_swap_page(), so I have no clue. :-( The same code was working in the old 2.4.10, when the newpage bit was not set for swap entries. Regards, -- Rodrigo S. de Castro <rc...@im...> |
From: Rodrigo S. de C. <rc...@im...> - 2001-12-13 18:15:31
|
On Wed, Dec 12, 2001 at 12:07:15PM -0200, Rodrigo Souza de Castro wrote: > The problem I noticed is that, when setting a pte to a swap entry, the > NEWPAGE bit is turned on to tell fix_range() that this address should > be unmapped. It turns out that, for swap adresses that are shared by > many ptes, the address is not unmapped and the NEWPAGE bit is not > turned off soon enough before any pte faults. Then, in fault handlers > (handle_pte_fault() and do_swap_page()), the pte has a wrong value, so > I can't find the page in swap cache. For example, if the swap cache > page is set to 0x3307c and the wrong pte is set to 0x3307e (due to > newpage bit), I won't be able to find it. By the way, the test that fails is mtest from Memtest suite. Today I made mtest fail when running on a vanilla 2.4.16 + 2um patch, but only once. :-( Regards, -- Rodrigo S. de Castro <rc...@im...> |
From: Jeff D. <jd...@ka...> - 2002-01-11 03:14:58
|
rc...@im... said: > The problem I noticed is that, when setting a pte to a swap entry, the > NEWPAGE bit is turned on to tell fix_range() that this address should > be unmapped. It turns out that, for swap adresses that are shared by > many ptes, the address is not unmapped and the NEWPAGE bit is not > turned off soon enough before any pte faults. Then, in fault handlers > (handle_pte_fault() and do_swap_page()), the pte has a wrong value, so > I can't find the page in swap cache. For example, if the swap cache > page is set to 0x3307c and the wrong pte is set to 0x3307e (due to > newpage bit), I won't be able to find it. This is an excellent description of the bug that I just spend 5 days chasing. Unfortunately, I didn't until recently (yesterday) understand enough of the swap code to tell whether you were describing something real or whether you were on crack. It is now fixed. pte_to_swp_entry now masks off the arch bits. Jeff |
From: Rodrigo S. de C. <rc...@im...> - 2002-01-11 10:39:39
|
On Thu, Jan 10, 2002 at 10:16:18PM -0500, Jeff Dike wrote: > rc...@im... said: > > The problem I noticed is that, when setting a pte to a swap entry, the > > NEWPAGE bit is turned on to tell fix_range() that this address should > > be unmapped. It turns out that, for swap adresses that are shared by > > many ptes, the address is not unmapped and the NEWPAGE bit is not > > turned off soon enough before any pte faults. Then, in fault handlers > > (handle_pte_fault() and do_swap_page()), the pte has a wrong value, so > > I can't find the page in swap cache. For example, if the swap cache > > page is set to 0x3307c and the wrong pte is set to 0x3307e (due to > > newpage bit), I won't be able to find it. Hi Jeff, > This is an excellent description of the bug that I just spend 5 days > chasing. > > Unfortunately, I didn't until recently (yesterday) understand enough > of the swap code to tell whether you were describing something real > or whether you were on crack. Yesterday Livio and I started chasing this bug because UML is still much better for us to develop than real machines :-) But we were trying to understand how your flush architecture worked first, thus couldn't make any progress. > It is now fixed. It's great to hear you fixed it. > pte_to_swp_entry now masks off the arch bits. I thought of something like that (masking off the bits in do_swap_page()), but wasn't sure if it could possibly break the code. Besides that, it's only partialy correct, because many places that call pte_to_swp_entry() would still be broken. Regards, -- Rodrigo S. de Castro <rc...@im...> |
From: Jeff D. <jd...@ka...> - 2001-12-13 19:28:14
|
> It turns out that, for swap adresses that are shared by many ptes, the > address is not unmapped and the NEWPAGE bit is not turned off soon > enough before any pte faults. By shared, you mean that a number of different ptes have the same value? I can't see how that would cause any problems. When a pte is examined to see if it needs the address space to be updated, the values in other ptes don't matter at all. > Then, in fault handlers > (handle_pte_fault() and do_swap_page()), the pte has a wrong value, so > I can't find the page in swap cache. For example, if the swap cache > page is set to 0x3307c and the wrong pte is set to 0x3307e Why is your fault handler looking at those bits anyway? If you're calling SWP_ENTRY, they're stripped out. > In _switch_to, everytime we swith to a new task, you flush its tlb > (flush_tlb_all()), so there shouldn't have this problem. Are you sure that it's going through a context switch before you see problems? > An important > question before bother you more: do you think that it's UML-related or > it's unlikely? I consider that code to be fairly stable. I haven't seen any problems in it lately, so my guess would be that you're messing it up somehow :-) > By the way, the test that fails is mtest from Memtest suite. Today I > made mtest fail when running on a vanilla 2.4.16 + 2um patch, but only > once. :-( Hmmm, I'll have a look at this... It might contradict that guess I just made... Jeff |
From: Rodrigo S. de C. <rc...@im...> - 2001-12-13 19:57:04
|
On Thu, Dec 13, 2001 at 03:48:12PM -0500, Jeff Dike wrote: > > It turns out that, for swap adresses that are shared by many ptes, the > > address is not unmapped and the NEWPAGE bit is not turned off soon > > enough before any pte faults. > > By shared, you mean that a number of different ptes have the same > value? Yes. > I can't see how that would cause any problems. When a pte is > examined to see if it needs the address space to be updated, the > values in other ptes don't matter at all. I agree with you. I told you about being shared because all the times it failed, the swap entry of the failed entry had a counter greater than 1 and this info could help you somehow. > > Then, in fault handlers > > (handle_pte_fault() and do_swap_page()), the pte has a wrong value, so > > I can't find the page in swap cache. For example, if the swap cache > > page is set to 0x3307c and the wrong pte is set to 0x3307e > > Why is your fault handler looking at those bits anyway? If you're calling > SWP_ENTRY, they're stripped out. When you unmap a page, the page is added to the swap cache. If the pte faults this address before the page is swapped out, we map this pte back to the swap cache page, so we don't convert the swap entry address to disk blocks to read them. Let me give you an example of what gets messed up with this whole problem I pointed out. try_to_swap_out 1. pte is unmapped 2. the 0x3307c swap entry is returned by get_swap_page() 3. set_pte sets this pte to 0x3307e = 0x3307c | 0x2 (newpage bit) 4. the page that this pte was previously set is added to swap cache indexed by the --> 0x3307c <-- address, ie, page->index = 0x3307c. do_swap_page 1. the pte has 0x3307e and the swap cache page has not yet been swapped out 2. we search with lookup_swap_cache for a page whose index value is --> 0x3307e <--, ie, page->index == 0x3307e. 3. the search returns nothing, as expected 4. we read this page from disk. The correct block on disk is computed correctly since SWP_TYPE and SWP_OFFSET macros strips out those last two bits, but it returns _bogus_ data. In my case, since the virtual swap address is set to a swap type which has no backing storage, once not found in swap cache, it panics. > > In _switch_to, everytime we swith to a new task, you flush its tlb > > (flush_tlb_all()), so there shouldn't have this problem. > > Are you sure that it's going through a context switch before you see > problems? Only user-level apps have ptes set to swap entries, so in order to one of these ptes to fault an address, it must have run for a while. Thus the context was switched to this user-level application. Since the tlb is flushed for every new application we switch to, this newpage bit should have been cleaned. Am I missing something here? > > An important > > question before bother you more: do you think that it's UML-related or > > it's unlikely? > > I consider that code to be fairly stable. I haven't seen any > problems in it lately, so my guess would be that you're messing it > up somehow :-) I may be doing that. :-( If it's an error of mine, I hope you don't spend too much time looking for nonexistent uml bug. > > By the way, the test that fails is mtest from Memtest suite. Today I > > made mtest fail when running on a vanilla 2.4.16 + 2um patch, but only > > once. :-( > > Hmmm, I'll have a look at this... It might contradict that guess I just > made... Thank you, -- Rodrigo S. de Castro <rc...@im...> |
From: Jeff D. <jd...@ka...> - 2001-12-13 20:44:39
|
rc...@im... said: > 4. the page that this pte was previously set is added to swap cache > indexed by the --> 0x3307c <-- address, ie, page->index = 0x3307c. This seems like a bug. The low pte bits are arch-specific, and as far as you're concerned, they should be considered random. The arch can fiddle those bits at any time for any reason. Why aren't you using page->index == SWP_OFFSET(pte)? > 4. we read this page from disk. The correct block on disk is computed > correctly since SWP_TYPE and SWP_OFFSET macros strips out those last > two bits, but it returns _bogus_ data. Because the page had never been written out to disk in the first place, I take it? > Only user-level apps have ptes set to swap entries, so in order to one > of these ptes to fault an address, it must have run for a while. At the very least, flush_tlb_page should have been called when the pte was marked as being swapped. That should have unmapped the page and cleared the newpage bit. Jeff |
From: Rodrigo S. de C. <rc...@im...> - 2001-12-13 21:02:47
|
On Thu, Dec 13, 2001 at 05:04:30PM -0500, Jeff Dike wrote: > rc...@im... said: > > 4. the page that this pte was previously set is added to swap cache > > indexed by the --> 0x3307c <-- address, ie, page->index = 0x3307c. > > This seems like a bug. The low pte bits are arch-specific, and as far as > you're concerned, they should be considered random. The arch can fiddle > those bits at any time for any reason. Why aren't you using > page->index == SWP_OFFSET(pte)? That's not my code, it's try_to_swap_out() (mm/vmscan.c) and add_to_swap_cache() (mm/swap_state.c) that do that. We can't use page->index == SWP_OFFSET(pte) because we have to take into account the swap type too (for instance, if we have many swap files or partitions) in order to have an unique index. > > 4. we read this page from disk. The correct block on disk is computed > > correctly since SWP_TYPE and SWP_OFFSET macros strips out those last > > two bits, but it returns _bogus_ data. > > Because the page had never been written out to disk in the first place, I > take it? That's right. > > Only user-level apps have ptes set to swap entries, so in order to one > > of these ptes to fault an address, it must have run for a while. > > At the very least, flush_tlb_page should have been called when the pte > was marked as being swapped. That should have unmapped the page and cleared > the newpage bit. I don't know very much of this cache flushing to tell if it's missing something over there. Have a look at try_to_swap_out() and tell me what you think. Regards, -- Rodrigo S. de Castro <rc...@im...> |