From: Paul M. <pau...@ti...> - 2003-02-14 13:47:56
Attachments:
sh4-switch_mm-flush.diff
mmaptest.c
|
I've hit what looks to be another cache bug (it looks like an alias issue), and so far it seems to hit both SH-3 and SH-4. (I haven't tested SH-2 or SH-5 yet.)

The problem pops up when doing an mmap() of /dev/zero and then reading from it: the read is sometimes 0, but most of the time it gets back garbage and promptly segfaults.

I've attached my test code for this as well. The fault happens right on the read.

I've managed to fix this on SH-4 (patch attached) by doing an all-out flush_cache_all() after the activate_context() in switch_mm(). It's not a very optimal solution, but it seems to do the right thing for now.

Unfortunately this same fix doesn't help SH-3 any (after implicitly wrapping flush_cache_all() to cache_wback_all()).

Anyone seen this before?

Regards,

--
Paul Mundt <pau...@ti...>
TimeSys Corporation
|
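For readers without the attachment, the kind of test described above can be sketched roughly as follows. This is not the attached mmaptest.c, which is not reproduced in the archive; the function name `count_nonzero_in_dev_zero` and the choice to scan a whole mapping are assumptions made for illustration.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original test): map `len` bytes of
 * /dev/zero privately and count the bytes that do not read back as zero.
 * Returns that count, or -1 if open() or mmap() fails.
 */
long count_nonzero_in_dev_zero(size_t len)
{
    int fd = open("/dev/zero", O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    long bad = 0;
    for (size_t i = 0; i < len; i++) {
        /* The read is the step where the reported fault showed up. */
        if (p[i] != 0)
            bad++;
    }

    munmap(p, len);
    return bad;
}
```

On a kernel without the bug, calling this with one page (for example 4096 bytes) should return 0; a non-zero count would correspond to the "garbage" reads described in the report.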
From: Stuart M. <stu...@st...> - 2003-02-14 14:55:04
Attachments:
kernel-2.4.18-shmem-cache.patch
|
Hi Paul,

This looks like a problem I hit a short while ago when using ssh. It turns out to be a problem in the generic kernel code, which uses clear_highpage() to zero the page; that function doesn't have the extra cache alias prevention code that is needed. Ideally it should be replaced by clear_user_page(), but the virtual address where the page is going to be mapped isn't available at that point in the code.

At this point I had a look in Marcelo's tree in BK, and it turns out somebody else has already fixed this problem by simply adding a call to flush_dcache_page(). IIRC this is a post-2.4.20 change.

I'm running with the attached patch on a 2.4.18 tree, and your test code appears to work for me on an SH4; I've not tried anything else.

Hope this helps

Stuart

On Fri, 14 Feb 2003 08:47:07 -0500 pau...@ti... wrote:
> I've hit what looks to be another cache bug (looks like its an alias
> issue), and so far it seems to hit both SH-3 and 4. (I haven't tested
> SH-2 or 5 yet).
>
> The problem pops up when doing an mmap() of /dev/zero and then reading
> from it, the read is sometimes 0, but most of the time ends up getting
> back garbage and promptly segfaulting.
>
> I've attached my testcode for this as well. The fault happens right on
> the read.
>
> I've managed to fix this on SH-4 (patch attached) by doing an all out
> flush_cache_all() after the activate_context() in switch_mm() .. not a
> very optimal solution, but seems to do the right thing for now.
>
> Unfortunately this same fix doesn't help SH-3 any (after implicitly
> wrapping flush_cache_all() to cache_wback_all()).
>
> Anyone seen this before?
>
> Regards,
>
> --
> Paul Mundt <pau...@ti...>
> TimeSys Corporation
>
|
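To make the alias problem concrete: on a virtually indexed cache, the kernel may zero a page through one virtual address while user space later reads it through another. If the two addresses select different cache lines, the zeros can sit in the cache under the kernel's address while the user's read fetches stale memory. The sketch below shows that check in isolation. The geometry in the usage note (4 KiB pages, 16 KiB per way) and all function names are assumptions for illustration; none of this is taken from the kernel sources or from the attached patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed page size */

/* Bytes covered by one cache way: total size divided by associativity. */
uint32_t way_bytes(uint32_t cache_bytes, uint32_t ways)
{
    return cache_bytes / ways;
}

/*
 * Cache index bits that lie above the page offset, often called the
 * "colour" of an address. `way` must be a power of two.
 */
uint32_t cache_colour(uint32_t vaddr, uint32_t way)
{
    return (vaddr & (way - 1u)) / PAGE_BYTES;
}

/*
 * Two virtual mappings of the same physical page can land in different
 * cache lines (alias) when their colours differ.
 */
bool may_alias(uint32_t va1, uint32_t va2, uint32_t way)
{
    return cache_colour(va1, way) != cache_colour(va2, way);
}
```

With a 16 KiB way, a kernel-side address such as 0x88001000 has colour 1 and a user address such as 0x00402000 has colour 2, so `may_alias()` reports a possible alias. When the way size equals the page size, every address has colour 0 and no alias is possible. A flush_dcache_page() call after the page is zeroed sidesteps the question by pushing the zeros out to memory before the user mapping reads them.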
From: David M. <da...@sn...> - 2003-02-14 15:02:23
|
Hi Paul,

I just tried this on my 2.4.17 kernel on the 7751R/SH4 with no crash, and it output 0 as I suspect it is supposed to. I can try some more platforms/combos Monday when I am back in the office if you like.

Not sure that "it worked for me" means much though; it could just be that the cache state/entries are quite different between our platforms.

Cheers,
Davidm

Jivin Paul Mundt lays it down ...
> I've hit what looks to be another cache bug (looks like its an alias
> issue), and so far it seems to hit both SH-3 and 4. (I haven't tested
> SH-2 or 5 yet).
>
> The problem pops up when doing an mmap() of /dev/zero and then reading
> from it, the read is sometimes 0, but most of the time ends up getting
> back garbage and promptly segfaulting.
>
> I've attached my testcode for this as well. The fault happens right on
> the read.
>
> I've managed to fix this on SH-4 (patch attached) by doing an all out
> flush_cache_all() after the activate_context() in switch_mm() .. not a
> very optimal solution, but seems to do the right thing for now.
>
> Unfortunately this same fix doesn't help SH-3 any (after implicitly
> wrapping flush_cache_all() to cache_wback_all()).
>
> Anyone seen this before?
>
> Regards,
>
> --
> Paul Mundt <pau...@ti...>
> TimeSys Corporation

--
David McCullough:    Ph: +61 7 3435 2815    http://www.SnapGear.com
da...@sn...          Fx: +61 7 3891 3630    Custom Embedded Solutions + Security
|
From: David M. <da...@sn...> - 2003-02-14 15:09:28
|
Jivin Paul Mundt lays it down ...
> David,
>
> Interesting. I've been testing on my Solution Engines .. 7709A and 7750.
> The 7751R has a vastly different cache configuration. Does your

I wouldn't say the 7751/7750 and the 7751R are that different code-wise to manage, just the extra ways to deal with. Of course underneath they may be plenty different ;-)

> arch/sh/mm/cache-sh4.c properly deal with the multiple ways?

Yes. I would have tried it on a 7751, but its net chip has gone west and I am working from home (Friday night), so I won't be able to try it till Monday.

> I've also only tested this using write-back. I suppose I should give
> write-through a try as well, but since it's mostly a reading issue, I
> doubt that'll have much of an effect.

I am using copyback/write-back.

> Also, are you running stock LinuxSH CVS? or are you on a proprietary
> vendor tree that might have some other flushing happening somewhere?

I am running a mostly stock Linus kernel with patches for our targets.

Actually, I just tried it again in compatibility mode (i.e., stock 7751), and it also worked fine.

The cache-sh4.c I used in the test is a mega-hack version I used to get multiple ways going (with experimentation code still in there). I can send you a copy if you want. It currently has the 7751R support hardcoded on, but the current version has it disabled, not that it seems to matter in practice if you use the 7751R code on the 7751.

Cheers,
Davidm

> On Fri, 2003-02-14 at 09:36, David McCullough wrote:
> > I just tried this on my 2.4.17 kernel on the 7751R/SH4 with no crash
> > and it output 0 as I suspect it is supposed to. I can try some more
> > platforms/combos monday when I am back in the office if you like.
> >
> > Not sure that it worked for me means much though, could be just be the
> > cache state/entries are quite different between our platforms.
> >
>
> Regards,
>
> --
> Paul Mundt <pau...@ti...>
> TimeSys Corporation

--
David McCullough:    Ph: +61 7 3435 2815    http://www.SnapGear.com
da...@sn...          Fx: +61 7 3891 3630    Custom Embedded Solutions + Security
|
From: Paul M. <pau...@ti...> - 2003-02-14 15:18:28
|
On Fri, 2003-02-14 at 10:17, David McCullough wrote:
> I wouldn't say the 7751/7750 and the 7751R are that different code wise
> to manage it, just the extra ways to deal with. Of course underneath
> they may be plenty different ;-)
>

Extra ways make a considerable difference in and of themselves. Naturally there are also OC RAM differences, which are a bit more annoying. I've been playing with getting most of that dealt with relatively transparently in the restructure branch, but haven't finished off the way bit selection in the flush yet (I also have no 7751R/7750R to test on).

Good thing probing works, so we don't have to hardcode anything.

> The cache-sh4.c I used in the test is a mega hack version I used to get
> multiple ways going (with experimentation code still in there). I can
> send you a copy if you want. It currently has the 7751R support
> hardcoded on, but the current version has it disabled, not that it
> seems to matter in practice if you use the 7751R code on the 7751.
>

Sure, pass it along, could be interesting.

--
Paul Mundt <pau...@ti...>
TimeSys Corporation
|