From: Masahiro A. <m-...@aa...> - 2001-07-11 07:03:00
|
Hello all,

Today I desperately need someone's help (as usual ;-)

I checked out the CVS kernel last night (before Niibe-san updated it to 2.4.7-pre5/6). Today I modified it to run on our custom board. It boots, but the system somehow resets during the INIT process.

I have changed the source only for irq, machvec and CompactFlash handling. I haven't applied the ext3 and RTLinux patches yet.

The reset occurs at a random point in the INIT process. Sometimes I can see several lines of output from rc.sysinit. Sometimes I see nothing after the kernel's "Freeing unused kernel memory" message; the reset just happens. If I make /sbin/init a link to /bin/bash, sometimes I get a bash prompt and can type some commands, but sometimes the system resets before that.

Our custom board has 32MB SRAM in area 2 and CompactFlash in area 5. With the 2.4.5 kernel and similar modifications, our custom board runs without problems. I've tried the same kernel on a SolutionEngine 7750S. It has no problem like this.

Can anybody suggest something or some place to look into? I can't think of any way to nail down this problem now.

For reference, I've attached the patch for CompactFlash handling. It is basically the same as what I posted before, but uses the newly renamed p3_ioremap(). Could this one, or the "irq_maskreg" patch I posted before, be the source of the problem?

Thanks in advance for your help.
---------
diff -ruN linux-2.4.6-cvs-mr/arch/sh/config.in linux-2.4.6-cvs-mr-cf/arch/sh/config.in
--- linux-2.4.6-cvs-mr/arch/sh/config.in	Wed Jul 11 09:40:56 2001
+++ linux-2.4.6-cvs-mr-cf/arch/sh/config.in	Wed Jul 11 10:45:10 2001
@@ -130,6 +130,18 @@
    bool 'Compact Flash Enabler support' CONFIG_CF_ENABLER
 fi
 
+if [ "$CONFIG_CF_ENABLER" = "y" ]; then
+   choice 'Compact Flash Area' \
+        "Area5 CONFIG_CF_AREA5 \
+         Area6 CONFIG_CF_AREA6" Area6
+   if [ "$CONFIG_CF_AREA5" = "y" ]; then
+      define_hex CONFIG_CF_BASE_ADDR b4000000
+   fi
+   if [ "$CONFIG_CF_AREA6" = "y" ]; then
+      define_hex CONFIG_CF_BASE_ADDR b8000000
+   fi
+fi
+
 bool 'Hitachi HD64461 companion chip support' CONFIG_HD64461
 if [ "$CONFIG_HD64461" = "y" ]; then
    int 'HD64461 IRQ' CONFIG_HD64461_IRQ 36
diff -ruN linux-2.4.6-cvs-mr/arch/sh/kernel/cf-enabler.c linux-2.4.6-cvs-mr-cf/arch/sh/kernel/cf-enabler.c
--- linux-2.4.6-cvs-mr/arch/sh/kernel/cf-enabler.c	Wed Jul 11 09:40:56 2001
+++ linux-2.4.6-cvs-mr-cf/arch/sh/kernel/cf-enabler.c	Wed Jul 11 14:20:41 2001
@@ -14,7 +14,8 @@
 #include <asm/io.h>
 #include <asm/irq.h>
 
-#define CF_CIS_BASE 0xb8000000
+/* this must be done in boot-loader - Masahiro Abe
+#define CF_CIS_BASE 0xb8000000 */
 /*
  * You can connect Compact Flash directly to the bus of SuperH.
  * This is the enabler for that.
@@ -29,15 +30,58 @@
  * 0xB8001000 : Common Memory
  * 0xBA000000 : I/O
  */
+#if defined(CONFIG_IDE) && defined(__SH4__)
+/* SH4 can't access PCMCIA interface through P2 area.
+ * we must remap it with appropreate attribute bit of the page set.
+ * this part is based on Greg Banks' hd64465_ss.c implementation - Masahiro Abe */
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+
+#if defined(CONFIG_CF_AREA6)
+#define slot_no 0
+#else
+#define slot_no 1
+#endif
+
+extern void * p3_ioremap(unsigned long phys_addr, unsigned long size, unsigned long flags);
+
+void *cf_io_base;
+
+static int __init allocate_cf_area(void)
+{
+        pgprot_t prot;
+        unsigned long paddrbase, psize;
+
+/* open I/O area window */
+        paddrbase = virt_to_phys((void*)CONFIG_CF_BASE_ADDR);
+        psize = PAGE_SIZE;
+        prot = PAGE_KERNEL_PCC(slot_no, _PAGE_PCC_IO16);
+        cf_io_base = p3_ioremap(paddrbase, psize, prot.pgprot);
+        if (!cf_io_base) {
+                printk("allocate_cf_area : can't open CF I/O window!\n");
+                return -ENOMEM;
+        }
+/*      printk("p3_ioremap(paddr=0x%08lx, psize=0x%08lx, prot=0x%08lx)=0x%08lx\n",
+                paddrbase, psize, prot.pgprot, cf_io_base);*/
+
+        /* XXX : do we need attribute and common-memory area also? */
+
+        return 0;
+}
+#endif
 
 static int __init cf_init_default(void)
 {
 #ifdef CONFIG_IDE
+#if defined(CONFIG_IDE) && defined(__SH4__)
+        allocate_cf_area();
+#endif
         /* Enable the card, and set the level interrupt */
-        ctrl_outw(0x0042, CF_CIS_BASE+0x0200);
+/* this must be done in boot-loader - Masahiro Abe
+        ctrl_outw(0x0042, CF_CIS_BASE+0x0200);*/
 #endif
-        make_imask_irq(14);
-        disable_irq(14);
+/*      make_imask_irq(14);
+        disable_irq(14);*/
         return 0;
 }
----------

+-------------------------------------+
| Masahiro Abe, Software Engineer     |
| A&D Co., Ltd. of Tokyo, Japan       |
| mailto:m-...@aa...                  |
+-------------------------------------+
|This is my opinion, not my employer's|
+-------------------------------------+
|
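For readers unfamiliar with the SH-4 address map, the two CONFIG_CF_BASE_ADDR values in the patch above can be related to areas 5 and 6 with a little arithmetic. The sketch below is only an illustration: it assumes the usual SH-4 layout (29-bit physical addresses, 64 MB per external area), and the helper names p2_to_phys() and phys_to_area() are made up for this example; they are not kernel functions and do not replace virt_to_phys().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: SH-4 physical addresses are 29 bits wide; the top three
 * bits of a 32-bit virtual address only select the segment (P0..P4). */
#define SH4_PHYS_MASK  0x1fffffffu
#define SH4_AREA_SHIFT 26          /* each external area spans 64 MB */

/* Hypothetical helper: strip the segment bits from a P1/P2 address. */
static uint32_t p2_to_phys(uint32_t vaddr)
{
    return vaddr & SH4_PHYS_MASK;
}

/* Hypothetical helper: external memory area (0..7) of a physical address. */
static unsigned int phys_to_area(uint32_t paddr)
{
    assert(paddr <= SH4_PHYS_MASK);
    return (unsigned int)(paddr >> SH4_AREA_SHIFT) & 0x7u;
}
```

With these helpers, 0xb4000000 maps to physical 0x14000000 in area 5, and 0xb8000000 maps to physical 0x18000000 in area 6, which matches the two choices offered in config.in.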
From: David M. <Dav...@st...> - 2001-07-11 12:56:32
|
Masahiro-san,

We've also found some problems with the cache handling in the latest kernel. Things like X-windows are no longer stable. We haven't yet had a chance to track down the source of these problems, but expect to be able to do so soon. Niibe-san, have you seen any problems on your system?

I don't think it is specific to your board; I think there is a more generic problem here.

Cheers!
--
Dave McKay
Software Engineer
STMicroelectronics
Email: dav...@st...
|
From: Dustin M. <du...@se...> - 2001-07-11 13:56:34
|
I'm experiencing the same problems as Masahiro-san as well, though I don't get reboots, just a hanging system. I'm working on a custom SH7751 based platform that works fine with the 2.4.5 kernel sources. The latest kernel will hang just as init starts running. Dustin. |
From: Masahiro A. <m-...@aa...> - 2001-07-12 00:33:02
|
Thank you for your reply, David and Dustin. I'm a little bit relieved that I'm not alone. Additional info (may be related to this matter): Our board has SMC91C96 Ethernet controller. It worked fine with 2.4.5 kernel. With current kernel, the driver hangs in the middle of initialization (in smc_findirq, at outb just after SMC_DELAY). So, I'm temporarily not using it now. Last night I've checked out latest source from CVS (it's 2.4.7-pre6 now), and tried it. As expected, behavior of the system is same as yesterday. I've tried to disable CONFIG_CF_ENABLER and enable SMC91C96, but it hangs in smc_findirq. If I comment "outb" out in smc_findirq, it passes driver initialization. I can see message with probed IRQ, base address, etc. This is with CONFIG_CF_ENABLER enabled. On Wed, 11 Jul 2001 06:55:56 -0700 "Dustin McIntire" <du...@se...> wrote: > I'm experiencing the same problems as Masahiro-san as well, though I don't > get reboots, just a hanging system. I'm working on a custom SH7751 based > platform that works fine with the 2.4.5 kernel sources. The latest kernel > will hang just as init starts running. > > Dustin. > > -----Original Message----- > From: lin...@li... > [mailto:lin...@li...]On Behalf Of David Mckay > Sent: Wednesday, July 11, 2001 5:56 AM > To: lin...@li...; m-...@aa... > Subject: Re: [linuxsh-dev] System reset while INIT processing > > > Masahiro-san, > We've also found some problems with the cache handling in the latest > kernel. Things like X-windows are no longer stable. We haven't yet had a > chance > to track down the source of these problems, but expect to be able to do > soon. > Niibe-san, have you seen any problems on your system? > > I don't think it is specific to your board, I think there is a more generic > problem here. > > Cheers! +-------------------------------------+ | Masahiro Abe, Software Engineer | | A&D Co., Ltd. of Tokyo, Japan | | mailto:m-...@aa... 
| +-------------------------------------+ |This is my opinion, not my employer's| +-------------------------------------+ |
From: NIIBE Y. <gn...@m1...> - 2001-07-12 04:17:19
|
David Mckay wrote:
 > Niibe-san, have you seen any problems on your system?

Not on my SolutionEngine SH7750 or SH7750S. When I tried my CqREEK SH7750, I did see the issue.

So I've just committed a patch to fix cache issues. It is not the cache alias issue, but a cache coherency issue between the I-cache and the D-cache. Please try it out and let me know how it goes (for better or worse).
--
|
From: Masahiro A. <m-...@aa...> - 2001-07-12 05:56:17
|
On Thu, 12 Jul 2001 13:17:14 +0900 (JST) NIIBE Yutaka <gn...@m1...> wrote: > Then, I've just committed a patch to fix cache issues. It is not the > cache alias issue, but the cache coherency issue between I-cache and > D-cache. Please try it out and let me how it goes (well or worth). Thank you Niibe-san. I've tried this, and found that it's got a little bit better. Still system reboots in INIT, but it can stand relatively longer than before. I can see output from /etc/rc.d/rc and S** sometimes (not always). I couldn't see it before this patch. +-------------------------------------+ | Masahiro Abe, Software Engineer | | A&D Co., Ltd. of Tokyo, Japan | | mailto:m-...@aa... | +-------------------------------------+ |This is my opinion, not my employer's| +-------------------------------------+ |
From: NIIBE Y. <gn...@m1...> - 2001-07-12 06:32:38
|
Masahiro Abe wrote:
 > I've tried this, and found that it's got a little bit better. Still
 > system reboots in INIT, but it can stand relatively longer than before.
 > I can see output from /etc/rc.d/rc and S** sometimes (not always). I
 > couldn't see it before this patch.

Now, could you please try it again? Because of jet lag, the one I committed included a bug. I believe I've fixed it.
--
|
From: Masahiro A. <m-...@aa...> - 2001-07-12 06:52:58
|
On Thu, 12 Jul 2001 15:32:33 +0900 (JST)
NIIBE Yutaka <gn...@m1...> wrote:

> Masahiro Abe wrote:
> > I've tried this, and found that it's got a little bit better. Still
> > system reboots in INIT, but it can stand relatively longer than before.
> > I can see output from /etc/rc.d/rc and S** sometimes (not always). I
> > couldn't see it before this patch.
>
> Now, could you please try it again. Because of jet-rag, the one I've
> committed include a bug. I believe I've fixed.

No, it didn't work. It's got relatively backward, seldom see the message from INIT.

+-------------------------------------+
| Masahiro Abe, Software Engineer     |
| A&D Co., Ltd. of Tokyo, Japan       |
| mailto:m-...@aa...                  |
+-------------------------------------+
|This is my opinion, not my employer's|
+-------------------------------------+
|
From: NIIBE Y. <gn...@m1...> - 2001-07-12 07:22:40
|
Masahiro Abe wrote:
 > No, it didn't work. It's got relatively backward, seldom see the message
 > from INIT.

Then, what you see is another bug. In my environments (CqREEK & SolutionEngine), it works fine now.

The email attached may be related to your issue.

------- start of forwarded message (RFC 934 encapsulation) -------
Content-Length: 1814
Message-ID: <20010711175809.F3496@athlon.random>
References: <200107110849.f6B8nlm00414@df1tlpc.local.here> <shs...@ch...> <3B4...@uo...> <151...@ch...>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <151...@ch...>; from tro...@fy... on Wed, Jul 11, 2001 at 04:22:04PM +0200
X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc
X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc
Precedence: bulk
X-Mailing-List: lin...@vg...
From: Andrea Arcangeli <an...@su...>
Sender: lin...@vg...
To: Trond Myklebust <tro...@fy...>
Cc: Andrew Morton <an...@uo...>, Klaus Dittrich <kl...@t-...>, Linus Torvalds <tor...@tr...>, lin...@vg...
Subject: Re: 2.4.7p6 hang
Date: Wed, 11 Jul 2001 17:58:09 +0200

On Wed, Jul 11, 2001 at 04:22:04PM +0200, Trond Myklebust wrote:
> >>>>> " " == Andrew Morton <an...@uo...> writes:
>
>     > Trond Myklebust wrote:
>     >>
>     >> ... I have the same problem on my setup. To me, it looks like
>     >> the loop in spawn_ksoftirqd() is suffering from some sort of
>     >> atomicity problem.
>
>     > Does a `set_current_state(TASK_RUNNING);' in spawn_ksoftirqd()
>     > fix it? If so we have a rogue initcall...
>
> Nope. The same thing happens as before.
>
> A couple of debugging statements show that ksoftirqd_CPU0 gets created
> fine, and that ksoftirqd_task(0) is indeed getting set correctly
> before we loop in spawn_ksoftirqd().
> After this the second call to kernel_thread() succeeds, but
> ksoftirqd() itself never gets called before the hang occurs.

ksoftirqd is quite scheduler intensive, and while its startup is correct (no need of any change there), it tends to trigger scheduler bugs (one of those bugs was just fixed in pre5). The reason I've never seen the deadlock is that I also fixed this other scheduler bug in my tree:

        ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.7pre5aa1/00_sched-yield-1

this one I forgot to submit but here it is now for easy merging:

- --- 2.4.4aa3/kernel/sched.c.~1~	Sun Apr 29 17:37:05 2001
+++ 2.4.4aa3/kernel/sched.c	Tue May 1 16:39:42 2001
@@ -674,8 +674,10 @@
 #endif
         spin_unlock_irq(&runqueue_lock);
 
- -       if (prev == next)
+       if (prev == next) {
+               current->policy &= ~SCHED_YIELD;
                goto same_process;
+       }
 
 #ifdef CONFIG_SMP
         /*

Andrea
-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to maj...@vg...
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
------- end -------
|
From: Masahiro A. <m-...@aa...> - 2001-07-12 07:50:58
|
On Thu, 12 Jul 2001 16:22:34 +0900 (JST) NIIBE Yutaka <gn...@m1...> wrote: > Masahiro Abe wrote: > > No, it didn't work. It's got relatively backward, seldom see the message > > from INIT. > > Then, what you see is another bug. In my environments (CqREEK & > SolutionEngine), it works fine now. > > The email attached may be related to your issue. Thank you for the info. It seems like not related to my problem. I've tried, had no luck. But it may fix the problem that Dustin is having. +-------------------------------------+ | Masahiro Abe, Software Engineer | | A&D Co., Ltd. of Tokyo, Japan | | mailto:m-...@aa... | +-------------------------------------+ |This is my opinion, not my employer's| +-------------------------------------+ |
From: Masahiro A. <m-...@aa...> - 2001-07-12 13:15:00
|
Sorry for so many posts today. Hopefully this will be today's last one.

I've learned how to use the JTAG debugger, and managed to get some information at system reboot. This is what I've found:

When the reboot happens (PC<-0xa0000000),
- SPC points inside memcpy (arch/sh/lib/memcpy.S), but the exact address changes from time to time.
- EXPEVT is 0x140, which is the multiple TLB hit exception (either data or instruction).
- In case it means something, both SR and SSR are 0x700000F1.

I can't interpret those yet. If anyone has any thoughts on something related, I would dearly like to know.

---
Just a small thing that came to my mind. I'm using a glibc which was built with kernel 2.4.5 and a gcc without Kojima-san's ABI change. Could this be the source of the problem? Should I rebuild glibc, or incorporate the ABI change for the current kernel? I guess those are not needed, but I just wanted to make sure.

+-------------------------------------+
| Masahiro Abe, Software Engineer     |
| A&D Co., Ltd. of Tokyo, Japan       |
| mailto:m-...@aa...                  |
+-------------------------------------+
|This is my opinion, not my employer's|
+-------------------------------------+
|
From: kaz K. <kk...@rr...> - 2001-07-12 13:35:07
|
Hi, Masahiro Abe <m-...@aa...> wrote: > Just as small thing that came to my mind. > I'm using glibc which is built with kernel 2.4.5 and gcc w/o > Kojima-san's ABI change. Could this be the source of the problem? Should > I rebuild glibc, or incorporate ABI change for current kernel? > I guess those are not needed, but just wanted to make sure. It affects shared libraries only. kaz |
From: Masahiro A. <m-...@aa...> - 2001-07-12 23:53:57
|
Hello Kojima-san, thank you for your comment. On Thu, 12 Jul 2001 22:39:35 +0900 kaz Kojima <kk...@rr...> wrote: > Hi, > > Masahiro Abe <m-...@aa...> wrote: > > Just as small thing that came to my mind. > > I'm using glibc which is built with kernel 2.4.5 and gcc w/o > > Kojima-san's ABI change. Could this be the source of the problem? Should > > I rebuild glibc, or incorporate ABI change for current kernel? > > I guess those are not needed, but just wanted to make sure. > > It affects shared libraries only. I'm sorry but I can't understand this. Should I update my gcc+glibc for ABI change, and rebuild CVS kernel, glibc, other libraries, and all userland programs? Or Should I update my gcc+glibc for ABI change, and rebuild CVS kernel, glibc and other libraries, but NOT userland programs? Or ABI change is not needed for CVS kernel and glibc? +-------------------------------------+ | Masahiro Abe, Software Engineer | | A&D Co., Ltd. of Tokyo, Japan | | mailto:m-...@aa... | +-------------------------------------+ |This is my opinion, not my employer's| +-------------------------------------+ |
From: kaz K. <kk...@rr...> - 2001-07-13 00:27:02
|
Masahiro Abe <m-...@aa...> wrote:
>> It affects shared libraries only.
>
> I'm sorry but I can't understand this.
>
> Should I update my gcc+glibc for ABI change, and rebuild CVS kernel,
> glibc, other libraries, and all userland programs?
>
> Or
>
> Should I update my gcc+glibc for ABI change, and rebuild CVS kernel,
> glibc and other libraries, but NOT userland programs?
>
> Or
>
> ABI change is not needed for CVS kernel and glibc?

You only have to update binutils and glibc.

The new ABI changes the PLT, so it introduces a binary incompatibility for shared libraries and all dynamically linked programs that use the PLT. But the new ld.so in glibc can handle old and new objects at the same time, so you do not have to recompile shared libraries or userland programs at all. Of course, the kernel doesn't use the PLT and is not affected by this ABI change.

kaz
|
From: Masahiro A. <m-...@aa...> - 2001-07-13 00:51:42
|
On Fri, 13 Jul 2001 09:31:31 +0900 kaz Kojima <kk...@rr...> wrote: > You have only to update binutils and glibc. > > The new ABI changes PLT. So it makes binary imcompatibility on > shared libraries and all dynamically linked programs which use > PLT. But the new ld.so in glibc can handle old/new objects at > the same time and you have not to recompile shared libraries and > userland programs at all. > Of course, kernel doesn't use PLT and is never affected this > ABI change. Thank you. Now I understand that: -My problem is not related to ABI change issue, so I can update binutils and glibc later. -When I need to update, I only have to do it on binutils and glibc. No code change needed for gcc, other libraries, kernel. I don't have to rebuild them, but I can. +-------------------------------------+ | Masahiro Abe, Software Engineer | | A&D Co., Ltd. of Tokyo, Japan | | mailto:m-...@aa... | +-------------------------------------+ |This is my opinion, not my employer's| +-------------------------------------+ |
From: Masahiro A. <m-...@aa...> - 2001-07-13 02:18:46
|
On Thu, 12 Jul 2001 22:14:54 +0900
Masahiro Abe <m-...@aa...> wrote:

> Sorry for too much posts today. Hopefully this will be today's last one.
>
> I've learned how to use JTAG debugger, and managed to get some
> information at system reboot. This is what I've found:
> When reboot (PC<-0xa0000000),
> - SPC points inside memcpy(arch/sh/lib/memcpy.S), but the exact address
>   changes time to time.
> - EXPEVT is 0x140, which is multiple TLB(either data or instruction) hit
>   exception.
> - If it means something, both SR and SSR are 0x700000F1
>
> I can't interpret those yet. If anyone have thought about something
> related, I deadly would like to know.

Some more reports.

- The calling sequence up to the reboot is
    copy_user_page(arch/sh/mm/cache.c)
     +-update_mmu_cache(arch/sh/mm/fault.c)
     +-copy_page : #defined as memcpy(arch/sh/lib/memcpy.S)
  The last two functions are called at lines 588 and 589 of cache.c, respectively.
- Most of the time, the exception occurs at line 86 of memcpy.S, which is
    mov.l r1,@-r0
  At this time, @r0=0xc0002ffc
- TEA holds 0xc0002ffc at the time of the exception/reboot. So this is a multiple data TLB hit exception.
- 0xc0002ffc is within the area that p3_cache_init allocated with remap_area_pages. This area runs from 0xc0000000 to 0xc0003fff.

I still can't understand what is causing this exception, or why it doesn't happen on the SolutionEngine or other SH4 systems. I would appreciate any input on this.

+-------------------------------------+
| Masahiro Abe, Software Engineer     |
| A&D Co., Ltd. of Tokyo, Japan       |
| mailto:m-...@aa...                  |
+-------------------------------------+
|This is my opinion, not my employer's|
+-------------------------------------+
|
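The faulting address reported above can be decoded by hand, and the following sketch does the same arithmetic in code. It is an illustration only: the window bounds come from the report (0xc0000000 to 0xc0003fff) and CACHE_ALIAS from arch/sh/mm/cache.c as quoted later in this thread, while the helper names are invented for this example and the 4 KB page size is taken from the "Page is 4K" comment in that file.

```c
#include <assert.h>
#include <stdint.h>

#define P3_ALIAS_BASE 0xc0000000u  /* start of the window named in the report */
#define P3_ALIAS_SIZE 0x00004000u  /* 0xc0000000..0xc0003fff, four 4 KB pages */
#define CACHE_ALIAS   0x00003000u  /* value quoted from arch/sh/mm/cache.c */
#define PAGE_SHIFT_4K 12

/* Hypothetical helper: is addr inside the temporary-mapping window? */
static int in_alias_window(uint32_t addr)
{
    return addr >= P3_ALIAS_BASE && (addr - P3_ALIAS_BASE) < P3_ALIAS_SIZE;
}

/* Hypothetical helper: which of the four alias pages does addr fall in? */
static unsigned int alias_slot(uint32_t addr)
{
    assert(in_alias_window(addr));
    return (unsigned int)((addr & CACHE_ALIAS) >> PAGE_SHIFT_4K);
}

/* Hypothetical helper: byte offset of addr within its 4 KB page. */
static uint32_t page_offset(uint32_t addr)
{
    return addr & ((1u << PAGE_SHIFT_4K) - 1u);
}
```

For the reported TEA value 0xc0002ffc this gives slot 2 and offset 0xffc, that is, the last 32-bit word of the third 4 KB page in the window.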
From: SUGIOKA T. <su...@it...> - 2001-07-13 04:33:00
|
At 11:18 01/07/13 +0900, Masahiro Abe <m-...@aa...> wrote:
>Some more reports.
>- calling sequence until reboot is
>    copy_user_page(arch/sh/mm/cache.c)
>     +-update_mmu_cache(arch/sh/mm/fault.c)
>     +-copy_page : #defined as memcpy(arch/sh/lib/memcpy.S)
>  Last two functions are called at line 588 and 589 of cache.c,
>  respectively.
>- most of the time, exception occurs at line 86 of memcpy.S, which is
>    mov.l r1,@-r0
>  at this time, @r0=0xc0002ffc
>- TEA has 0xc0002ffc at the time of exception/reboot. So, this is
>  multiple data TLB hit exception.
>- 0xc0002ffc is within the area that p3_cache_init allocated with
>  remap_area_pages. This area is from 0xc0000000 to 0xc0003fff.

Does this patch change your situation ?

Index: arch/sh/mm/cache.c
===================================================================
RCS file: /cvsroot/linuxsh/kernel/arch/sh/mm/cache.c,v
retrieving revision 1.41
diff -u -r1.41 cache.c
--- arch/sh/mm/cache.c	2001/07/12 06:27:37	1.41
+++ arch/sh/mm/cache.c	2001/07/13 04:21:45
@@ -18,6 +18,7 @@
 #include <asm/cache.h>
 #include <asm/io.h>
 #include <asm/uaccess.h>
+#include <asm/mmu_context.h>
 
 #if defined(__sh3__)
 #define CCR 0xffffffec /* Address of Cache Control Register */
@@ -519,6 +520,8 @@
 /* Page is 4K, OC size is 16K, there are four lines. */
 #define CACHE_ALIAS 0x00003000
 
+extern void __flush_tlb_page(unsigned long asid, unsigned long page);
+
 /*
  * clear_user_page
  * @to: P1 address
@@ -548,6 +551,7 @@
                 save_and_cli(flags);
                 entry = mk_pte_phys(phys_addr, pgprot);
                 set_pte(pte, entry);
+                __flush_tlb_page(get_asid(), p3_addr&PAGE_MASK);
                 update_mmu_cache(NULL, p3_addr, entry);
                 clear_page((void *)p3_addr);
                 restore_flags(flags);
@@ -585,6 +589,7 @@
                 save_and_cli(flags);
                 entry = mk_pte_phys(phys_addr, pgprot);
                 set_pte(pte, entry);
+                __flush_tlb_page(get_asid(), p3_addr&PAGE_MASK);
                 update_mmu_cache(NULL, p3_addr, entry);
                 copy_page((void *)p3_addr, from);
                 restore_flags(flags);
Index: arch/sh/mm/fault.c
===================================================================
RCS file: /cvsroot/linuxsh/kernel/arch/sh/mm/fault.c,v
retrieving revision 1.40
diff -u -r1.40 fault.c
--- arch/sh/mm/fault.c	2001/07/06 13:11:32	1.40
+++ arch/sh/mm/fault.c	2001/07/13 04:21:46
@@ -28,7 +28,7 @@
 #include <asm/mmu_context.h>
 
 extern void die(const char *,struct pt_regs *,long);
-static void __flush_tlb_page(unsigned long asid, unsigned long page);
+void __flush_tlb_page(unsigned long asid, unsigned long page);
 
 /*
  * Ugly, ugly, but the goto's result in better assembly..
@@ -322,7 +322,7 @@
         restore_flags(flags);
 }
 
-static void __flush_tlb_page(unsigned long asid, unsigned long page)
+void __flush_tlb_page(unsigned long asid, unsigned long page)
 {
         unsigned long addr, data;
 

----
SUGIOKA Toshinobu
|
From: Masahiro A. <m-...@aa...> - 2001-07-13 05:03:55
|
Thank you Sugioka-san. On Fri, 13 Jul 2001 13:30:16 +0900 SUGIOKA Toshinobu <su...@it...> wrote: > Does this patch change your situation ? Yes, now the system passes INIT process and I could log in. I can say this patch fixes the problem. Sugioka-san, Thank you so much! I would like to ask you about this patch. Is this real fix, or (kind of) hiding the problem caused by another source? Sorry if this sounds silly, I still don't understand the fundamentals of MMU and cache. BTW, power-up-to-login-prompt time became much shorter than 2.4.5 kernel. I feel like it's become less than one-third (1/3). This is the reason why I wanted to upgrade to the CVS kernel. +-------------------------------------+ | Masahiro Abe, Software Engineer | | A&D Co., Ltd. of Tokyo, Japan | | mailto:m-...@aa... | +-------------------------------------+ |This is my opinion, not my employer's| +-------------------------------------+ |
From: SUGIOKA T. <su...@it...> - 2001-07-13 05:51:31
|
At 14:03 01/07/13 +0900, Masahiro Abe <m-...@aa...> wrote:
>
>> Does this patch change your situation ?
>
>Yes, now the system passes INIT process and I could log in. I can say
>this patch fixes the problem. Sugioka-san, Thank you so much!
>

Good!!

>I would like to ask you about this patch. Is this real fix, or (kind of)
>hiding the problem caused by another source? Sorry if this sounds silly,
>I still don't understand the fundamentals of MMU and cache.
>

'copy_user_page/clear_user_page' create a temporary TLB entry to avoid the cache alias issue in some situations, and use a temporary virtual address (in P3) instead of the requested address (in U0).

This entry should be flushed before the next temporary TLB entry is added at the same virtual address; otherwise a multiple TLB hit will occur.

That patch flushes before adding the new temporary TLB entry, but it might be better to flush immediately after copying/clearing the memory.

----
SUGIOKA Toshinobu
|
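The explanation above can be made concrete with a toy model. The sketch below is not kernel code and does not model the real SH-4 UTLB (which is larger and uses its own replacement counter); it assumes a tiny fully associative TLB with round-robin replacement, and every name in it (struct tlb, tlb_load, tlb_flush_page, tlb_hits) is hypothetical. It only shows why loading a second translation for the same virtual page without flushing the first one leaves two matching entries, which is the condition the hardware reports as a multiple hit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TLB_ENTRIES 4u              /* toy size; the real UTLB is larger */

struct tlb_entry {
    int      valid;
    uint32_t asid;
    uint32_t vpn;                   /* virtual page number */
    uint32_t ppn;                   /* physical page number */
};

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
    unsigned int next;              /* round-robin replacement pointer */
};

static void tlb_init(struct tlb *t)
{
    memset(t, 0, sizeof(*t));
}

/* Load a translation into the next slot. Like a hardware TLB load, this
 * does not check whether the same virtual page is already present. */
static void tlb_load(struct tlb *t, uint32_t asid, uint32_t vpn, uint32_t ppn)
{
    struct tlb_entry *e;

    assert(t->next < TLB_ENTRIES);
    e = &t->e[t->next];
    t->next = (t->next + 1u) % TLB_ENTRIES;
    e->valid = 1;
    e->asid = asid;
    e->vpn = vpn;
    e->ppn = ppn;
}

/* Invalidate every entry that translates (asid, vpn). */
static void tlb_flush_page(struct tlb *t, uint32_t asid, uint32_t vpn)
{
    unsigned int i;

    for (i = 0; i < TLB_ENTRIES; i++)
        if (t->e[i].valid && t->e[i].asid == asid && t->e[i].vpn == vpn)
            t->e[i].valid = 0;
}

/* Count matching entries; more than one models a "multiple hit". */
static unsigned int tlb_hits(const struct tlb *t, uint32_t asid, uint32_t vpn)
{
    unsigned int i, n = 0;

    for (i = 0; i < TLB_ENTRIES; i++)
        if (t->e[i].valid && t->e[i].asid == asid && t->e[i].vpn == vpn)
            n++;
    return n;
}
```

In this model, two consecutive tlb_load() calls for the same virtual page leave tlb_hits() at 2, while calling tlb_flush_page() between them leaves it at 1. Whether the flush is done before the next load or right after the copy is the design question raised at the end of the message; the model gives the same hit count either way.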
From: Masahiro A. <m-...@aa...> - 2001-07-13 06:02:53
|
On Fri, 13 Jul 2001 14:48:47 +0900 SUGIOKA Toshinobu <su...@it...> wrote: > At 14:03 01/07/13 +0900, Masahiro Abe <m-...@aa...> wrote: > > >I would like to ask you about this patch. Is this real fix, or (kind of) > >hiding the problem caused by another source? Sorry if this sounds silly, > >I still don't understand the fundamentals of MMU and cache. > > 'copy_user_page/clear_user_page' creates temporal TLB entry to avoid cache alias issue on some situation. > and uses temporal virtual address (in P3) instead of requested address (in U0). > > This entry should be flashed before adding next temporal TLB entry on this virtual address, > otherwise multiple TLB hit will occur. Thank you for your detailed explanation. It will help me a lot to understand the situation. > that patch flashes before adding new temporal TLB entry, but it might better to flash immediately after > copy/clear memory. So this means that instead of calling __flush_tlb_page BEFORE update_mmu_cache, we should call it AFTER copy_page? I'm gonna try that. ================================= Masahiro ABE, A&D Co., Ltd. Japan |
From: Masahiro A. <m-...@aa...> - 2001-07-13 08:31:06
|
On Fri, 13 Jul 2001 15:02:34 +0900 Masahiro Abe <m-...@aa...> wrote: > On Fri, 13 Jul 2001 14:48:47 +0900 > SUGIOKA Toshinobu <su...@it...> wrote: > > > that patch flashes before adding new temporal TLB entry, but it might better to flash immediately after > > copy/clear memory. > > So this means that instead of calling __flush_tlb_page BEFORE > update_mmu_cache, we should call it AFTER copy_page? I'm gonna try that. Simply moving __flush_tlb_page call after copy_page call didn't work. Maybe I'm doing something stupid. I can go with original Sugioka-san's patch for now. ================================= Masahiro ABE, A&D Co., Ltd. Japan |
From: Masahiro A. <m-...@aa...> - 2001-07-17 01:55:56
|
Sugioka-san, Niibe-san and all, Current arch/sh/mm/cache.c in CVS gives me Oops error on our custom board, while INIT. (This doesn't happen on SolutionEngine 7750S.) Errors are "Unable to handle kernel paging request at virtual address c0000000", with address varies to c0001000/c0002000/c0003000. Errors occur at PC=8800c488, which is within __flush_dcache_region. copy_user_page and clear_user_page were modified to call pte_clear after copy_page/clear_page. If I comment this call out, then Oops errors disappear. (In fact, after comment out, the system is so stable so far that RTLinux is running reliably.) I still don't understand the reason of call to it, but I would like to know if it is necessary to call it or not. If it is, then what do you think I should do to kill this Oops error? ================================= Masahiro ABE, A&D Co., Ltd. Japan |
[linuxsh-dev] Re: "pte_clear" needed in copy_user_page? (was Re: System reset while INIT processing)
From: SUGIOKA T. <su...@it...> - 2001-07-17 03:16:48
|
At 10:55 01/07/17 +0900, Masahiro Abe <m-...@aa...> wrote:
>Current arch/sh/mm/cache.c in CVS gives me Oops error on our custom
>board, while INIT. (This doesn't happen on SolutionEngine 7750S.)
>
>Errors are "Unable to handle kernel paging request at virtual address
>c0000000", with address varies to c0001000/c0002000/c0003000. Errors
>occur at PC=8800c488, which is within __flush_dcache_region.
>
>copy_user_page and clear_user_page were modified to call pte_clear after
>copy_page/clear_page. If I comment this call out, then Oops errors
>disappear. (In fact, after comment out, the system is so stable so far
>that RTLinux is running reliably.)
>
>I still don't understand the reason of call to it, but I would like to
>know if it is necessary to call it or not. If it is, then what do you
>think I should do to kill this Oops error?

I hope that following patch will solve the problem.
I'm not so sure that this is absolutely correct, but it will be safe anyway.

Index: arch/sh/mm/cache.c
===================================================================
RCS file: /cvsroot/linuxsh/kernel/arch/sh/mm/cache.c,v
retrieving revision 1.43
diff -u -r1.43 cache.c
--- arch/sh/mm/cache.c	2001/07/16 05:08:19	1.43
+++ arch/sh/mm/cache.c	2001/07/17 03:06:10
@@ -553,9 +553,9 @@
                 __flush_tlb_page(get_asid(), p3_addr);
                 update_mmu_cache(NULL, p3_addr, entry);
                 clear_page((void *)p3_addr);
+                __flush_dcache_region(p3_addr, p3_addr+PAGE_SIZE);
                 pte_clear(pte);
                 restore_flags(flags);
-                __flush_dcache_region(p3_addr, p3_addr+PAGE_SIZE);
         }
 }
 
@@ -592,9 +592,9 @@
                 __flush_tlb_page(get_asid(), p3_addr);
                 update_mmu_cache(NULL, p3_addr, entry);
                 copy_page((void *)p3_addr, from);
+                __flush_dcache_region(p3_addr, p3_addr+PAGE_SIZE);
                 pte_clear(pte);
                 restore_flags(flags);
-                __flush_dcache_region(p3_addr, p3_addr+PAGE_SIZE);
         }
 }
 #endif

----
SUGIOKA Toshinobu
|
[linuxsh-dev] Re: "pte_clear" needed in copy_user_page? (was Re: System reset while INIT processing)
From: Masahiro A. <m-...@aa...> - 2001-07-17 03:55:10
|
On Tue, 17 Jul 2001 12:13:24 +0900 SUGIOKA Toshinobu <su...@it...> wrote: > I hope that following patch will solve the problem. > I'm not so sure that this is absolutely correct, but it will be safe anyway. Thank you so much Sugioka-san, it works beautifully. ================================= Masahiro ABE, A&D Co., Ltd. Japan |
From: Masahiro A. <m-...@aa...> - 2001-07-12 07:33:13
|
Self follow-up.

On Thu, 12 Jul 2001 09:32:47 +0900
Masahiro Abe <m-...@aa...> wrote:

> Additional info (may be related to this matter):
> Our board has SMC91C96 Ethernet controller. It worked fine with 2.4.5
> kernel. With current kernel, the driver hangs in the middle of
> initialization (in smc_findirq, at outb just after SMC_DELAY). So, I'm
> temporarily not using it now.

I think I've found the reason for this hang. In fact it is not a "hang", but a "repetitive interrupt". I believe this is not related to the other "reboot" problem. I'm going to explain the situation I've found, and would like to get your opinion on which way to go.

The smc9194 driver auto-probes the IRQ of the chip, and uses probe_irq_on/off from irq.c. probe_irq_on calls the startup method of desc->handler to unmask each irq with no action installed. So far, so good. This part has not changed from 2.4.5 to the current CVS.

do_IRQ in irq.c has changed between those two releases.
- At entry, desc->handler->ack is called so that the irq is masked. This has not changed.
- When no action is registered for the irq,
  * up to 2.4.5, it simply returns without unmasking the irq
  * from 2.4.6, it calls desc->handler->end to unmask and exits

On our board the 9194 uses a level interrupt. So, until the ack is sent to the chip, the interrupt is sent to the CPU as soon as it is unmasked. With the 2.4.5 kernel, this didn't happen because the irq is not unmasked. With the 2.4.6 kernel, the interrupt handler is called repeatedly before the driver sends the ack to the chip.

What is the proper way to solve this problem? Has anybody seen this? What did you do then? Or is my assumption incorrect? I would appreciate any input.

I can simply work around this by passing a kernel parameter to disable auto-probing, but that is not the right solution to this problem.

+-------------------------------------+
| Masahiro Abe, Software Engineer     |
| A&D Co., Ltd. of Tokyo, Japan       |
| mailto:m-...@aa...                  |
+-------------------------------------+
|This is my opinion, not my employer's|
+-------------------------------------+
|
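The difference described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the kernel's do_IRQ: it models a single level-triggered line that stays asserted because nobody acks the chip, with no action registered (the situation during IRQ auto-probing). The names struct irq_line, do_irq_no_action and count_deliveries, and the two policy labels, are hypothetical and only mirror the "up to 2.4.5" and "from 2.4.6" behaviours as described in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* One level-triggered interrupt line as seen by the CPU. */
struct irq_line {
    bool asserted;  /* device keeps this high until the chip is acked */
    bool masked;    /* masked lines are not delivered to the CPU */
};

enum no_action_policy {
    LEAVE_MASKED,   /* as described for kernels up to 2.4.5 */
    UNMASK_ON_END   /* as described for 2.4.6 and later */
};

/* Model of one interrupt entry when no action is registered. */
static void do_irq_no_action(struct irq_line *l, enum no_action_policy p)
{
    l->masked = true;           /* stands in for desc->handler->ack() */
    /* No handler runs, so the device is never acked and stays asserted. */
    if (p == UNMASK_ON_END)
        l->masked = false;      /* stands in for desc->handler->end() */
}

/* Count how often the CPU takes the interrupt, up to a safety limit. */
static unsigned int count_deliveries(enum no_action_policy p,
                                     unsigned int limit)
{
    struct irq_line l = { true, false };
    unsigned int n = 0;

    assert(limit > 0);
    while (l.asserted && !l.masked && n < limit) {
        do_irq_no_action(&l, p);
        n++;
    }
    return n;
}
```

In this model the first policy delivers the interrupt exactly once and then leaves the line masked, while the second keeps re-delivering it until the safety limit is reached, which corresponds to the "repetitive interrupt" seen in smc_findirq.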