| 
      
      
      From: Ken T. <ke...@we...> - 2001-06-09 04:41:49
       | 
| Something has changed in the latest kernel that has fixed 99% of the X crashes I was having. It still stops very occasionally, but it now exits back to the initiating console with a message rather than locking my system up.
Michel, the Section "Module" Load "extmod" fixed the icon problem, thanks.
Much happier now, was losing faith ;)
Ken.
 | 
| 
      
      
      From: Glenn H. <gh...@c2...> - 2001-06-09 11:19:12
       | 
| Hello,
On 09-Jun-2001, Ken Tyler wrote:
> Something has changed in the latest kernel that has fixed 99% of the X
> crashes I was having. It still stops very occasionally but now exits
> back to the initiating console with a message rather than locking my
> system up.
Interesting...
Guess what I'll be trying out tonight ;-)
- glenn
 | 
| 
      
      
      From: Ken T. <ke...@we...> - 2001-06-10 01:43:41
       | 
| On Sat, 9 Jun 2001, Glenn Hisdal wrote:
> Interesting...
> Guess what I'll be trying out tonight ;-)
I don't wish to know...
But X is much better, though I still had a couple of crashes with a message about signal 4.
After X crashes and goes back to a console, the next command I attempt to run fails with 'illegal instruction' and dumps core; after that all is OK. Not sure if it's the illegal instruction from the X crash being reported again or something else.
Ken.
 | 
| 
      
      
      From: Glenn H. <gh...@c2...> - 2001-06-10 11:05:21
       | 
| Hello,
Ken Tyler wrote:
> On Sat, 9 Jun 2001, Glenn Hisdal wrote:
>> Interesting...
>> Guess what I'll be trying out tonight ;-)
> I don't wish to know...
:-))
> But X is much better, still had a couple of crashes with a message about
> signal 4.
> After X crashes and goes back to a console, the next command I attempt to
> run fails and says 'illegal instruction' and core dumps, after that all is
> OK. Not sure if it's the illegal instruction from the X crash being
> reported again or something else.
hmmm...
X doesn't appear to behave any differently here.
The system still locks up completely when switching back to console.
- glenn
 | 
| 
      
      
      From: Ken T. <ke...@we...> - 2001-06-10 19:18:56
       | 
| On Sun, 10 Jun 2001, Glenn Hisdal wrote:
> > But X is much better, still had a couple of crashes with a message about
> > signal 4.
> hmmm...
> X doesn't appear to behave any different here.
> The system still locks up completely when switching back to console.
Hello,
Not good news. I wonder if you have the exact same version from the CVS repository as I do. Roman has posted more changes since I updated my sources; I hope they haven't undone the fix. I'll tar up my source before updating again.
Ken.
 | 
| 
      
      
      From: Roman Z. <zi...@li...> - 2001-06-10 22:31:56
       | 
| Hi,
Ken Tyler wrote:
> Not good news, I wonder if you have the exact same version from the cvs
> repository as I do, Roman has posted more changes since I updated my
> sources - hope it hasn't undone the fix. I'll tar up my source before
> updating again.
I've imported only the 2.4.5 sources. You can check out the old tree with '-r apus-2_4_5-pre3' (and get back to the current source with -A). Could you run some tests (like compiling a kernel with 'make -j 3') to see if the kernel runs stably without X? You can also try one of the snapshot kernels I've put on the ftp site.
bye, Roman
 | 
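Roman names only the flags ('-r apus-2_4_5-pre3' and '-A'), not the full command line. A minimal sketch of how the switch might look, assuming it is run with `cvs update` from inside an existing working copy; the helper name `switch_tree` and the `RUN` variable are hypothetical, and by default the function only prints the command it would run:

```sh
#!/bin/sh
# Hypothetical helper: print (or, with RUN=1, execute) the cvs command that
# moves a working copy between the tagged old tree and the current source.
# Assumption: "cvs update" is used; the thread gives only the -r / -A flags.
switch_tree() {
    case "$1" in
        old)     set -- cvs update -r apus-2_4_5-pre3 ;;  # sticky tag: old tree
        current) set -- cvs update -A ;;                  # clear sticky tags
        *)       echo "usage: switch_tree old|current" >&2; return 2 ;;
    esac
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$@"
    fi
}
```

Calling `switch_tree old` prints the command for the pre3 tree; `RUN=1` would execute it in the current directory.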
| 
      
      
      From: Glenn H. <gh...@c2...> - 2001-06-11 10:24:19
       | 
| Hello,
Roman Zippel wrote:
> Hi,
> Ken Tyler wrote:
>> Not good news, I wonder if you have the exact same version from the cvs
>> repository as I do, Roman has posted more changes since I updated my
>> sources - hope it hasn't undone the fix. I'll tar up my source before
>> updating again.
> I've imported only the 2.4.5 sources. You can checkout the old tree with
> '-r apus-2_4_5-pre3' (and back to the current source with -A). Could you
> run some tests (like compiling a kernel with 'make -j 3'), to see if the
> kernel runs stable without X? You can also try one of the snapshot
> kernels I've put on the ftp site.
Yes, I will do that. I have a 2.4.5-pre3 kernel, so that should be the same as Ken has, since you imported 2.4.5, right?
My last exam is on Tuesday. After that I should have some time to do some testing...
Any ideas on what/how to test to find the exact problem?
- glenn
 | 
| 
      
      
      From: Ken T. <ke...@we...> - 2001-06-12 02:12:26
       | 
| On Mon, 11 Jun 2001, Roman Zippel wrote:
Hello,
> I've imported only the 2.4.5 sources. You can checkout the old tree with
> '-r apus-2_4_5-pre3' (and back to the current source with -A). Could you
> run some tests (like compiling a kernel with 'make -j 3'), to see if the
> kernel runs stable without X? You can also try one of the snapshot
> kernels I've put on the ftp site.
With 2.4.5-pre3, make -j 3 compiles, and what it makes boots and runs OK. With that kernel I can make -j 3 again without problem. If I up the jobs to 6 I start to get a few problems:
make[2]: *** [first_rule] Illegal instruction (core dumped)
make[1]: *** [first_rule] Illegal instruction (core dumped)
Before -j 6 I tried -j 10; this didn't do at all well, reporting an internal compiler error (gcc version 2.95.2 19991024) a number of times.
X runs as before, drops back to console now and then.
I have your 20010610 kernel and modules but have not tried them yet.
Ken.
 | 
| 
      
      
      From: Ken T. <ke...@we...> - 2001-06-14 06:58:33
       | 
| On Mon, 11 Jun 2001, Roman Zippel wrote:
Hello,
> I've imported only the 2.4.5 sources. You can checkout the old tree with
> '-r apus-2_4_5-pre3' (and back to the current source with -A). Could you
> run some tests (like compiling a kernel with 'make -j 3'), to see if the
> kernel runs stable without X? You can also try one of the snapshot
> kernels I've put on the ftp site.
I've tried all kernels now: cvs 2.4.5-pre3, cvs 2.4.5 and precompiled 2.4.5. They all behave identically: make -j 3 is OK but make -j 6 causes 'illegal instruction - core dumped' messages.
2.4.5 is 'better' than 2.2.x: X is more stable, and when it crashes it goes back to a console. But I don't ever recall make -j problems with a 2.2 kernel; I'll try just in case.
The 2.2 X crashes were interrupt related but the traceback never pointed to anything specific.
Ken.
 | 
| 
      
      
      From: Roman Z. <zi...@li...> - 2001-06-14 10:57:33
       | 
| Hi,
(Sorry for not answering earlier, I was busy with some other stuff.)
On Thu, 14 Jun 2001, Ken Tyler wrote:
> I've tried all kernels now, cvs 2.4.5-pre3, cvs 2.4.5 and precompiled
> 2.4.5. They all behave identically, make -j 3 is OK but make -j 6
> causes 'illegal instruction - core dumped' messages.
>
> 2.4.5 is 'better' than 2.2.x, X is more stable, when it crashes it goes
> back to a console, but I don't ever recall make -j problems with a 2.2
> kernel - but I'll try just in case.
>
> The 2.2 X crashes were interrupt related but traceback never pointed to
> anything specific.
In your case I still haven't ruled out bad memory. What is your memory configuration, and how much swap do you have?
It would help a lot if we knew that the base system is stable before we go on to X.
bye, Roman
 | 
| 
      
      
      From: Glenn H. <gh...@c2...> - 2001-06-14 15:14:51
       | 
| Hello,
On 14-Jun-2001, Roman Zippel wrote:
> Hi,
> (Sorry for not answering earlier, I was busy with some other stuff.)
> On Thu, 14 Jun 2001, Ken Tyler wrote:
>> I've tried all kernels now, cvs 2.4.5-pre3, cvs 2.4.5 and precompiled
>> 2.4.5. They all behave identically, make -j 3 is OK but make -j 6
>> causes 'illegal instruction - core dumped' messages.
>>
>> 2.4.5 is 'better' than 2.2.x, X is more stable, when it crashes it goes
>> back to a console, but I don't ever recall make -j problems with a 2.2
>> kernel - but I'll try just in case.
>>
>> The 2.2 X crashes were interrupt related but traceback never pointed to
>> anything specific.
> In your case I still didn't rule out bad memory. What is your memory
> configuration, how much swap do you have?
> It would help a lot if we would know that the base system is stable,
> before we go on to X.
I have successfully compiled and used kernels using make -j 3, -j 6 and -j 12, so it looks like the base system is stable here.
This is with a 'homemade' 2.4.5 kernel from the latest CVS sources.
- glenn
 | 
| 
      
      
      From: Ken T. <ke...@we...> - 2001-06-15 23:16:37
       | 
| On Thu, 14 Jun 2001, Glenn Hisdal wrote:
> I have successfully compiled and used kernels using make -j 3, -j 6 and -j 12
> so it looks like the base system is stable here.
> This is with a 'homemade' 2.4.5 kernel from the latest CVS sources.
I thought Roman meant the base hardware, hence my long post.
What is your hardware?
Maybe my problem is unique to the 604e processor on the A4000 Cyberstorm card.
Just a thought: I pass nobats on booting, a hangover from the 4091 work. I'll try without it.
Ken.
 | 
| 
      
      
      From: Glenn H. <gh...@c2...> - 2001-06-15 23:45:27
       | 
| Hello Ken
On 16-Jun-2001, you wrote:
> On Thu, 14 Jun 2001, Glenn Hisdal wrote:
>> I have successfully compiled and used kernels using make -j 3, -j 6 and -j
>> 12 so it looks like the base system is stable here. This is with a
>> 'homemade' 2.4.5 kernel from the latest CVS sources.
> I thought Roman was meaning base hardware, hence my long post.
> What is your hardware ?
A4000
CyberstormPPC with 128MB RAM
CVisionPPC
> Maybe my problem is unique to 604e processor on A4000 cyberstorm card.
No, that can't be it. Then I should have had the same problem here...
> Just a thought, I pass nobats on booting, a hangover from 4091 work, I'll
> try without it.
ok. I don't use that option.
Do you use the 60nsram option? Tried without it?
- glenn
 | 
| 
      
      
      From: Ken T. <ke...@we...> - 2001-06-16 08:44:09
       | 
| On Sat, 16 Jun 2001, Glenn Hisdal wrote:
> > What is your hardware ?
>
> A4000
> CyberstormPPC with 128MB RAM
> CVisionPPC
OK
> > Maybe my problem is unique to 604e processor on A4000 cyberstorm card.
Thought you might have had a 1200, which has a different CPU.
> No, that can't be it. Then I should have had the same problem here...
Send me your .config please and I'll try to compile with that; can't see that being the problem though.
> > Just a thought, I pass nobats on booting, a hangover from 4091 work, I'll
> > try without it.
> ok. I don't use that option.
No better without nobats, though the system seems a bit faster.
> Do you use the 60nsram option ? Tried without it ?
No, I've tried it and I can't run with that option.
Ken.
 | 
| 
      
      
      From: Glenn H. <gh...@c2...> - 2001-06-16 10:11:38
       | 
| Hello,
On 16-Jun-2001, Ken Tyler wrote:
> On Sat, 16 Jun 2001, Glenn Hisdal wrote:
>>> What is your hardware ?
>>
>> A4000
>> CyberstormPPC with 128MB RAM
>> CVisionPPC
> OK
>
>>> Maybe my problem is unique to 604e processor on A4000 cyberstorm card.
> Thought you might have had a 1200 which has a different cpu.
>> No, that can't be it. Then I should have had the same problem here...
> Send me your .config please and I'll try to compile that, can't see that
> being the problem though.
>
(sent in private mail)
>> Do you use the 60nsram option ? Tried without it ?
> No, I've tried it and I can't run with that option.
ok. same here...
- glenn
 | 
| 
      
      
      From: Ken T. <ke...@we...> - 2001-06-15 23:02:54
       | 
| On Thu, 14 Jun 2001, Roman Zippel wrote:
> In your case I still didn't rule out bad memory. What is your memory
> configuration, how much swap do you have?
> It would help a lot if we would know that the base system is stable,
> before we go on to X.
Hello,
Not sure that I have a memory problem, given the different behaviour under 2.2.10 and 2.4.5, but I'll do any tests you suggest.
My system is A4000, 16 + 2M, A2065, GVPIOExtender, A4091, CV64-3d, 060/604 + 64M.
swapon -s reports
/dev/hda7      partition   204776   0   0
/dev/fastram   partition   16380    0   1
I've tried without fastram swap. It makes no difference whether I use the original virge or my modified virge driver, or whether I run with or without A4091 SCSI.
I've posted all the following details before, but just to collect them in one place:
2.2.? is stable, never a problem from consoles. I just compiled 2.4.5 with make -j 10 without a problem, which probably means memory and swapping are OK ... yes?
Helix gnome has been unstable since I installed it; crashes always locked up my system and required a reboot. I remember KDE being more stable, but it still played up occasionally. As I've said previously, X crashes often reported "Page fault in interrupt handler" in dmesg; the call trace was always interrupt related but not consistent. Magic SysRq was dead too. Sometimes it took several attempts, lockups and reboots to get X to start. When it did start, just running up and down menus a bit quickly could crash it. If I 'nursed' it gently I could use it for a while, but it would crash eventually. I have tried many X servers, all with the same results.
Running 2.4.5, anything more than make -j 3 on a kernel causes the compiler to report internal errors and illegal instructions and to produce core dumps, sometimes logging me out but not locking up (now that I know what to look for, maybe even -j 3 has problems).
But under 2.4.5, X is much more stable, probably as good or nearly as good as another well known operating system. When it does crash (I've had about 6 crashes so far), it returns to the initiating console and reports signal 4, and I think I saw signal 11 on one occasion. One thing that happens is that after X exits with a signal, the next command run, like ls, fails and reports illegal instruction, but after that all is OK: I can restart X, do anything.
On the 4091: the A4091 never worked reliably, and still doesn't. I spent ages on it with a very helpful and patient Richard Hirst. I even mailed Dave Haynie; he said there was an outstanding problem with ZORRO III: a bus arbitration transition at the wrong time could lock up the bus state machine, but it only affected those devices that did some sort of extended burst cycles, the 4091 being the only one. I hacked sim710 to drive the 4091, and it behaved exactly like the 'big' 53c7xx driver. Presence or absence of the 4091 driver doesn't affect the way kernels behave. The ext2 and swap partitions are on IDE.
Ken.
 | 
| 
      
      
      From: Geert U. <ge...@li...> - 2001-06-16 08:53:40
       | 
| On Sat, 16 Jun 2001, Ken Tyler wrote:
> But under 2.4.5, X is much more stable, probably as good or nearly as good
> as another well known operating system. When it does crash, I've had about
> 6 so far, it returns to the initiating console and reports signal 4 and I
> think I saw signal 11 on one occasion. One thing that happens is that
> after X exits with a signal, the next command run, like ls, fails and
> reports illegal instruction but after that all is OK, I can restart X, do
> anything. 
This either means bad RAM or so, or a bug in the kernel MM.
Gr{oetje,eeting}s,
						Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@li...
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds
 | 
| 
      
      
      From: Ken T. <ke...@we...> - 2001-06-16 23:08:26
       | 
| On Sat, 16 Jun 2001, Geert Uytterhoeven wrote:
> This either means bad RAM or so, or a bug in the kernel MM.
OK, but before I start looking for 32 Meg SIMMs: I tried Glenn's config, removing PERMEDIA and PCI and putting in VIRGE. Running the 2.4.5 kernel compiled with
Reading specs from /usr/lib/gcc-lib/ppc-redhat-linux/2.95.2/specs
gcc version 2.95.2 19991024 (release/franzo)
under 2.2.10 has the same problems.
make -j 10 :
gcc: Internal compiler error: program cpp got fatal signal 4
make[1]: *** [ppc_defs.h] Error 1
make[1]: Leaving directory `/usr/src/linux_CVS/2.4_new/arch/ppc/kernel'
make: *** [_dir_arch/ppc/kernel] Error 2
make[1]: *** [first_rule] Illegal instruction (core dumped)
make[1]: Leaving directory `/usr/src/linux_CVS/2.4_new/kernel'
make: *** [_dir_kernel] Error 2
sched.c: In function `schedule':
sched.c:711: Internal compiler error:
sched.c:711: output pipe has been closed
cpp: output pipe has been closed
gcc: Internal compiler error: program as got fatal signal 4
and so on
(is this compiler OK ?)
I'll now try just with AMIFB, no VIRGE.
It couldn't be caused by not having any PCI compiled in, could it?
Next is pulling out the mem and giving its pins a clean.
A bit odd that make -j 10 is OK on 2.2.10 ?
Ken.
 | 
| 
      
      
      From: Glenn H. <gh...@c2...> - 2001-06-16 23:59:42
       | 
| Hello,
On 17-Jun-01, Ken Tyler wrote:
> On Sat, 16 Jun 2001, Geert Uytterhoeven wrote:
>> This either means bad RAM or so, or a bug in the kernel MM.
> OK, but before I start looking for 32 Meg SIMMS, I tried Glenn's config
> removing PERMEDIA and PCI, putting in VIRGE, running the 2.4.5 kernel
> compiled with
> Reading specs from /usr/lib/gcc-lib/ppc-redhat-linux/2.95.2/specs
> gcc version 2.95.2 19991024 (release/franzo)
[...]
> (is this compiler OK ?)
I use the exact same version...
- glenn
 | 
| 
      
      
      From: Michel  <mic...@ii...> - 2001-06-17 00:03:50
       | 
| Ken Tyler wrote:
>
> On Sat, 16 Jun 2001, Geert Uytterhoeven wrote:
>
> > This either means bad RAM or so, or a bug in the kernel MM.
>
> OK, but before I start looking for 32 Meg SIMMS, I tried Glenn's config
> removing PERMEDIA and PCI, putting in VIRGE, running the 2.4.5 kernel
> compiled with
>
> Reading specs from /usr/lib/gcc-lib/ppc-redhat-linux/2.95.2/specs
> gcc version 2.95.2 19991024 (release/franzo)
>
> under 2.2.10 has the same problems.
>
> make -j 10 :
>
> gcc: Internal compiler error: program cpp got fatal signal 4
> make[1]: *** [ppc_defs.h] Error 1
> make[1]: Leaving directory `/usr/src/linux_CVS/2.4_new/arch/ppc/kernel'
> make: *** [_dir_arch/ppc/kernel] Error 2
> make[1]: *** [first_rule] Illegal instruction (core dumped)
> make[1]: Leaving directory `/usr/src/linux_CVS/2.4_new/kernel'
> make: *** [_dir_kernel] Error 2
> sched.c: In function `schedule':
> sched.c:711: Internal compiler error:
> sched.c:711: output pipe has been closed
> cpp: output pipe has been closed
> gcc: Internal compiler error: program as got fatal signal 4
> and etc
>
> (is this compiler OK ?)
>
> I'll now try just with AMIFB, no VIRGE.
>
> Couldn't be caused by not having any PCI compiled in could it ?
>
> Next is pulling out the mem and giving its pins a clean.
>
> A bit odd that make -j 10 is OK on 2.2.10 ?
2.4 demands more swap (at least twice the RAM size) than 2.2. Have you watched it during the build?
--
Earthling Michel Dänzer (MrCooper) \ Debian GNU/Linux (powerpc) developer
CS student, Free Software enthusiast \ XFree86 and DRI project member
 | 
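Michel's rule of thumb (swap at least twice the RAM size) can be checked mechanically. A minimal sketch, assuming a /proc/meminfo that carries `MemTotal:` and `SwapTotal:` lines in kB; the function name `swap_vs_ram` and the optional file argument are illustrative and not from the thread:

```sh
#!/bin/sh
# Hypothetical helper: report whether total swap is at least twice total RAM.
# Reads a file in /proc/meminfo format (defaults to /proc/meminfo itself).
swap_vs_ram() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        $1 == "MemTotal:"  { mem  = $2 }   # total RAM in kB
        $1 == "SwapTotal:" { swap = $2 }   # total swap in kB
        END {
            if (mem == "" || swap == "") { print "unknown"; exit 2 }
            if (swap >= 2 * mem) { print "ok" } else { print "low"; exit 1 }
        }
    ' "$meminfo"
}
```

Running `swap_vs_ram` with no argument checks the live system. It only compares totals; how much swap is actually in use during a build is what `swapon -s` shows.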
| 
      
      
      From: Ken T. <ke...@we...> - 2001-06-17 06:47:39
       | 
| On Sun, 17 Jun 2001, Michel Dänzer wrote:
> 2.4 demands more swap (at least twice the RAM size) than 2.2 . Have you
> watched it during the build?
I have now. Most of the 16 meg of fastram swap is used, but only about 35k of the 200k blocks of hd swap are used before the signal 4 / internal compiler error appears.
Ken.
 | 
| 
      
      
      From: Roman Z. <zi...@li...> - 2001-06-17 02:04:14
       | 
| Hi,
Ken Tyler wrote:
> 2.2.? is stable, never a problem from consoles, just compiled 2.4.5 with
> make -j 10 without a problem, which probably means memory and swapping are
> OK ... yes ?
'make -j 10' or 'make -j 3' shouldn't make a difference for the memory. There should actually be only light disk activity needed to keep the memory busy. The more jobs you start, the more you are testing the disk I/O.
Anyway, one last test to see whether it's the memory. Could you run this loop for a while under 2.2: 'while make -j 2; do make clean; done'. Keep it running for some time. If the memory is OK and 2.2 is stable, it should keep on going.
Otherwise we need to localize what's going wrong. The best would be to start with an absolute minimum system, that is, a 2.4.5 kernel with just the basic stuff compiled in, preferably with any unneeded hardware removed as well, and run the compile test. If that doesn't work, send me the kernel + config + exact boot options and I'll try it on my machine. If it works, add some hardware, activate the driver and rerun the tests.
It's really important to localize the problem as much as possible. If it's somewhere in the base hardware, I can try to run the same test here. If it's a driver problem, I can check the driver, but without testing it myself I can only make an educated guess at where the exact problem is.
bye, Roman
 | 
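For an unattended overnight run it can help to know on which pass the loop stopped. A sketch of Roman's loop with an iteration counter, under stated assumptions: the function name `stress_build` and the `BUILD_CMD` / `CLEAN_CMD` variables are hypothetical, and they default to the commands Roman gives:

```sh
#!/bin/sh
# Hypothetical wrapper around 'while make -j 2; do make clean; done' that
# counts passes and reports the first failing one.
# $1 is an optional pass limit; 0 (the default) means run until failure.
stress_build() {
    max="${1:-0}"
    build="${BUILD_CMD:-make -j 2}"   # build step (Roman's default)
    clean="${CLEAN_CMD:-make clean}"  # cleanup step between passes
    n=0
    while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
        if ! sh -c "$build"; then
            echo "build failed on iteration $((n + 1))"
            return 1
        fi
        sh -c "$clean" || return 1
        n=$((n + 1))
    done
    echo "completed $n iterations without failure"
}
```

Run from the kernel source directory, for example as `stress_build > stress.log 2>&1`, so the last line of the log shows how far the run got.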
| 
      
      
      From: Ken T. <ke...@we...> - 2001-06-17 07:00:44
       | 
| On Sun, 17 Jun 2001, Roman Zippel wrote:
> 'make -j 10' or 'make -j 3' shouldn't make a difference for the memory.
Not disputing what you say, but I would have thought that the running makes and gccs are still being brought into 'execution' memory from disk buffers, producing more mem activity, and also that more tasks mean more context switches and more opportunity for errors.
> Anyway, last test to see whether it's the memory. Could you run this
> loop for a while under 2.2: 'while make -j 2; do make clean; done'. Keep
> it running for some time. If the memory is ok and 2.2 is stable, it
> should keep on going.
I'll do that tonight for 10 - 12 hours.
> Otherwise we need to localize what's going wrong, the best would be to
> start with an absolute minimum system, that means a 2.4.5 kernel with
I'm currently building what I think is a minimum config; if it boots I'll start pulling hardware after tonight's tests.
Ken.
 | 
| 
      
      
      From: Geert U. <ge...@li...> - 2001-06-17 08:53:12
       | 
| On Sun, 17 Jun 2001, Ken Tyler wrote:
> On Sun, 17 Jun 2001, Roman Zippel wrote:
> > 'make -j 10' or 'make -j 3' shouldn't make a difference for the memory.
> 
> Not disputing what you say but I would have thought that the running makes
> and gccs are still being brought into 'execution' memory, from disk
> buffers producing more mem activity, and also the more tasks mean more
> context switches and more opportunity for errors.
Yes, the probability for seeing problems is higher with a higher -j value, so
I'd expect to see more problems with higher -j values.
BUT, this is statistics! It's quite possible a single run at -j 10 will reveal
no problems, while it will at -j 3.
I'm suffering from the same problem w.r.t. writing corrupted data to my DDS-1
under 2.4.x: so far it _looks_ like it doesn't happen under 2.2.17, but I can't
prove it due to the nature of statistics. It's much easier to prove a problem
is there (you need only one `true' report), than to prove the problem is not
there (you need infinite `false' reports). In the meantime I found one problem
under 2.2.19 too, so either it got introduced between 2.2.17 and 2.2.19, or it
does happen under 2.2.17 too.
Gr{oetje,eeting}s,
						Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@li...
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds
 | 
| 
      
      
      From: Ken T. <ke...@we...> - 2001-06-17 09:42:54
       | 
| 
On Sun, 17 Jun 2001, Geert Uytterhoeven wrote:
> I'm suffering from the same problem w.r.t. writing corrupted data to my DDS-1
> under 2.4.x: so far it _looks_ like it doesn't happen under 2.2.17, but I can't
> prove it due to the nature of statistics. It's much easier to prove a problem
> is there (you need only one `true' report), than to prove the problem is not
> there (you need infinite `false' reports). In the meantime I found one problem
> under 2.2.19 too, so either it got introduced between 2.2.17 and 2.2.19, or it
> does happen under 2.2.17 too.
I know what you're saying, Karl Popper and philosophy of science etc.:
repeated observations of sunrise every morning are no proof that it will
rise tomorrow morning.
So you're saying that running the suggested make loop under 2.2.10
overnight is not worth doing because it can't *prove* the absence of the
problem ?
             2.2.10 make -j 10        2.4.5 make -j 10
success           10                       0
fail               0                      10
comes pretty close to convincing me, where would you put your money ?
I just want 2.4.5 to work, I'll do the overnight test - if the sun comes
up tomorrow ;)
Something else occurred to me: is it possible that one of the daemons or
some cron job splats something ?
Ken.
 |
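The exchange between Geert and Ken can be made concrete with a small calculation. This is only a sketch under an assumption nobody in the thread states: each build run fails independently with the same probability $p$. The chance of seeing no failure in $n$ runs, and the bound that Ken's ten clean runs under 2.2.10 put on $p$ at the 95% level, are then:

```latex
\[
  P(\text{no failure in } n \text{ runs}) = (1 - p)^{n}
\]
\[
  (1 - p)^{10} \ge 0.05
  \quad\Longrightarrow\quad
  p \le 1 - 0.05^{1/10} \approx 0.26
\]
```

Under that assumption, ten clean runs only show that the per-run failure rate under 2.2.10 is probably below about 26%, which is Geert's point. By the same argument, ten failures out of ten under 2.4.5 put its failure rate above about 74%, so the two columns do point to a real difference, which is Ken's point. A longer overnight loop tightens the bound, since $(1 - p)^{n}$ shrinks as $n$ grows.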