| 
     
      
      
      From: Geert U. <ge...@li...> - 2001-06-17 09:55:07
       
   | 
On Sun, 17 Jun 2001, Ken Tyler wrote:
> On Sun, 17 Jun 2001, Geert Uytterhoeven wrote:
> > I'm suffering from the same problem w.r.t. writing corrupted data to my DDS-1
> > under 2.4.x: so far it _looks_ like it doesn't happen under 2.2.17, but I can't
> > prove it due to the nature of statistics. It's much easier to prove a problem
> > is there (you need only one `true' report), than proving the problem is not
> > there (you need infinite `false' reports). In the meantime I found one problem
> > under 2.2.19 too, so either it got introduced between 2.2.17 and 2.2.19, or it
> > does happen under 2.2.17 too.
> 
> I know what you're saying, Karl Popper and philosophy of science etc,
> repeated observation of sunrise every morning is no proof that it will
> rise tomorrow morning.

Actually sometimes the Sun doesn't rise in the morning due to an opaque moon
being in between :-)

> So you're saying that running the suggested make loop under 2.2.10
> overnight is not worth doing because it can't *prove* the absence of the
> problem ?
> 
>              2.2.10 make -j 10        2.4.5 make -j 10 
> 
> success           10                       0
> 
> fail               0                      10
> 
> comes pretty close to convincing me, where would you put your money ?

Sounds sufficiently convincing to me.

> Something else occurred to me, is it possible that one of the daemons or
> some cron job splats something ?

Even then it's a kernel bug. Apps should not be able to crash the kernel.

Gr{oetje,eeting}s,
						Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@li...
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds
 | 
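As a rough check on the 10/0 versus 0/10 table quoted above, here is a minimal sketch, assuming the twenty runs are independent and that, under the null hypothesis, both kernels share one failure rate. The function name `table_probability` is made up for illustration. It computes the hypergeometric probability of one exact 2x2 table with the row and column totals held fixed; since no table is more extreme than this one, the value is also the one-sided Fisher exact p-value.

```python
from math import comb


def table_probability(success_a, fail_a, success_b, fail_b):
    """Probability of this exact 2x2 table given fixed row and column totals.

    Assumption: runs are independent and both kernels fail at the same
    (unknown) rate, so the failures are spread at random over all runs.
    """
    runs_a = success_a + fail_a
    runs_b = success_b + fail_b
    total_fail = fail_a + fail_b
    # Ways to place the failures as observed, over all ways to place them.
    return (comb(runs_a, fail_a) * comb(runs_b, fail_b)
            / comb(runs_a + runs_b, total_fail))


# Counts from the quoted table: 2.2.10 (10 ok, 0 fail), 2.4.5 (0 ok, 10 fail).
p = table_probability(10, 0, 0, 10)
print(f"{p:.2e}")
```

For these counts the result is 1 in 184756, which fits "sufficiently convincing" as evidence that the two kernels differ. It still cannot prove that 2.2.10 is free of the problem, which is the point made in the quoted text.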
| 
     
      
      
      From: Ken T. <ke...@we...> - 2001-06-18 04:55:28
       
  
        
          
            Attachments: .config.gz
          
        
       
     | 
Hello,

The sun came up this morning and then the power failed (I think it was a
coincidence), so I've no idea how the 2.2.10 looping compile test went. It's
running again, has been for about 3 or 4 hours, OK so far.

Last night I compiled about the smallest 2.4.5 kernel I could get to boot.
Compiling itself, it failed the make -j 3 test three out of three times.

I've attached the .config for perusal and possible tests.

Ken
 | 
| 
     
      
      
      From: Ken T. <ke...@we...> - 2001-06-19 19:10:07
       
   | 
Hello,

Now pretty sure 2.2.10 does not have the same problem as 2.4.5. After about
18 hours of 2.2.10 doing the suggested make loop, and another 3 hours of
-j 10, no problem is evident.

2.4.5 is another story. It does seem to depend on what value is given to
make -j. At small -j, up to 4, it appears fine after limited tests. -j >5 has
trouble, reporting illegal instruction.

What might be relevant is that it is predictable. At -j 6 it failed several
times at the same stage of compiling, in the same file.

Little swap space is used and there is about 2 meg of free mem still
available according to /proc/meminfo, but that might not be right as I
can't be sure I'm seeing free mem at the instant it fails.

Still have to try the minimal 2.4.5 with hardware removed.

What can I do, what tools can assist ?

ken.
 | 
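One way to make the -j comparison repeatable is a small harness that runs the same build command several times, keeps a log per run, and samples MemFree from /proc/meminfo while the build is running rather than after it has failed. The following is only a sketch, assuming a Python 3 interpreter is available on the test machine; `run_trials`, `mem_free_kb`, the log file names, and the polling interval are all made up for illustration, and the exact make loop suggested earlier in the thread is not reproduced here.

```python
import subprocess
from pathlib import Path


def mem_free_kb(meminfo_path="/proc/meminfo"):
    """Return the MemFree value in kB, or None if it cannot be read."""
    try:
        for line in Path(meminfo_path).read_text().splitlines():
            if line.startswith("MemFree:"):
                return int(line.split()[1])
    except (OSError, ValueError, IndexError):
        pass
    return None


def run_trials(command, trials, log_dir, poll_seconds=1.0):
    """Run `command` `trials` times.

    Returns a list of (exit_code, lowest_mem_free_kb) tuples, one per run.
    Combined stdout/stderr of each run goes to log_dir/run-NNN.log so the
    point of failure can be compared between runs.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for i in range(trials):
        samples = []
        with open(log_dir / f"run-{i:03d}.log", "wb") as log:
            proc = subprocess.Popen(command, stdout=log,
                                    stderr=subprocess.STDOUT)
            while True:
                # Sample free memory while the build is still running.
                free = mem_free_kb()
                if free is not None:
                    samples.append(free)
                try:
                    proc.wait(timeout=poll_seconds)
                    break
                except subprocess.TimeoutExpired:
                    continue
        results.append((proc.returncode, min(samples) if samples else None))
    return results
```

A call such as `run_trials(["make", "-j", "6"], trials=10, log_dir="logs-j6")`, started from the kernel source directory, would give ten exit codes plus the lowest MemFree seen during each run; whether the tree needs cleaning between runs depends on the test being repeated. Comparing the tails of the failing logs shows whether the build really stops in the same file each time. The sampling is still coarse, so a short dip in free memory between two polls can be missed.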
| 
     
      
      
      From: Michel D. <mic...@ii...> - 2001-06-19 19:36:46
       
   | 
Ken Tyler wrote:
> Little swap space is used and there is about 2 meg of free mem still
> available according to /proc/meminfo but that might not be right as I
> can't be sure I'm seeing free mem at the instant it fails.

Have you tried without z2ram swap yet? Just a shot in the dark...

--
Earthling Michel Dänzer (MrCooper)    \  Debian GNU/Linux (powerpc) developer
CS student, Free Software enthusiast   \        XFree86 and DRI project member
 | 
| 
     
      
      
      From: Ken T. <ke...@we...> - 2001-06-19 21:36:33
       
   | 
On Tue, 19 Jun 2001, Michel Dänzer wrote:
> > Little swap space is used and there is about 2 meg of free mem still
> > available according to /proc/meminfo but that might not be right as I
> > can't be sure I'm seeing free mem at the instant it fails.
>
> Have you tried without z2ram swap yet? Just a shot in the dark...

My minimal 2.4.5 kernel doesn't have support for ram swap, only using hd
for swap and not much of it is used.

I saw a different error yesterday: as well as the illegal instruction and
segfault, I got a BPTrap error.

I also don't think it can be caused by hardware, either mem or cards,
because of the consistent failing at make -j 6, always at the same place.

Ken.
 | 
| 
     
      
      
      From: Roman Z. <zi...@li...> - 2001-06-19 23:49:32
       
   | 
Hi,

Ken Tyler wrote:
> I also don't think it can be caused by hardware, either mem or cards,
> because of the consistent failing at make -j 6, always at the same place.

Ok, I'm going to play around with that, can you send me the kernel
config you used?

bye, Roman
 | 
| 
     
      
      
      From: Ken T. <ke...@we...> - 2001-06-20 02:22:25
       
  
        
          
            Attachments: .config
          
        
       
     | 
On Wed, 20 Jun 2001, Roman Zippel wrote:

Hello,

> Ok, I'm going to play around with that, can you send me the kernel
> config you used?

Attached.

Thanks,
ken.
 | 
| 
     
      
      
      From: Roman Z. <zi...@li...> - 2001-06-17 12:42:46
       
   | 
Ken Tyler wrote:
> > 'make -j 10' or 'make -j 3' shouldn't make a difference for the memory.
>
> Not disputing what you say but I would have thought that the running makes
> and gccs are still being brought into 'execution' memory, from disk
> buffers producing more mem activity, and also the more tasks mean more
> context switches and more opportunity for errors.

Depends on which errors you want to trigger. The more jobs are running, the
more likely it is that they are just waiting for i/o, so you only have a
small number of jobs doing some real work (possibly even < 1).

bye, Roman
 | 
| 
     
      
      
      From: Michel D. <mic...@ii...> - 2001-06-10 22:14:25
       
   | 
Glenn Hisdal wrote:
> Ken Tyler wrote:
>
> > On Sat, 9 Jun 2001, Glenn Hisdal wrote:
>
> >> Interesting...
> >> Guess what I'll be trying out tonight ;-)
>
> > I don't wish to know...
>
> :-))
>
> > But X is much better, still had a couple of crashes with a message about
> > signal 4.
>
> > After X crashes and goes back to a console, the next command I attempt to
> > run fails and says 'illegal instruction' and core dumps, after that all is
> > OK. Not sure if it's the illegal instruction from the X crash being
> > reported again or something else.
>
> hmmm...
> X doesn't appear to behave any different here.
> The system still locks up completely when switching back to console.

I'm afraid there are at least two different problems. One is the PCI related
one which you are suffering from. Ken can hardly have that as he is using a
CV64/3D (right?). There seems to be another one that Andreas Wüst reported;
it looks like the same as Ken's to me, as both reported an error message
about floating point in kernel.

--
Earthling Michel Dänzer (MrCooper)    \  Debian GNU/Linux (powerpc) developer
CS student, Free Software enthusiast   \        XFree86 and DRI project member
 | 
| 
     
      
      
      From: Glenn H. <gh...@c2...> - 2001-06-11 10:24:12
       
   | 
Hello,

Michel Daenzer wrote:
> Glenn Hisdal wrote:
>> Ken Tyler wrote:
>> But X is much better, still had a couple of crashes with a message about
>> signal 4.
>> After X crashes and goes back to a console, the next command I attempt
>> to run fails and says 'illegal instruction' and core dumps, after that
>> all is OK. Not sure if it's the illegal instruction from the X crash
>> being reported again or something else.
>> hmmm...
>> X doesn't appear to behave any different here.
>> The system still locks up completely when switching back to console.
> I'm afraid there are at least two different problems. One is the PCI
> related one which you are suffering from. Ken can hardly have that as he
> is using a CV64/3D (right?). There seems to be another one that Andreas
> Wüst reported, it looks like the same as Ken's to me as both reported an
> error message about floating point in kernel.

Oh, right. That could explain why it works better for him then.
I just assumed he had a cvppc :-)

I will try to get some more testing done over the next days.

- glenn
 |