From: Petr V. <van...@vc...> - 2000-05-29 21:53:21
On Mon, May 29, 2000 at 11:34:08PM +0000, Petr Vandrovec wrote:
> > Anyone else have this configuration and can verify the crash?
> Hmm. It is quite reproducible here on dual PIII SMP G400 too :-(
> Probably it is time to start using 'video=scrollback:0' again...
> Has anybody local filesystem which does not need fsck after kernel crash?!

Hi again,
  I found the offender - but if someone more knowledgeable could confirm that...

When the problem happens, the stack trace on the first CPU (this is the CPU which was already in the console subsystem when the other one arrived) looks like:

CPU: 1
EIP: 0010:[<c010c2e5>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000096
eax: 0000001d ebx: d0800000 ecx: c023ba34 edx: c023ba28
esi: c024e2b0 edi: d0800010 ebp: 00000010 esp: c15bbe1c
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c15bb000)
Stack: c15bbe20 00000018 c01a214d c021a23f c02b6ee0 c1532070 00000038 c1544000
       ffffffff c1544000 00000001 01e701e0 0000007f 00000010 21a00010 00000001
       c01a21e1 07070707 00000000 c02b6ee0 c1532078 00000004 0000021a 00000038
Call Trace: [<c01a214d>] [<c021a23f>] [<c01a21e1>] [<c0194519>] [<c0196ac7>] [<c0175053>]
       [<c0121afb>] [<c0121a1c>] [<c010d914>] [<c0109690>] [<c0109690>] [<c010bf0c>]
       [<c0109690>] [<c0109690>] [<c0100018>] [<c01096bd>] [<c0109702>] [<c010bf0c>]
Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4

>>EIP; c010c2e5 <printstate+9/2c> <=====
Trace; c01a214d <matrox_cfbX_putcs+32d/344>
Trace; c021a23f <dm_head_vals.930+7877/7cf5>
Trace; c01a21e1 <matrox_cfb8_putcs+7d/88>
Trace; c0194519 <fbcon_redraw_softback+201/2e4>
Trace; c0196ac7 <fbcon_scrolldelta+167/2b8>
Trace; c0175053 <console_softint+103/11c>
Trace; c0121afb <tasklet_action+4f/7c>
Trace; c0121a1c <do_softirq+5c/8c>
Trace; c010d914 <do_IRQ+e4/f4>
Trace; c0109690 <default_idle+0/34>
Trace; c0109690 <default_idle+0/34>
Trace; c010bf0c <ret_from_intr+0/20>
Trace; c0109690 <default_idle+0/34>
Trace; c0109690 <default_idle+0/34>
Trace; c0100018 <startup_32+18/c7>
Trace; c01096bd <default_idle+2d/34>
Trace; c0109702 <cpu_idle+3e/54>
Trace; c010bf0c <ret_from_intr+0/20>
Code; c010c2e5 <printstate+9/2c>
[code stripped, it is only popad; ret]

This is the stack trace of the CPU which should not have come here...

CPU: 0
EIP: 0010:[<c010c2e5>]
EFLAGS: 00000286
eax: 0000001e ebx: 21a00000 ecx: c023ba34 edx: c023ba28
esi: 00000008 edi: 00000130 ebp: 00000010 esp: cfd49d54
ds: 0018 es: 0018 ss: 0018
Process ls (pid: 16, stackpage=cfd49000)
Stack: cfd49d58 00000018 c01a1f38 c021a220 c02b6ee0 c152c04c 00000026 c1544000
       00000003 c1544000 00000001 c1544000 0000007f 00000010 21a00010 00000001
       c01a21e1 07070707 00000000 c02b6ee0 c152c04c 00000004 0000021a 00000026
Call Trace: [<c01a1f38>] [<c021a220>] [<c01a21e1>] [<c0194519>] [<c0196ac7>] [<c0196c35>]
       [<c019411b>] [<c0171e3e>] [<c017563a>] [<c017cb36>] [<c017f211>] [<c016c643>]
       [<c017ef78>] [<c0135cfe>] [<c010be4c>]
Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4

>>EIP; c010c2e5 <printstate+9/2c> <=====
Trace; c01a1f38 <matrox_cfbX_putcs+118/344>
Trace; c021a220 <dm_head_vals.930+7858/7cf5>
Trace; c01a21e1 <matrox_cfb8_putcs+7d/88>
Trace; c0194519 <fbcon_redraw_softback+201/2e4>
Trace; c0196ac7 <fbcon_scrolldelta+167/2b8>
Trace; c0196c35 <fbcon_set_origin+1d/24>
Trace; c019411b <fbcon_cursor+57/1c8>
Trace; c0171e3e <set_cursor+6e/80>
Trace; c017563a <con_flush_chars+12/18>
Trace; c017cb36 <opost_block+15a/174>
Trace; c017f211 <write_chan+299/3a4>
Trace; c016c643 <tty_write+24b/340>
Trace; c017ef78 <write_chan+0/3a4>
Trace; c0135cfe <sys_write+de/100>
Trace; c010be4c <system_call+34/38>
Code; c010c2e5 <printstate+9/2c>
[code stripped; it is only popad; ret]

So the console system forgot to acquire the lock somewhere between opost_block and matrox_cfb8_putcs (I think between opost_block and fbcon_set_origin, as scrollback is not reentrant either...).
So spin_lock(&console_lock); is missing either in con_flush_chars() or in set_cursor() - and we cannot place it into set_cursor(), because set_cursor() is invoked by vt_console_print(), which contains the note: Call me with console_lock held only...

There are 4 additional callers of set_cursor() which look suspicious to me: update_region(), redraw_screen(), unblank_screen() and putconsxy(). I have no idea what the semantics of these functions are and I have to go home... But I think that at least putconsxy() should acquire the lock too.

So this patch is only a minimal one. I was not able to cause a crash with this patch, but... there are still the 4 unchecked entrypoints above...

					Petr Vandrovec
					van...@vc...

P.S.: Another solution is to disable scrollback. Then set_cursor only shows the cursor, and this operation can be done specially (reentrant) on vgacon/fbdev level (as the software cursor blinks from bottomhalf, these procedures are reentrant already... almost...).

P.P.S.: Alan, I do not know whether to apply it or whether to wait for someone else's approval...

diff -urdN linux/drivers/char/console.c linux/drivers/char/console.c
--- linux/drivers/char/console.c	Sun Apr 2 22:20:27 2000
+++ linux/drivers/char/console.c	Mon May 29 21:05:03 2000
@@ -2290,10 +2290,13 @@
 static void con_flush_chars(struct tty_struct *tty)
 {
+	unsigned long flags;
 	struct vt_struct *vt = (struct vt_struct *)tty->driver_data;
 	pm_access(pm_con);
+	spin_lock_irqsave(&console_lock, flags);
 	set_cursor(vt->vc_num);
+	spin_unlock_irqrestore(&console_lock, flags);
 }
 /*
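
For readers who want to see the locking rule behind the patch in isolation, here is a rough userspace sketch. It is not kernel code: a pthread mutex stands in for console_lock (the real code uses a spinlock with interrupts disabled), and set_cursor_locked(), print_path(), flush_path(), run_both_paths() and WRITES_PER_PATH are hypothetical names made up for the illustration. The idea is only that a non-reentrant routine which expects its caller to hold the lock stays consistent when every entry path takes the lock before calling it.

```c
#include <pthread.h>
#include <stddef.h>

#define WRITES_PER_PATH 100000

/* Stand-in for console_lock (assumption: a mutex instead of a spinlock). */
static pthread_mutex_t console_lock = PTHREAD_MUTEX_INITIALIZER;

/* Shared state that the non-reentrant routine modifies. */
static long cursor_updates;

/* Stand-in for set_cursor(): must be called with console_lock held. */
static void set_cursor_locked(void)
{
    long v = cursor_updates;   /* read-modify-write: unsafe without the lock */
    cursor_updates = v + 1;
}

/* Entry path that already follows the "lock held" rule. */
static void *print_path(void *unused)
{
    (void)unused;
    for (int i = 0; i < WRITES_PER_PATH; i++) {
        pthread_mutex_lock(&console_lock);
        set_cursor_locked();
        pthread_mutex_unlock(&console_lock);
    }
    return NULL;
}

/* Entry path modelled on the patched con_flush_chars(): it now locks too. */
static void *flush_path(void *unused)
{
    (void)unused;
    for (int i = 0; i < WRITES_PER_PATH; i++) {
        pthread_mutex_lock(&console_lock);
        set_cursor_locked();
        pthread_mutex_unlock(&console_lock);
    }
    return NULL;
}

/* Run both paths concurrently and return the number of recorded updates. */
long run_both_paths(void)
{
    pthread_t a, b;

    cursor_updates = 0;
    pthread_create(&a, NULL, print_path, NULL);
    pthread_create(&b, NULL, flush_path, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return cursor_updates;
}
```

With both paths locking, run_both_paths() returns 2 * WRITES_PER_PATH. Dropping the lock/unlock pair from flush_path() models the unpatched con_flush_chars(), and updates can then be lost when the two threads interleave.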