From: David P. <pa...@rc...> - 2000-09-13 21:30:37
It looks like the ieee12844 and/or ieee12844pp drivers are broken on SMP
kernels. Any ideas?

David

------- Forwarded Messages

From: Alexander Zimmermann <Ale...@fm...>
Date: Mon, 11 Sep 2000 15:34:15 +0200 (MEST)
To: pa...@rc...
Subject: Problems with hpoj-0.6

Hi David,

after your announcement I tried hpoj-0.6 and ran into major problems.
My environment (RedHat 6.2):

CPUs:    Dual Pentium III 750 MHz
Kernel:  2.2.14-5.0smp
Printer: OfficeJet G85

I've compiled and installed hpoj successfully. Loading the kernel
modules also works, and "hpo devid" yields:

MFG:Hewlett-Packard;MDL:OfficeJet G85;CMD:MLC,PCL,PML,SCL;CLASS:PRINTER;
DESCRIPTION:Hewlett-Packard OfficeJet G Series;1284.3M:f7f,f7f;
1284.4DL:4d,4e,1;SERN:SGD05E07X7VL;
VSTATUS:$HB0$NC0,ff,DN,IDLE,CUT,K0,C0,SM,NR,KP074,CP072;AiO:0;

But any call to ieee12844_print or ptal-connect, or even multiple calls
to "hpo get OID_STATUS_MSG_LINE1_PART1", crashed the kernel with a
message like this:

Scheduling in interrupt
Unable to handle kernel NULL pointer dereference at virtual address 00000000
current->tss.cr3 = 00101000, %cr3 = 00101000
.
. (not copied)
.
Kernel panic: Attempted to kill the idle task!
In swapper task - not syncing

And I had to press the hardware reset button :-(.

With 0.5 this did not appear.

I assume there's something wrong with a kernel module. Is there any way
to debug these modules?

--
Ale...@fm... / Pick another fortune cookie.
http://www.fmi.uni-passau.de/~zimmerma/
for PGP public key finger /
zim...@yo... /

------- Message 2

From: pa...@rc... (David Paschal)
Date: Mon, 11 Sep 2000 16:17:51 -0700
To: Ale...@fm...
Cc: pa...@rc...
Subject: Re: Problems with hpoj-0.6

Hi, Alexander.

Unfortunately I don't have access to an SMP system and I'm not very
familiar with kernel-mode development, so I don't have any easy
answers. There are several things we can try, though.

First of all, try the following variations on the insmod commands:

    insmod ieee12844.o debug=15
    insmod ieee12844pp.o debug=1

If necessary, specify the path to the .o files. Try performing the
smallest operation you can that makes it crash. Assuming it still
crashes when debug messages are turned on (sometimes this changes the
timing and makes the problem go away), I would like to see how far it
gets before it dies. The debug messages may or may not make it into
syslog by that time. You can also try passing the "debug=" parameter
to only one of the two insmod commands.

In 0.6 I made some very small changes to ieee12844.c and some larger
changes to ieee12844pp.c, but nothing that obviously looks like the
culprit. Try reverting to the 0.5 versions of these files while
otherwise using the rest of 0.6, and verify that it works then.

If you're willing to help me with this, then maybe I should next send
you small patches against the 0.5 versions that gradually add back the
changes that went into 0.6. This would help narrow down exactly which
change broke it on your system.

I apologize for the inconvenience this causes you. Thanks in advance
for helping me resolve this problem.

David

> Hi David,
>
> after your announcement I tried hpoj-0.6 and ran into major problems.
> My environment (RedHat 6.2):
>
> CPUs:    Dual Pentium III 750 MHz
> Kernel:  2.2.14-5.0smp
> Printer: OfficeJet G85
>
> I've compiled and installed hpoj successfully. Loading the kernel
> modules also works, and "hpo devid" yields:
>
> MFG:Hewlett-Packard;MDL:OfficeJet G85;CMD:MLC,PCL,PML,SCL;CLASS:PRINTER;
> DESCRIPTION:Hewlett-Packard OfficeJet G Series;1284.3M:f7f,f7f;
> 1284.4DL:4d,4e,1;SERN:SGD05E07X7VL;
> VSTATUS:$HB0$NC0,ff,DN,IDLE,CUT,K0,C0,SM,NR,KP074,CP072;AiO:0;
>
> But any call to ieee12844_print or ptal-connect, or even multiple calls
> to "hpo get OID_STATUS_MSG_LINE1_PART1", crashed the kernel with a
> message like this:
>
> Scheduling in interrupt
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> current->tss.cr3 = 00101000, %cr3 = 00101000
> .
> . (not copied)
> .
> Kernel panic: Attempted to kill the idle task!
> In swapper task - not syncing
>
> And I had to press the hardware reset button :-(.
>
> With 0.5 this did not appear.
>
> I assume there's something wrong with a kernel module. Is there any way
> to debug these modules?
> --
> Ale...@fm... / Pick another fortune cookie.
> http://www.fmi.uni-passau.de/~zimmerma/
> for PGP public key finger /
> zim...@yo... /

------- Message 3

From: Alexander Zimmermann <Ale...@fm...>
Date: Wed, 13 Sep 2000 14:40:54 +0200 (MEST)
To: pa...@rc...
Subject: SMP Problems with hpoj-0.6

Hello David,

On 11 Sep, David Paschal wrote:
> Hi, Alexander. Unfortunately I don't have access to an SMP system and I'm
> not very familiar with kernel-mode development, so I don't have any easy
> answers. There are several things we can try, though.

Fortunately I have access to an SMP system (;-)), but unfortunately I'm
also not familiar with kernel module development. But I made some tests.

First of all, it's not a problem of version 0.6; it's more general,
since 0.5 has the same problems. It's really a problem with the SMP
kernel (it seems I never tested 0.5 with an SMP kernel).

If I boot a non-SMP kernel and load the modules I built for this kernel,
it works (I've printed at least one page with ptal-connect). If I boot
the SMP kernel and load the SMP kernel modules, it crashes at
"hpo devid". If I try to load the modules built for the non-SMP kernel
into the running SMP kernel, it reports unresolved kernel symbols (and
vice versa), although the module and kernel sources were the same!

Don't you have a kernel expert at hand who could help us?

There are some compiler warnings when building the modules, like these:

/usr/src/linux-2.2.14/include/linux/smp.h:77: warning: `smp_num_cpus' redefined
/usr/src/linux-2.2.14/include/linux/modules-smp/i386_ksyms.ver:72: warning: this is the location of the previous definition

But now, how do I track down this error? Before crashing, a lot of
messages are displayed on the screen (virtual console 1), but they are
not in the log file after rebooting, scrolling up the screen no longer
works once the kernel has crashed, and redirecting to a file does not
work either. Any idea?

> I apologize for the inconvenience this causes you. Thanks in advance for
> helping me resolve this problem.

You don't have to apologize for anything. It's great to have someone
developing a device driver that I can use. To me it's self-evident to
help with this development as far as I can. And I hope we can solve
this problem.

--
Ale...@fm... / Who is John Galt?
http://www.fmi.uni-passau.de/~zimmerma/
for PGP public key finger /
zim...@yo... /

------- End of Forwarded Messages
From: Gerhard F. <ger...@mc...> - 2000-09-17 17:30:02
David Paschal wrote:
>
> It looks like the ieee12844 and/or ieee12844pp drivers are broken on SMP
> kernels. Any ideas?
>
> David

David,

IMHO there are two issues:

1. A driver that has been compiled for a UP kernel cannot be used with
   an SMP kernel (and vice versa). An SMP module must be compiled using
   kernel header files that are configured for SMP (-> CONFIG_SMP) and
   with the compiler flag -D__SMP__. Is the current Makefile prepared to
   compile either SMP or UP modules?

2. The current ieee* drivers definitely aren't fully SMP-safe. Some time
   ago I investigated this, and as far as I remember there do *exist* a
   few critical regions which would require proper SMP locking. I think
   I did these investigations with much earlier 2.2 and 2.3 versions
   than the current ones, and not with 2.4 (which didn't exist yet at
   that time). Although these SMP race conditions are probably unlikely
   to occur, they potentially do exist.

There is one more thing that gives me a strange feeling in my stomach:
somebody reported that start_bh_atomic() and friends no longer exist in
newer 2.3 kernels. I had used them to prevent races between the top and
bottom halves, so I have no idea what will happen if they are just
skipped.

I think the whole locking in the ieee* drivers (UP and SMP) should
definitely be reinvestigated for SMP and for new kernels (2.3 and 2.4);
probably a few (or many) things have changed in the new kernels!

Gerhard
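Gerhard's first point can be illustrated with a small userspace C sketch. It is not code from hpoj or from the kernel: the struct `hyp_dev_state`, its `lock` member and the `hyp_*` functions are hypothetical. It only shows that the same source yields a different data layout depending on whether `-D__SMP__` is passed, which is one reason a module built for a UP kernel cannot simply be loaded into an SMP kernel (or vice versa).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical driver state (not from hpoj): the lock member exists only
 * when the file is compiled with -D__SMP__, so the struct's size and the
 * offset of queued_packets differ between UP and SMP builds of the very
 * same source file. */
struct hyp_dev_state {
#ifdef __SMP__
    volatile int lock;      /* present only in SMP builds */
#endif
    int queued_packets;
};

/* Reports which flavour this translation unit was compiled for. */
static const char *hyp_build_flavour(void)
{
#ifdef __SMP__
    return "SMP";
#else
    return "UP";
#endif
}

/* Prints the flavour and the resulting layout so two builds can be compared. */
static void hyp_print_layout(void)
{
    printf("%s build: sizeof = %zu, offsetof(queued_packets) = %zu\n",
           hyp_build_flavour(), sizeof(struct hyp_dev_state),
           offsetof(struct hyp_dev_state, queued_packets));
}
```

Compiling this once as-is and once with `-D__SMP__` and calling `hyp_print_layout()` from each build shows the UP build with a one-`int` struct and `queued_packets` at offset 0, while the SMP build reports a larger struct with `queued_packets` at a non-zero offset. Code compiled against one layout would misread data laid out by the other.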
From: Gerhard F. <ger...@mc...> - 2000-09-17 18:55:36
Gerhard Fuernkranz wrote:
>
> There is one more thing that gives me a strange feeling in my stomach:
> somebody reported that start_bh_atomic() and friends no longer exist in
> newer 2.3 kernels. I had used them to prevent races between the top and
> bottom halves, so I have no idea what will happen if they are just
> skipped.

After a very short look at the current SuSE 6.4 (-> 2.2.14) sources, it
appears that ret_from_intr in entry.S does *NOT* run the bottom halves
unless it reschedules the current process (in which case schedule() runs
them). But as long as the top half is running in the kernel, the process
should not be rescheduled. Therefore I think that an interrupt should not
be able to trigger a bottom-half handler that interrupts the top half
running in kernel mode (-> in a UP(!!!) kernel). But I've taken only a
very short look and could have overlooked something ...

So IMHO there is a chance that explicit locking between the top and
bottom halves is no longer required (for UP kernels) and is just a relic
that was required in some older kernel version (maybe 2.0???). I cannot
remember ... Perhaps they have also removed start_bh_atomic() and friends
from recent 2.3 kernels because they are no longer required? (UP kernels
and SMP kernels need different locking methods anyway.)

Nevertheless, all the locking in the drivers should be investigated very
carefully, especially SMP locking, which is currently definitely not
supported (i.e. missing) in ieee*.c. E.g. if the start_bh_atomic() calls
go away, then there are no locks at all that would prevent a top half
from running on one processor while a bottom half is running on a
different processor simultaneously. So spinlocks must be added to
protect critical regions in order to handle such situations properly.

Gerhard
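As a rough userspace analogue of the spinlock Gerhard proposes, consider the sketch below. It is not kernel code: two POSIX threads stand in for a top half and a bottom half running on different processors, and every `hyp_*` name is hypothetical. A minimal test-and-set spinlock built on C11 `atomic_flag` keeps their read-modify-write of the shared counter from interleaving.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Minimal test-and-set spinlock (hypothetical, userspace only). */
typedef struct {
    atomic_flag held;
} hyp_spinlock_t;

static void hyp_spin_lock(hyp_spinlock_t *l)
{
    /* Busy-wait until the flag was previously clear, i.e. we own the lock. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void hyp_spin_unlock(hyp_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

#define HYP_ITERATIONS 100000

/* Shared state touched by both "halves"; queue_lock guards it. */
static hyp_spinlock_t queue_lock = { ATOMIC_FLAG_INIT };
static long queued_bytes;

/* Stand-in for the top half: enqueues 2 bytes per iteration. */
static void *hyp_top_half(void *arg)
{
    (void)arg;
    for (int i = 0; i < HYP_ITERATIONS; i++) {
        hyp_spin_lock(&queue_lock);
        queued_bytes += 2;                  /* critical region */
        hyp_spin_unlock(&queue_lock);
    }
    return NULL;
}

/* Stand-in for the bottom half: dequeues 1 byte per iteration. */
static void *hyp_bottom_half(void *arg)
{
    (void)arg;
    for (int i = 0; i < HYP_ITERATIONS; i++) {
        hyp_spin_lock(&queue_lock);
        queued_bytes -= 1;                  /* critical region */
        hyp_spin_unlock(&queue_lock);
    }
    return NULL;
}

/* Runs both halves concurrently; returns the final count, or -1 on error. */
static long hyp_run_both_halves(void)
{
    pthread_t top, bottom;

    queued_bytes = 0;
    if (pthread_create(&top, NULL, hyp_top_half, NULL) != 0)
        return -1;
    if (pthread_create(&bottom, NULL, hyp_bottom_half, NULL) != 0) {
        pthread_join(top, NULL);
        return -1;
    }
    pthread_join(top, NULL);
    pthread_join(bottom, NULL);
    return queued_bytes;                    /* 2*N - N == N when locked */
}
```

With the lock in place the result is always `HYP_ITERATIONS`. Removing the lock/unlock pair typically makes the result vary from run to run, which is the kind of SMP race described above. In the real driver the kernel's own spinlock primitives would be used rather than a hand-rolled lock.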
From: Burkhard K. <bu...@bu...> - 2000-09-17 21:13:40
Gerhard Fuernkranz wrote:
> Gerhard Fuernkranz wrote:
> >
> > There is one more thing that gives me a strange feeling in my stomach:
> > somebody reported that start_bh_atomic() and friends no longer exist in
> > newer 2.3 kernels. I had used them to prevent races between the top and
> > bottom halves, so I have no idea what will happen if they are just
> > skipped.
>
> After a very short look at the current SuSE 6.4 (-> 2.2.14) sources, it
> appears that ret_from_intr in entry.S does *NOT* run the bottom halves
> unless it reschedules the current process (in which case schedule() runs
> them). But as long as the top half is running in the kernel, the process
> should not be rescheduled. Therefore I think that an interrupt should not
> be able to trigger a bottom-half handler that interrupts the top half
> running in kernel mode (-> in a UP(!!!) kernel). But I've taken only a
> very short look and could have overlooked something ...
>
> So IMHO there is a chance that explicit locking between the top and
> bottom halves is no longer required (for UP kernels) and is just a relic
> that was required in some older kernel version (maybe 2.0???). I cannot
> remember ... Perhaps they have also removed start_bh_atomic() and friends
> from recent 2.3 kernels because they are no longer required? (UP kernels
> and SMP kernels need different locking methods anyway.)

There is a guide to kernel-locking issues, the kernel-locking-HOWTO. It
used to be at http://netfilter.kernelnotes.org/unreliable-guides but
seems to have vanished from there. I found it mirrored at
pusa.uv.es/~ulisses/netfilter.kernelnotes.org/unreliable-guides

> Nevertheless, all the locking in the drivers should be investigated very
> carefully, especially SMP locking, which is currently definitely not
> supported (i.e. missing) in ieee*.c. E.g. if the start_bh_atomic() calls
> go away, then there are no locks at all that would prevent a top half
> from running on one processor while a bottom half is running on a
> different processor simultaneously. So spinlocks must be added to
> protect critical regions in order to handle such situations properly.

The issue of BHs is described in Chapter 7. Essentially, BHs are now
deprecated because of their limitations with respect to SMP. From a
quick glance I learned that BHs were reimplemented on top of softirqs.
There are now spinlock primitives for "BHs": spin_lock_bh() and
spin_unlock_bh(). Other primitives exist: spin_lock_irq,
read_lock_irq/bh, write_lock_irq/bh and their counterparts. On non-SMP
systems they collapse to a local_xx_disable() call.

If I compare the slip driver from 2.2.14 with the one from 2.4.0, I see
that calls to start_bh_atomic() have been replaced by spin_lock_bh(). I
apologize for not noting this in my earlier posting, where I had only
grepped for bh_atomic and found some pieces of code where it had
vanished without replacement.

Burkhard
--
Burkhard Kohl
buk at/auf buks.ipn.de
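Burkhard's description of spin_lock_bh() can be modelled with a userspace sketch. This is not the kernel implementation, and the `hyp_*` names are invented for illustration. The lock first disables bottom halves on the local CPU, and only an SMP build (`-D__SMP__`) additionally spins on a shared flag, which is why on UP it collapses to a local disable. The critical region that a driver might previously have bracketed with start_bh_atomic()/end_bh_atomic() is bracketed by the `_bh` lock pair instead, following the pattern Burkhard saw in the slip driver.

```c
#include <assert.h>
#include <stdatomic.h>

/* Models local_bh_disable()/local_bh_enable() nesting depth on one CPU. */
static int hyp_bh_disable_count;

typedef struct {
    atomic_flag held;   /* only used in SMP builds */
} hyp_spinlock_t;

static void hyp_local_bh_disable(void) { hyp_bh_disable_count++; }
static void hyp_local_bh_enable(void)  { hyp_bh_disable_count--; }

/* Sketch of the spin_lock_bh() idea: keep this CPU's bottom halves out
 * and, on SMP only, also keep other CPUs out of the critical region. */
static void hyp_spin_lock_bh(hyp_spinlock_t *l)
{
    hyp_local_bh_disable();
#ifdef __SMP__
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
#else
    (void)l;            /* UP: disabling bottom halves is all that is done */
#endif
}

/* Release in reverse order: drop the SMP lock, then re-enable BHs. */
static void hyp_spin_unlock_bh(hyp_spinlock_t *l)
{
#ifdef __SMP__
    atomic_flag_clear_explicit(&l->held, memory_order_release);
#else
    (void)l;
#endif
    hyp_local_bh_enable();
}

static hyp_spinlock_t hyp_tx_lock = { ATOMIC_FLAG_INIT };
static int hyp_tx_pending;

/* Hypothetical driver operation shared with a bottom half.
 * Returns the BH-disable depth observed inside the critical region. */
static int hyp_queue_tx(void)
{
    int depth;

    hyp_spin_lock_bh(&hyp_tx_lock);
    hyp_tx_pending++;                       /* critical region */
    depth = hyp_bh_disable_count;
    hyp_spin_unlock_bh(&hyp_tx_lock);
    return depth;
}
```

Built without `-D__SMP__`, `hyp_queue_tx()` never touches the flag. Built with it, the same call also spins on `hyp_tx_lock`, so one source file covers both cases without separate UP and SMP locking code.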