Re: [Kgdb-bugreport] kernel debugging over DMA/Firewire (firescope and fireproxy)
From: Bernhard K. <bk...@su...> - 2006-08-24 20:49:58
Hi,

this March, I announced the possibility of doing some limited form of
kernel debugging using kgdb over FireWire (IEEE1394) on this list.

On Thu, 16 Mar 2006, Amit Kale wrote:
> Bernhard,
>
> Ambitious project, I must say! Kindly keep us updated on the progress of this
> project.
>
> -Amit

Indeed ambitious. I have quite some news now, probably too much for a
single mail, but here it all is anyway...

First, I would like to say that this project is not limited to real kernel
debugging using kgdb over firewire (which is still very much in early alpha
status). It can already be used quite reliably to read the printk buffer of
the kernel remotely using a simple tool. It's called firescope, and a small
update of it is also part of this mail.

I should probably add that this project relies on remote DMA access over
firewire, which the OHCI 1394 specification specifies to be implemented
without host CPU intervention. This makes it possible to diagnose a remote
system over Firewire by reading and writing its main memory. This is only
possible when:

* The system to be diagnosed uses a FireWire controller which is built to
  the OHCI-1394 specification,
* The controller is properly initialized,
* The request filters of the controller are set up in a way which requires
  the OHCI-1394 controller to carry out these DMA transfers on behalf of
  the received requests.

The idea behind the goal of using kgdb over firewire is to have it
available when the other means of communication between gdb and the kernel
kgdb stub (serial cable and ethernet) cannot be used, either because they
are not available on the given HW (many recent notebooks have no serial
port) or because something needs to be debugged very early on, before
PCI/Ethernet initialisation.

As far as I know, PCI Ethernet drivers require PCI networking to be
initialized, so they cannot be initialized as early as a serial interface
without additional work (unless somebody writes a special driver for the
card, of course).

I wrote an early initialisation routine for OHCI-1394 controllers which
makes Firewire and OHCI 1394 a possible alternative to the serial port for
diagnosing early boot problems if the system has a PCI interface which can
fit an OHCI-1394 controller or has one built in, as many notebooks do.

I can now follow up the initial announcement with these new items:

1) I wrote a small paper which describes the whole project:

   http://www.suse.de/~bk/firewire/Firewire-debugging-under-Linux-v1.0.odt

   For a list of software which supports this format, see:

   http://en.wikipedia.org/wiki/OpenDocument#Software_2
   http://en.wikipedia.org/wiki/OpenDocument_Software

2) I developed a generic I/O module for the kernel kgdb stub which allows
   the use of direct memory reads and writes through a DMA-capable I/O card
   (like an OHCI 1394-compatible Firewire controller) to communicate with a
   kgdb stub which has already taken command of the kernel (e.g. because of
   a panic, a SysRq, or an NMI) and which is waiting for communication at
   this point. I derived it from the kgdb I/O module for Ethernet (kgdboe)
   and called it kgdbom (KGDB over Memory):

   http://www.suse.de/~bk/firewire/kgdbom.patch

   The present patch still contains debugging statements which are not
   necessary for normal use, but it is in an early alpha stage and has some
   limitations. For example, when the memory buffers are full,
   communication does not restart at the start of the buffer again, so it
   can work only for a very short time and therefore needs more work before
   it can be used for real debugging. It can be seen as early development
   code, but it provided me with enough communication to retrieve a
   CPU-local stack backtrace from a kgdb stub over Firewire.

3) An adaptation of the gdb<->firewire proxy which I announced in March:

   http://www.suse.de/~bk/firewire/fireproxy-0.34.tar.gz

   This version has the changes to talk with kgdb/kgdbom over firewire.

4) Andi Kleen also updated firescope, the tool which he ported to
   i386/x86_64, to a new, slightly improved version:

   ftp://ftp.suse.com/pub/people/ak/firescope

   It allows reading the kernel ring buffer (printk buffer), which you also
   see using the dmesg command, remotely over firewire if the system to be
   debugged has a properly initialized OHCI-1394 controller and is not
   programmed to filter physical DMA requests from the firewire node on
   which firescope is running.

5) Early initialisation of OHCI 1394-compatible firewire controllers during
   Linux kernel boot, even before paging is set up and before the real HW
   initialisation takes place. This is so that debugging tools like
   fireproxy and firescope (to read the kernel ring buffer and arbitrary
   memory addresses) can be used at early boot stages. I am appending the
   description for it below my signature.

Best Regards,
Bernhard

----------------

Below, find the description of the early initialisation of
OHCI1394-compatible FireWire/IEEE1394 controllers:

The initial patch is found at this URL:

http://www.suse.de/~bk/firewire/ohci1394_earlyinit.diff

It should apply to any recent 2.6 kernel; it was tested with 2.6.16.

It works by scanning the PCI configuration space for OHCI1394-compatible
devices (all OHCI1394-compatible firewire controllers are capable of
physical DMA without host CPU intervention), and it initializes the
OHCI1394-compatible firewire controller with the lowest bus number. It
would be a small further change to initialize all such controllers.
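
To illustrate the selection step described above, here is a minimal
user-space sketch, not the code from the patch. It assumes the PCI functions
have already been enumerated into a table; struct pci_fn and
find_first_ohci1394() are hypothetical names made up for this example. The
only fact taken from the PCI class code assignments is that an OHCI
FireWire controller reports base class 0x0C, subclass 0x00 and programming
interface 0x10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 24-bit PCI class code: serial bus (0x0C), IEEE 1394 (0x00), OHCI (0x10). */
#define PCI_CLASS_FIREWIRE_OHCI 0x0C0010u

/* Hypothetical record for one enumerated PCI function (illustration only). */
struct pci_fn {
    uint8_t  bus;
    uint8_t  slot;
    uint8_t  func;
    uint32_t class_code;   /* base class, subclass, prog-if in the low 24 bits */
};

/*
 * Return the OHCI-1394 function with the lowest bus number, or NULL if the
 * table contains none. On equal bus numbers the first table entry wins.
 */
static const struct pci_fn *find_first_ohci1394(const struct pci_fn *fns,
                                                size_t n)
{
    const struct pci_fn *best = NULL;

    assert(fns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((fns[i].class_code & 0xFFFFFFu) != PCI_CLASS_FIREWIRE_OHCI)
            continue;                      /* not an OHCI-1394 controller */
        if (best == NULL || fns[i].bus < best->bus)
            best = &fns[i];
    }
    return best;
}
```

Initializing all controllers instead of just one would amount to running the
setup inside the loop for every match rather than remembering only the best
candidate.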

What needs to be said about this initialisation is that after a firewire
bus reset (which happens whenever a device joins or leaves the firewire
bus), OHCI1394-compliant controllers require attention in the form of
setting registers in order to allow remote physical DMA again. See the
OHCI 1394 specification for reference:

http://www.suse.de/~bk/firewire/ohci_11.pdf

Quote:
-----------------------------------
12.6 Bus Reset

On a bus reset, all pending physical requests (those for which ack_pending
was sent) shall be discarded. Following a bus reset, only physical requests
to the autonomous CSR resources (see section 5.5) can be handled
immediately. Other physical requests may be processed after software
initializes the filter registers (section 5.14).
-----------------------------------

During early initialisation, however, we have no interrupt handlers
available, so we have to poll the registers of the controller until the
initial bus reset, which is triggered by the initialisation itself, is
finished.

This means that the early initialisation has to include a slow polling loop
on the registers of the OHCI1394 controller to bring the controller from
the restricted bus reset mode into the normal operating mode where it can
service physical DMA requests. The initialisation also has to set up the
request filter registers, which need to be properly initialized after each
bus reset in order for physical DMA to function.

It can take up to two seconds during initialisation until this loop
completes and the OHCI1394 controller is in proper operating mode, so this
is a noticeable delay during boot.

Since there is no guaranteed servicing of the OHCI1394 devices in an error
condition, physical DMA will only continue to work until the next bus
reset. Mostly this means that the Firewire cables have to be firmly plugged
into the ports to prevent a contact interruption, which would trigger a bus
reset.

It also means that the machine running firescope needs to be initialized
and plugged into the firewire bus when the early initialisation happens.

At the moment, the initialisation happens in the early system setup phase,
directly after parsing the ACPI tables, quite a bit before paging is
initialized:

  ACPI: RSDP (v002 ACPIAM ) @
  ACPI: ...
  ACPI: DSDT ....
  Found OHCI vendor 1102 at 01:07.2
  Found OHCI base feaff000, must be <ffffffffffdff000
  ohci1394: OHCI-1394 1.1 (PCI): Max Packet=[2048] rom:0
  Number of physical IEEE1394 ports found on this device: 2
  Diagnostic only: Number of ISO Channels=15
  On node 0 totalpages: 257611
  DMA zone: 3106 pages, LIFO batch:0
  DMA32 zone: 254505 pages, LIFO batch:31
  Normal zone: 0 pages, LIFO batch:0
  HighMem zone: 0 pages, LIFO batch:0
  .... boot continues here ...

This also happens well before the VGA console is initialized, long before
printk messages are actually shown on a VGA console.

--- END ---