From: Jan B. <JBe...@no...> - 2004-03-31 10:36:59
I have been asked to bring into usable form some code that was originally written (but never integrated) by people at IBM Germany. In doing so I have run into a couple of shortcomings, and I'd like to use this list to discuss them and, ideally, find solutions.

As rough background: the code to be added is meant to extract from an lkcd dump an ELF core dump of any of (a) a single process (or thread), (b) a multi-threaded process, (c) the kernel, CPU-centric, or (d) the kernel, process-centric. While the lkcd dump generally presents a superset of the information required for this, making the result meaningful needs certain additional pieces of information that I could not find a way to obtain. While doing the work (and actually obtaining a couple of dumps), I also found issues that are not directly related to the core dump conversion.

- The running system's page size. The utilities continue to use hard-coded values for this, but since on at least some architectures (IA64) this is runtime-determined, the problem must finally be solved. In my opinion this is simply a missing element in the architecture-independent file header. One could of course argue that only architectures with flexible page sizes need it, so it could also go into the architecture-specific header; that, however, would prevent an architecture that has a fixed page size today from making the page size flexible later without breaking compatibility.

- The running system's clock tick rate. This should be communicated in the architecture-independent header.

- The full set of (kernel) registers at the time the dump was taken. Currently, only S390 seems to communicate a complete register set (through the lowcore). Other architectures are missing anything that doesn't live in pt_regs, e.g.
  x86: floating point, XMM, control, and debug registers, and at least some important MSRs
  ia64: most floating point registers, all callee-save general registers, some application registers, all control and indirect registers

  Even outside of core dump conversion, this is a rather limiting factor when analysing difficult kernel crash scenarios. These should all be saved (per-CPU) in the architecture-specific header.

- There does not seem to be a mechanism (in libklib) to access array elements. Since only the size of the full object can be obtained, this cannot be emulated either: the requestor has no way of learning the array's element type or its number of elements. Am I overlooking something?

- The lkcd script assumes that the kernel on which 'lkcd save' is run during the next boot is the same kernel that just crashed. While this may be correct for most customers' production systems, it is rather questionable for development systems. I would therefore like to add (or see added) functionality that lets lcrash extract just the version string from the dump (without requiring any additional input files), so that the script can then collect the remaining input files based on that information.

- The 4.2 lkcd script fails when it is unable to obtain module information. I believe this should be considered optional information, so the failure should just be reported (as a warning) without affecting the outcome of the whole operation. I am also somewhat unclear about the importance of running the initial analysis during system boot; I'd rather see this as something that could be done better once the system is up again.

- Finally, I am told that there is work in progress on adding AMD64 support to lkcd. Since I was planning to do the same, I'd like to get in touch with those doing this work, if possible (at least to add the respective core dump conversion support).
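
To make the first two items of the list concrete, here is a minimal sketch in C of what recording the page size and tick rate in the header could look like, and how a utility might use the recorded values instead of hard-coded constants. Everything in it is an assumption for illustration only: struct dump_header_ext, its field names, and the two helper functions are hypothetical and are not part of the existing lkcd header or of libklib.

```c
#include <stdint.h>

/* Hypothetical extension of the architecture-independent dump header.
 * Neither the struct nor its field names exist in lkcd; they only
 * illustrate the two values asked for in the list above. */
struct dump_header_ext {
    uint32_t page_size;  /* bytes per page on the crashed system */
    uint32_t tick_rate;  /* clock ticks per second on the crashed system */
};

/* Page frame number of an address, derived from the recorded page size
 * rather than from a compile-time constant.
 * Precondition: h->page_size is non-zero. */
static uint64_t addr_to_pfn(const struct dump_header_ext *h, uint64_t addr)
{
    return addr / h->page_size;
}

/* Convert a tick count from the dump into whole seconds, using the
 * recorded tick rate.
 * Precondition: h->tick_rate is non-zero. */
static uint64_t ticks_to_seconds(const struct dump_header_ext *h,
                                 uint64_t ticks)
{
    return ticks / h->tick_rate;
}
```

With both values stored in the dump, a utility built on one machine could interpret a dump taken on a differently configured kernel without being rebuilt, which is the point of putting them into the architecture-independent part.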

Since I am rather unhappy with the way the architecture multiplexing currently works (to add an architecture you have to modify, along with adding the architecture-specific files, far too many architecture-independent source files), I was also planning to take this occasion to put in a better mechanism that simplifies this process.

Jan Beulich
Novell, Inc.