From: William H. N. <wil...@ai...> - 2000-02-29 16:18:55
|
On Tue, Feb 29, 2000 at 09:06:11AM +0100, Raymond Wiker wrote:
> William Harold Newman writes:
> > On Mon, Feb 28, 2000 at 04:22:42PM +0100, Raymond Wiker wrote:
> > > I'm having trouble debugging this, as gdb does not appear to
> > > work with breakpoints in code called from lisp. I.e., once the runtime
> > > has made the transfer to lisp, my breakpoints will not be
> > > honoured. I'll see if a more recent version of gdb works better.
> >
> > My impression was that this happens because the Lisp system wants to
> > handle all signals itself, and one of the signals it takes over is the
> > one which gdb uses for its breakpoints. My guess is that if you want
> > to be able to set gdb breakpoints in a running SBCL, it might be more
> > fruitful not to look for a newer gdb but instead to cook up some way
> > to suppress the way SBCL munges this particular signal. Perhaps it
> > could be controlled with an --under-gdb option from the command line
> > or something.
>
> I got a little bit further last night - turns out that there
> were a couple more instances in the lisp code where an underscore was
> prepended to foreign symbols. I had also left out undefineds.c from
> the compilation, which accounted for 31 of the 32 cases of "undefined
> foreign symbol" that I noticed (the last was update_errno, which seems
> to be specific to Linux/glibc2).

It'd probably be good to move this discussion to the sbcl-devel
mailing list, since about a week ago I had a discussion with Daniel
Barlow and Peter Van Eynde about issues with update_errno.

If I understood correctly, some hack like that is needed with the new
libc because errno has become a macro instead of an ordinary variable,
in order to make it thread-safe. Daniel Barlow was looking for
opinions about how to handle errno in his sockets interface. So if
you're running into porting problems with it, it might be good to have
everyone in on the discussion.

[So I've sent this reply back via sbc...@li....]

--
William Harold Newman <wil...@ai...>
software consultant
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C B9 25 FB EE E0 C3 E5 7C
From: William H. N. <wil...@ai...> - 2000-03-01 16:41:35
|
On Wed, Mar 01, 2000 at 10:39:22AM +0100, Raymond Wiker wrote:
> I've gotten a bit further, again: I switched on the sb-show
> feature in target-features.lisp-expr, and that pinpointed a problem
> with os-cold-init-or-reinit, which is only defined for linux. I
> "fixed" this by conditionalising the call to os-cold-init-or-reinit
> with #!+linux - is this right?

In my opinion, the best way to handle most differences between systems
is to have a set of abstract functions (and constants) which are
called for every system. Putting in a bunch of #!+/#!- or #+/#- code
is usually second best. It's like the difference between using OO
method dispatch or using switch statements -- by breaking down
dependencies into abstract functions, you can usually make the code
more maintainable.

In the existing code, this model is not followed very consistently, in
part because I didn't write most of the existing code. But I'd like to
see SBCL move toward this model.

So the answer is, I believe it would have been cleaner for you to add
a definition a la

  ;; On FreeBSD, this is just a no-op.
  (defun os-cold-init-or-reinit ()
    )

to the appropriate file (freebsd-os.lisp?) instead of wrapping the
OS-COLD-INIT-OR-REINIT call with #+LINUX. But you can do even better
than that, because...

In fact, CMU CL did follow the OO-ish model that I advocate in the
case of the OS-COLD-INIT-OR-REINIT function; it's just that they
called the function OS-INIT instead. I renamed a number of functions
to try to make it clear whether they're used at cold init only, or in
reinit too, or only in reinit. I realize that I was causing friction,
and I regret that, but I thought it was worthwhile to avoid the
confusion of having FOO-INIT be used both at cold init and at reinit
while BAR-INIT is used only at cold init. (In fact there are many,
many renamed symbols in SBCL relative to CMU CL, e.g.

  PRIMEP to POSITIVE-PRIMEP
  MAKE-KEYWORD to MAKE-KEYWORD-FOR-ARG
  DUMP-SHORT-FLOAT to DUMP-SHORT-OR-SINGLE-FLOAT
  special variables FOO to *FOO*
  DEFUN FIXNUM to DEFUN FIXNUMIZE
  C::MAKE-SEGMENT to C::MAKE-DEFAULT-SEGMENT

often to more correctly describe the meaning of the thing named,
sometimes to reduce package problems, sometimes for other reasons. I
think most of the changes were worth the friction that they cause, but
the ones which affect porting could cause a *lot* of friction, so
maybe they're an exception.)

So anyway, my recommendation is that you try using the OS-INIT
function from CMU CL's freebsd-os.lisp as your implementation of
OS-COLD-INIT-OR-REINIT.

> At the moment I get as far as the first call to gc, which
> crashes because an object (the first?) has a type tag of 0xbe (190),
> which is unknown to gencgc.c (via sbcl.h). The highest type tag in use
> seems to be scavenger_hook, which is 0xba (186).
>
> Is it possible that the :freebsd feature enables an additional
> type that is somehow not picked up by the code in genesis.lisp?
> (Either because it's placed in a different package or because it
> breaks with the "protocol" for primitive types.) I'll check this,
> anyhow.

It is possible that somewhere someone has hand-coded a use of a
particular type code which isn't cleanly propagated into sbcl.h, and
that it doesn't come out in the Linux build. (I've never seen a
problem like this.) However, I suspect it's more likely that you're
seeing some grosser problem (memory corruption or something).

--
William Harold Newman <wil...@ai...>
software consultant
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C B9 25 FB EE E0 C3 E5 7C
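
The "abstract function called on every system" model recommended above can be illustrated with a minimal C sketch. Nothing here is SBCL or CMU CL code: the `os_ops` table, the function names, and the `linux_calls` counter are all hypothetical, and a real port would more likely select the definition at build time (one file per OS) than through a table of function pointers. The point is only that the shared caller contains no per-OS conditional, and a system that needs no work supplies a no-op.

```c
/* Hypothetical per-OS operations table: every port fills in every slot. */
struct os_ops {
    const char *name;
    int (*cold_init_or_reinit)(void);   /* returns 0 on success */
};

/* Counts calls so that the effect of the Linux hook is observable. */
int linux_calls = 0;

/* Stand-in for whatever Linux-specific setup a real port would do. */
int linux_cold_init_or_reinit(void)
{
    linux_calls++;
    return 0;
}

/* In this sketch the FreeBSD hook is deliberately a no-op. */
int freebsd_cold_init_or_reinit(void)
{
    return 0;
}

const struct os_ops linux_ops   = { "linux",   linux_cold_init_or_reinit };
const struct os_ops freebsd_ops = { "freebsd", freebsd_cold_init_or_reinit };

/* Shared startup code: no #ifdef here, the hook is called on every system. */
int cold_init(const struct os_ops *os)
{
    return os->cold_init_or_reinit();
}
```

Calling `cold_init(&freebsd_ops)` succeeds without doing anything, while `cold_init(&linux_ops)` runs the Linux stand-in; adding a third system means adding one definition rather than touching every call site.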
From: Raymond W. <ra...@or...> - 2000-03-02 08:51:07
|
William Harold Newman writes:
> On Wed, Mar 01, 2000 at 10:39:22AM +0100, Raymond Wiker wrote:
> > I've gotten a bit further, again: I switched on the sb-show
> > feature in target-features.lisp-expr, and that pinpointed a problem
> > with os-cold-init-or-reinit, which is only defined for linux. I
> > "fixed" this by conditionalising the call to os-cold-init-or-reinit
> > with #!+linux - is this right?
>
> In my opinion, the best way to handle most differences between systems
> is to have a set of abstract functions (and constants) which are
> called for every system. Putting in a bunch of #!+/#!- or #+/#- code
> is usually second best. It's like the difference between using OO
> method dispatch or using switch statements -- by breaking down
> dependencies into abstract functions, you can usually make the code
> more maintainable.

Ok.

> So anyway, my recommendation is that you try using the OS-INIT
> function from CMU CL's freebsd-os.lisp as your implementation of
> OS-COLD-INIT-OR-REINIT.

Ok, I did that - actually, I copied bsd-os.lisp from cmucl, renamed
os-init and changed the package names referenced in the file. I also
added the feature :bsd to target-features.lisp-expr, since that's the
feature that triggers loading of bsd-os.lisp.

At the moment I get a call through undefined_tramp in the second-last
form in bsd-os.lisp at dump time, but I expect that's trivial to fix.

> It is possible that somewhere someone has hand-coded a use of a
> particular type code which isn't cleanly propagated into sbcl.h, and
> that it doesn't come out in the Linux build. (I've never seen a
> problem like this.) However, I suspect it's more likely that you're
> seeing some grosser problem (memory corruption or something).

Ok. I'll check up on this later; I'll just have to find what's wrong
in bsd-os.lisp first :-)

I mentioned earlier that FreeBSD, like Linux, defines errno as a
macro. Actually, errno *is* an int variable as well, but this is only
used in single-threaded programs, or the initial thread of
multi-threaded programs. I expect that there may be other variables
with the same behaviour. Maybe it would be better to access variables
in the C runtime via stub functions (in C) instead?

//Raymond.
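
A minimal sketch of the stub-function idea for errno, written in C. The names `runtime_get_errno` and `runtime_set_errno` are hypothetical and are not taken from the SBCL or CMU CL runtime. Because the stubs are compiled against the platform's own `<errno.h>`, they go through whatever the `errno` macro expands to on that system, so the Lisp side would only need to call a function and never needs to know how errno is stored.

```c
#include <errno.h>

/* Hypothetical accessor: returns the calling thread's current errno value. */
int runtime_get_errno(void)
{
    return errno;
}

/* Hypothetical setter, e.g. for clearing errno before a foreign call. */
void runtime_set_errno(int value)
{
    errno = value;
}
```

With stubs like these there is no separate "update, then read" step: a single call to `runtime_get_errno()` after the foreign call returns the value directly.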
From: Raymond W. <ra...@or...> - 2000-03-01 09:43:55
|
William Harold Newman writes:
> On Tue, Feb 29, 2000 at 09:06:11AM +0100, Raymond Wiker wrote:
> > William Harold Newman writes:
> > > On Mon, Feb 28, 2000 at 04:22:42PM +0100, Raymond Wiker wrote:
> > > > I'm having trouble debugging this, as gdb does not appear to
> > > > work with breakpoints in code called from lisp. I.e., once the runtime
> > > > has made the transfer to lisp, my breakpoints will not be
> > > > honoured. I'll see if a more recent version of gdb works better.
> > >
> > > My impression was that this happens because the Lisp system wants to
> > > handle all signals itself, and one of the signals it takes over is the
> > > one which gdb uses for its breakpoints. My guess is that if you want
> > > to be able to set gdb breakpoints in a running SBCL, it might be more
> > > fruitful not to look for a newer gdb but instead to cook up some way
> > > to suppress the way SBCL munges this particular signal. Perhaps it
> > > could be controlled with an --under-gdb option from the command line
> > > or something.

It took quite some time before I understood what you meant here... as
I said, I had no problems setting breakpoints, but that was before
sbcl got so far as to take over the signals (SIGABRT in particular).

> > I got a little bit further last night - turns out that there
> > were a couple more instances in the lisp code where an underscore was
> > prepended to foreign symbols. I had also left out undefineds.c from
> > the compilation, which accounted for 31 of the 32 cases of "undefined
> > foreign symbol" that I noticed (the last was update_errno, which seems
> > to be specific to Linux/glibc2).
>
> It'd probably be good to move this discussion to the sbcl-devel mailing
> list, since about a week ago I had a discussion with Daniel Barlow
> and Peter Van Eynde about issues with update_errno.
>
> If I understood correctly, some hack like that is needed with the
> new libc because errno has become a macro instead of an ordinary
> variable, in order to make it thread-safe. Daniel Barlow was looking
> for opinions about how to handle errno in his sockets interface.
> So if you're running into porting problems with it, it might
> be good to have everyone in on the discussion.

The same trick seems to be necessary in FreeBSD, which defines errno
as a macro that calls a function via a pointer (huh?). It might be
better to simply have a function that retrieves the current value of
errno from the C runtime, rather than first calling a function to
update the value, and then retrieving the value.

I've gotten a bit further, again: I switched on the sb-show feature in
target-features.lisp-expr, and that pinpointed a problem with
os-cold-init-or-reinit, which is only defined for linux. I "fixed"
this by conditionalising the call to os-cold-init-or-reinit with
#!+linux - is this right?

At the moment I get as far as the first call to gc, which crashes
because an object (the first?) has a type tag of 0xbe (190), which is
unknown to gencgc.c (via sbcl.h). The highest type tag in use seems to
be scavenger_hook, which is 0xba (186).

Is it possible that the :freebsd feature enables an additional type
that is somehow not picked up by the code in genesis.lisp? (Either
because it's placed in a different package or because it breaks with
the "protocol" for primitive types.) I'll check this, anyhow.

> [So I've sent this reply back via sbc...@li....]

Noted - the wider audience is the reason that I've provided more
details than you need :-)

//Raymond.
From: Raymond W. <ra...@or...> - 2000-03-07 10:50:02
|
Raymond Wiker writes:
> At the moment I get as far as the first call to gc, which
> crashes because an object (the first?) has a type tag of 0xbe (190),
> which is unknown to gencgc.c (via sbcl.h). The highest type tag in use
> seems to be scavenger_hook, which is 0xba (186).

Last night I recompiled, with QSHOW set to 1 in
src/runtime/runtime.h. This caused the "crash object" to change, and a
quick comparison with sbcl.nm showed that it was actually equal to
(the address of) closure_tramp (from x86-assem.S). I assume that this
means that there is a problem with scavenging code vectors (in
gencgc.c), or that there is a bug in the lisp compiler (problem with
the code object headers).

Note that the error does *not* in any way happen for the first object
on the heap, although it is possible that it is the first *code*
object. I'll put in extra trace printouts in gencgc.c to check this.

Question: in output/cold-sbcl.map, there are two addresses listed for
each function, and it appears that the difference between the first
and the last (which appears in a comment) is 0x17. Could anyone tell
me what these two addresses are? (My guess is that the first is the
entry point, and the second is the header address, but this may well
be wrong :-)

//Raymond.
From: William H. N. <wil...@ai...> - 2000-03-08 04:47:37
|
On Tue, Mar 07, 2000 at 11:45:11AM +0100, Raymond Wiker wrote:
> Question: in output/cold-sbcl.map, there are two addresses
> listed for each function, and it appears that the difference between
> the first and the last (which appears in a comment) is 0x17. Could
> anyone tell me what these two addresses are? (My guess is that the
> first is the entry point, and the second is the header address, but
> this may well be wrong :-)

That guess matches my impression from when I've had to mess around
with gdb-level debugging of SBCL. I'm sorry I can't give you a more
confident answer, but at least I can tell you with some confidence
that it should be the same as for CMU CL, not a victim of SBCL
tweaking.

Incidentally, I should warn you that some of the internal CMU CL
documentation seems somewhat out of date (with respect to CMU CL
itself, not just SBCL) on this and some other function implementation
issues, at least for the X86 port. (It's still very useful, though.)

--
William Harold Newman <wil...@ai...>
software consultant
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C B9 25 FB EE E0 C3 E5 7C
From: Raymond W. <ra...@or...> - 2000-03-08 08:51:42
|
[ Note: DTC Cc:'ed as he is the author of gencgc.c & friends. ]

William Harold Newman writes:
> On Tue, Mar 07, 2000 at 11:45:11AM +0100, Raymond Wiker wrote:
> > Question: in output/cold-sbcl.map, there are two addresses
> > listed for each function, and it appears that the difference between
> > the first and the last (which appears in a comment) is 0x17. Could
> > anyone tell me what these two addresses are? (My guess is that the
> > first is the entry point, and the second is the header address, but
> > this may well be wrong :-)
>
> That guess matches my impression from when I've had to mess around
> with gdb-level debugging of SBCL. I'm sorry I can't give you a more
> confident answer, but at least I can tell you with some confidence
> that it should be the same as for CMU CL, not a victim of SBCL
> tweaking.

I did some more hacking on this last night, and I think that the 0x17
offset is defined in gencgc.c as

  #define RAW_ADDR_OFFSET (6*sizeof(lispobj) - type_FunctionPointer)

and used in scav_fdefn.

I still haven't got round to modifying SBCL so that it doesn't clobber
the debugger (gdb) breakpoints (I'm not even convinced that it can be
done without a lot of effort). I *did* do some further poking around
with the debugger *after* the crash though. Here's what I found out:

From gencgc.c:

  static int
  scav_fdefn(lispobj *where, lispobj object)
  {
      struct fdefn *fdefn;

      fdefn = (struct fdefn *)where;

      if ((char *)(fdefn->function + RAW_ADDR_OFFSET) == fdefn->raw_addr) {
          scavenge(where + 1, sizeof(struct fdefn)/sizeof(lispobj) - 1);

          /* Don't write unnecessarily. */
          if (fdefn->raw_addr != (char *)(fdefn->function + RAW_ADDR_OFFSET))
              fdefn->raw_addr = (char *)(fdefn->function + RAW_ADDR_OFFSET);

          return sizeof(struct fdefn) / sizeof(lispobj);
      } else {
          return 1;
      }
  }

The crash happens when scav_fdefn is called for an fdefn structure
with the following data:

  0x480add38: 0x000003b6       (fdefn header)
  0x480add3c: 0x480add07       (name(?), other pointer)
  0x480add40: 0x4d64d4f9       (function, function pointer, val = 0x4d64d4f8)
  0x480add44: 0x08<something>  (raw_addr, = closure_tramp (from x86-assem.S))

  0x4d64d4f8: 0x00000382       (simple array signed byte 30) ???

This fdefn object is the only object for which the second branch of
the if test in scav_fdefn is taken - at least, up to that point. As a
result, the values (lispobjs) that follow are treated one by one, and
eventually the foreign address for closure_tramp is treated like a
lispobj. Ka-boom :-)

I'm not sure what the problem is. The two possibilities I see are
either

1) The heap values are wrong, and the "bare" address of
   closure_tramp should never appear like this. The reference to
   0x4d64d4f8 seems odd, as do the values stored in that area.

2) scav_fdefn is wrong, and should be set up to scavenge name and
   function only, and to skip raw_addr. This is not *very* likely, as
   the code, as far as I can see, is identical (modulo indentation)
   in CMUCL and SBCL.

//Raymond.

--
Raymond Wiker, Orion Systems AS
+47 370 61150
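
The arithmetic behind the 0x17 offset and the test in scav_fdefn can be checked with a small C sketch. It rests on assumptions that are not stated in the thread: `lispobj` is taken to be a 4-byte word (32-bit x86), and the function-pointer lowtag is taken to be 1, which is inferred from the dump above (0x4d64d4f9 untags to 0x4d64d4f8). The names `TYPE_FUNCTION_POINTER`, `expected_raw_addr` and `fdefn_points_into_function` are hypothetical and only mirror the comparison in the quoted code.

```c
#include <stdint.h>

typedef uint32_t lispobj;            /* assumption: 4-byte words on 32-bit x86 */

/* Assumption: lowtag 1, inferred from 0x4d64d4f9 untagging to 0x4d64d4f8. */
#define TYPE_FUNCTION_POINTER 1u

/* Same shape as the gencgc.c definition quoted above: 6*4 - 1 = 23 = 0x17. */
#define RAW_ADDR_OFFSET (6 * sizeof(lispobj) - TYPE_FUNCTION_POINTER)

/* Raw entry address that scav_fdefn expects for a tagged function pointer. */
uint32_t expected_raw_addr(lispobj function)
{
    return function + (uint32_t)RAW_ADDR_OFFSET;
}

/* Nonzero when raw_addr is the entry point of the function object itself,
   i.e. when the first branch of the if test in scav_fdefn would be taken. */
int fdefn_points_into_function(lispobj function, uint32_t raw_addr)
{
    return expected_raw_addr(function) == raw_addr;
}
```

For the function word in the dump, `expected_raw_addr(0x4d64d4f9)` is 0x4d64d510, that is, the untagged object address plus six words. A raw_addr pointing at closure_tramp in the C runtime cannot equal that value, which is consistent with the second branch being taken for this fdefn.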
From: Douglas T. C. <dt...@je...> - 2000-03-08 12:49:34
|
Raymond Wiker wrote:
...
> William Harold Newman writes:
> > On Tue, Mar 07, 2000 at 11:45:11AM +0100, Raymond Wiker wrote:
> > > Question: in output/cold-sbcl.map, there are two addresses
> > > listed for each function, and it appears that the difference between
> > > the first and the last (which appears in a comment) is 0x17. Could
> > > anyone tell me what these two addresses are? (My guess is that the
> > > first is the entry point, and the second is the header address, but
> > > this may well be wrong :-)

Yes, one is the raw entry address and the other the object with its
function tag.

...
> The crash happens when scav_fdefn is called for an fdefn
> structure with the following data:
>
> 0x480add38: 0x000003b6       (fdefn header)
> 0x480add3c: 0x480add07       (name(?), other pointer)
> 0x480add40: 0x4d64d4f9       (function, function pointer, val = 0x4d64d4f8)
> 0x480add44: 0x08<something>  (raw_addr, = closure_tramp (from x86-assem.S))
>
> 0x4d64d4f8: 0x00000382       (simple array signed byte 30) ???

The header values appear to differ from the CMUCL values, as none of
the CMUCL branches has the fdefn object with a header value of 0xb6.

> This fdefn object is the only object where the second branch
> of the if test in scav_fdefn is called - at least, up to that
> point. As a result, the values (lispobjs) that follow are treated one
> by one, and eventually, the foreign address for closure_tramp is
> treated like a lispobj. Ka-boom :-)

Check the raw_addr of the closure_tramp; if correctly aligned it
should appear to be a fixnum and thus be safe to scavenge.

...
> closure_tramp should never appear like this. The reference to
> 0x4d64d4f8 seems odd, as do the values stored in that area.

Yes, if this were a vector it would be invalid; double check the
header values. For standard CMUCL the header of 0x82 = 130 is a
closure header?

I'd be very keen to track down any such bug. If it is repeatable on
standard CMUCL could someone please point me towards an example.

Regards
Douglas Crosher
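
The alignment check suggested above, and the header values being compared, can be decoded with a short C sketch. It assumes a 32-bit word in which fixnums have their two low bits clear, pointers carry a 3-bit lowtag, and a header's type code is its low 8 bits; these assumptions match the values in the dump (0x3b6 gives 0xb6, 0x382 gives 0x82, 0x480add07 is an "other pointer") but are not spelled out in the thread. All function names are hypothetical.

```c
#include <stdint.h>

typedef uint32_t lispobj;   /* assumption: 32-bit x86 heap word */

/* A word with both low bits clear reads as a fixnum, so a 4-byte-aligned
   C address stored in the heap would look like a harmless fixnum. */
int looks_like_fixnum(lispobj word)
{
    return (word & 3u) == 0;
}

/* Low three bits: the pointer lowtag (assumed layout). */
unsigned lowtag_of(lispobj word)
{
    return word & 7u;
}

/* Low eight bits of a header word: the type code (assumed layout). */
unsigned header_type_of(lispobj word)
{
    return word & 0xffu;
}
```

Applied to the dump, `header_type_of(0x000003b6)` is 0xb6 and `header_type_of(0x00000382)` is 0x82, while `lowtag_of(0x4d64d4f9)` is 1. Running `looks_like_fixnum` on the actual address of closure_tramp would show whether the scavenger can safely step over it.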
From: Raymond W. <ra...@or...> - 2000-03-08 13:05:19
|
Douglas T. Crosher writes:
> Raymond Wiker wrote:
> ...
> > William Harold Newman writes:
> > > On Tue, Mar 07, 2000 at 11:45:11AM +0100, Raymond Wiker wrote:
> > > > Question: in output/cold-sbcl.map, there are two addresses
> > > > listed for each function, and it appears that the difference between
> > > > the first and the last (which appears in a comment) is 0x17. Could
> > > > anyone tell me what these two addresses are? (My guess is that the
> > > > first is the entry point, and the second is the header address, but
> > > > this may well be wrong :-)
>
> Yes, one is the raw entry address and the other the object with its
> function tag.

Ok.

> ...
> > The crash happens when scav_fdefn is called for an fdefn
> > structure with the following data:
> >
> > 0x480add38: 0x000003b6       (fdefn header)
> > 0x480add3c: 0x480add07       (name(?), other pointer)
> > 0x480add40: 0x4d64d4f9       (function, function pointer, val = 0x4d64d4f8)
> > 0x480add44: 0x08<something>  (raw_addr, = closure_tramp (from x86-assem.S))
> >
> > 0x4d64d4f8: 0x00000382       (simple array signed byte 30) ???
>
> The header values appear to differ from the CMUCL values as none of
> the CMUCL branches has the fdefn object with a header value of
> 0xb6.

I think this is to be expected, given the differences in the build
process between SBCL and CMUCL. Might be helpful to compare sbcl.h
between a Linux and a FreeBSD build of SBCL, though.

> > This fdefn object is the only object where the second branch
> > of the if test in scav_fdefn is called - at least, up to that
> > point. As a result, the values (lispobjs) that follow are treated one
> > by one, and eventually, the foreign address for closure_tramp is
> > treated like a lispobj. Ka-boom :-)
>
> Check the raw_addr of the closure_tramp; if correctly aligned it
> should appear to be a fixnum and thus be safe to scavenge.

Hmmm... It doesn't appear to be aligned so that it looks like a
fixnum; I could probably fix this by changing x86-assem.S.

> > closure_tramp should never appear like this. The reference to
> > 0x4d64d4f8 seems odd, as do the values stored in that area.
>
> Yes, if this were a vector it would be invalid; double check the
> header values. For standard CMUCL the header of 0x82 = 130 is a
> closure header?
>
> I'd be very keen to track down any such bug. If it is repeatable
> on standard CMUCL could someone please point me towards an example.

It's *probably* not a CMUCL bug, unless it's related to ELF/a.out
differences or something like that.

//Raymond.
From: William H. N. <wil...@ai...> - 2000-03-08 15:33:04
|
On Wed, Mar 08, 2000 at 11:46:12PM +1100, Douglas T. Crosher wrote:
> Raymond Wiker wrote:
> > The crash happens when scav_fdefn is called for an fdefn
> > structure with the following data:
> >
> > 0x480add38: 0x000003b6       (fdefn header)
> > 0x480add3c: 0x480add07       (name(?), other pointer)
> > 0x480add40: 0x4d64d4f9       (function, function pointer, val = 0x4d64d4f8)
> > 0x480add44: 0x08<something>  (raw_addr, = closure_tramp (from x86-assem.S))
> >
> > 0x4d64d4f8: 0x00000382       (simple array signed byte 30) ???
>
> The header values appear to differ from the CMUCL values as none
> of the CMUCL branches has the fdefn object with a header value of 0xb6.

Yes, I wrote in an earlier message that SBCL should be the same as CMU
CL at this level, but I guess I oversimplified, since heap type codes
are different between SBCL and CMU CL. I've deleted unused tag values
(e.g. DYLAN-FUNCTION-HEADER-TYPE, and perhaps soon
SCAVENGER-HOOK-TYPE) from objdef.lisp/early-objdef.lisp, which has
caused other tag values to change.

--
William Harold Newman <wil...@ai...>
software consultant
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C B9 25 FB EE E0 C3 E5 7C