From: Sam S. <sd...@gn...> - 2000-12-14 15:41:04
|
How do I debug a segfault?
Somehow the memory is corrupted and I get random segfaults.
Do I compile with some special settings?
Thanks.

--
Sam Steingold (http://www.podval.org/~sds)
Support Israel's right to defend herself! <http://www.i-charity.com/go/israel>
Read what the Arab leaders say to their people on <http://www.memri.org/>
PI seconds is a nanocentury |
From: Bruno H. <ha...@il...> - 2000-12-14 18:01:57
|
Sam writes:
> How do I debug a segfault?
> Somehow the memory is corrupted and I get random segfaults.

Ouch. This is hard. I'd try to insert many (gc) calls in order to get
an idea which function corrupts memory.

Once you got it located, proofread the suspicious code.

Bruno |
From: Sam S. <sd...@gn...> - 2000-12-14 21:44:13
|
> * In message <149...@ho...>
> * On the subject of "Re: segfault"
> * Sent on Thu, 14 Dec 2000 19:01:44 +0100 (CET)
> * Honorable Bruno Haible <ha...@il...> writes:
>
> Sam writes:
> > How do I debug a segfault?
> > Somehow the memory is corrupted and I get random segfaults.
>
> Ouch. This is hard. I'd try to insert many (gc) calls in order to get
> an idea which function corrupts memory.

Bruno, thanks a bundle!!!
I suggest this in lispbibl.d (so that I won't have to reinvent the wheel
every time I have a fault):

#define CHECK_MEM(msg) do { printf msg; gar_col(); } while(0)

> Once you got it located, proofread the suspicious code.

the bad news is that the "suspicious code" is the innocently looking
call to allocate_dir_key()!

Bruno, do you mind looking at lispbibl.d?

--
Sam Steingold (http://www.podval.org/~sds)
Support Israel's right to defend herself! <http://www.i-charity.com/go/israel>
Read what the Arab leaders say to their people on <http://www.memri.org/>
The difference between genius and stupidity is that genius has its limits. |
From: Bruno H. <ha...@il...> - 2000-12-18 20:32:34
|
Sam writes:
> Bruno, do you mind looking at lispbibl.d?

lispbibl.d looks fine. I haven't looked at dirkey.d.

> #define CHECK_MEM(msg) do { printf msg; gar_col(); } while(0)
>
> the bad news is that the "suspicious code" is the innocently looking
> call to allocate_dir_key()!

Note that this CHECK_MEM, inserted at the wrong place, will kill all
your local variables of type 'object' (i.e. everything you have not
pushSTACK'ed). Therefore I think it's a bad idea to put it into
lispbibl.d like this.

Bruno |
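Editor's sketch of the hazard Bruno describes, as a toy model in C. This is not CLISP's collector: the semispace heap, the `allocate` helper, the poison value and the two `read_*` functions are all assumptions made up for illustration. Only the names `pushSTACK`, `popSTACK`, `gar_col` and the shape of `CHECK_MEM` come from the thread. The point is that a moving GC updates references it can find on the STACK, while a pointer held only in a C local keeps pointing at the old, dead location.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model, NOT CLISP's real collector: two semispaces of int cells and a
   root stack. An "object" here is a raw pointer to one cell. */
#define HEAP_CELLS  8
#define STACK_SLOTS 8
#define POISON      (-1)

typedef int *object;

static int space_a[HEAP_CELLS], space_b[HEAP_CELLS];
static int *from_space = space_a, *to_space = space_b;
static int used = 0;

static object STACK[STACK_SLOTS];
static int sp = 0;

static void pushSTACK(object o) { assert(sp < STACK_SLOTS); STACK[sp++] = o; }
static object popSTACK(void)    { assert(sp > 0); return STACK[--sp]; }

static object allocate(int value) {
    assert(used < HEAP_CELLS);
    from_space[used] = value;
    return &from_space[used++];
}

/* Copy only the cells referenced from STACK into the other semispace, update
   the STACK slots to the new addresses, then poison the old semispace. */
static void gar_col(void) {
    int kept = 0;
    for (int i = 0; i < sp; i++) {
        to_space[kept] = *STACK[i];
        STACK[i] = &to_space[kept++];
    }
    for (int i = 0; i < HEAP_CELLS; i++) from_space[i] = POISON;
    int *tmp = from_space; from_space = to_space; to_space = tmp;
    used = kept;
}

/* Same shape as the macro proposed in the thread; note the double
   parentheses needed at the call site because of "printf msg". */
#define CHECK_MEM(msg) do { printf msg; gar_col(); } while(0)

/* x lives only in a C local, so the collector neither copies nor updates it. */
int read_unprotected(void) {
    object x = allocate(42);
    CHECK_MEM(("gc with x unprotected\n"));
    return *x;                  /* reads the poisoned old cell */
}

/* x is pushed before the GC and re-read from the STACK afterwards. */
int read_protected(void) {
    object x = allocate(42);
    pushSTACK(x);
    CHECK_MEM(("gc with x on STACK\n"));
    x = popSTACK();
    return *x;                  /* reads the relocated, live cell */
}
```

In this toy, `read_unprotected()` yields the poison value and `read_protected()` yields 42. In a real build the stale pointer would more likely cause silent corruption or a later segfault, which matches the "random segfaults" in the original report.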
From: Sam S. <sd...@gn...> - 2000-12-18 21:32:34
|
> * In message <149...@ho...>
> * On the subject of "Re: segfault"
> * Sent on Mon, 18 Dec 2000 21:32:10 +0100 (CET)
> * Honorable Bruno Haible <ha...@il...> writes:
>
> Sam writes:
> > #define CHECK_MEM(msg) do { printf msg; gar_col(); } while(0)
> >
> > the bad news is that the "suspicious code" is the innocently looking
> > call to allocate_dir_key()!
>
> Note that this CHECK_MEM, inserted at the wrong place, will kill all
> your local variables of type 'object' (i.e. everything you have not
> pushSTACK'ed).

I did not realize that!
I will investigate dirkey.d further.

On the same subject, I get a segfault with the current CVS sources on
win32 with this form:

[1]> (decode-universal-time 12345678900)

*** - handle_fault error2 ! address = 0x0 not in [0x1A500000,0x1A5BB730) !
SIGSEGV cannot be cured. Fault address = 0x0.

(works fine on solaris).

--
Sam Steingold (http://www.podval.org/~sds)
Support Israel's right to defend herself! <http://www.i-charity.com/go/israel>
Read what the Arab leaders say to their people on <http://www.memri.org/>
A professor is someone who talks in someone else's sleep. |
From: Bruno H. <ha...@il...> - 2000-12-19 21:58:14
|
Sam writes:
> On the same subject, I get a segfault with the current CVS sources on
> win32 with this form:
>
> [1]> (decode-universal-time 12345678900)
>
> *** - handle_fault error2 ! address = 0x0 not in [0x1A500000,0x1A5BB730) !
> SIGSEGV cannot be cured. Fault address = 0x0.

It isn't really the same subject. Here it is Win32 which crashes.
I've put in a workaround now.

Bruno |
From: Sam S. <sd...@gn...> - 2000-12-18 22:14:00
|
> * In message <149...@ho...>
> * On the subject of "Re: segfault"
> * Sent on Mon, 18 Dec 2000 21:32:10 +0100 (CET)
> * Honorable Bruno Haible <ha...@il...> writes:
>
> Note that this CHECK_MEM, inserted at the wrong place, will kill all
> your local variables of type 'object' (i.e. everything you have not
> pushSTACK'ed).

what about value1, value2 &c?

--
Sam Steingold (http://www.podval.org/~sds)
Support Israel's right to defend herself! <http://www.i-charity.com/go/israel>
Read what the Arab leaders say to their people on <http://www.memri.org/>
Bill Gates is great, as long as `bill' is a verb. |
From: Bruno H. <ha...@il...> - 2000-12-18 22:44:16
|
Sam writes:
> what about value1, value2 &c?

Like local variables. Save them yourself before gar_col().

Bruno |
From: Sam S. <sd...@gn...> - 2000-12-18 23:28:53
|
> * In message <149...@ho...>
> * On the subject of "Re: segfault"
> * Sent on Thu, 14 Dec 2000 19:01:44 +0100 (CET)
> * Honorable Bruno Haible <ha...@il...> writes:
>
> Sam writes:
> > How do I debug a segfault?
> > Somehow the memory is corrupted and I get random segfaults.
>
> Ouch. This is hard. I'd try to insert many (gc) calls in order to get
> an idea which function corrupts memory.

I tried. GCs slow down the process quite a bit - it is still running
(no segfault yet - after many hundreds of GCs).

what about unbalanced pushSTACK()/popSTACK() pairs?
what if I push more than pop or vv?
will that be detected or will I get a segfault?

--
Sam Steingold (http://www.podval.org/~sds)
Support Israel's right to defend herself! <http://www.i-charity.com/go/israel>
Read what the Arab leaders say to their people on <http://www.memri.org/>
Binaries die but source code lives forever. |
From: Bruno H. <ha...@il...> - 2000-12-19 13:45:58
|
Sam writes:
> what about unbalanced pushSTACK()/popSTACK() pairs?
> what if I push more than pop or vv?
> will that be detected or will I get a segfault?

That will be detected if you define SAFETY to at least 1 in CFLAGS.

Bruno |
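Editor's sketch of what a SAFETY-gated balance check can look like. The thread only says that defining SAFETY to at least 1 enables detection. It does not show how CLISP implements or reports it, so everything below is a hypothetical toy: the upward-growing stack, `call_checked`, `toy_subr`, and the three sample functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#ifndef SAFETY
#define SAFETY 1            /* the thread says detection needs SAFETY >= 1 */
#endif

typedef long object;        /* stand-in for the real object type */

static object stack_area[16];
static object *STACK = stack_area;      /* toy stack that grows upward */

#define pushSTACK(obj) (*STACK++ = (obj))
#define popSTACK()     (*--STACK)

typedef void (*toy_subr)(void);

/* Call fn, which is expected to consume exactly argc already-pushed
   arguments. Returns 0 if the stack is balanced afterwards, otherwise the
   number of slots left over (positive) or over-popped (negative).
   With SAFETY == 0 nothing is checked and 0 is always returned. */
long call_checked(toy_subr fn, int argc) {
    object *expected = STACK - argc;
    fn();
#if SAFETY >= 1
    if (STACK != expected) {
        long imbalance = (long)(STACK - expected);
        STACK = expected;   /* repair the pointer so the demo can continue */
        return imbalance;
    }
#endif
    return 0;
}

/* Consumes its one argument: balanced. */
void balanced(void) { (void)popSTACK(); }

/* Pops one argument but pushes two values back: more pushes than pops. */
void leaky(void) { object arg = popSTACK(); pushSTACK(arg); pushSTACK(arg); }

/* Pops two values although it was given one argument: more pops than pushes. */
void greedy(void) { (void)popSTACK(); (void)popSTACK(); }
```

Comparing the stack pointer before and after a call is cheap, which is presumably why such a check can sit behind a compile-time safety level rather than a separate debugging build.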
From: Sam S. <sd...@gn...> - 2000-12-19 15:10:45
|
> * In message <149...@ho...>
> * On the subject of "Re: segfault"
> * Sent on Thu, 14 Dec 2000 19:01:44 +0100 (CET)
> * Honorable Bruno Haible <ha...@il...> writes:
>
> Sam writes:
> > How do I debug a segfault?
> > Somehow the memory is corrupted and I get random segfaults.
>
> Ouch. This is hard. I'd try to insert many (gc) calls in order to get
> an idea which function corrupts memory.

test form:

(with-open-file (out "c:/tmp/registry.txt" :direction :output)
  (with-dir-key-open (dkey :win32 "HKEY_CLASSES_ROOT")
    (dir-key-dump-tree dkey "" :out out :collect nil)))

Without GCs, I get a segfault after writing 360,448 bytes (in seconds)

with GCs, I get a normal termination after 177,304 GCs which creates a
3,480,718 byte file (in 4+ hours).

--
Sam Steingold (http://www.podval.org/~sds)
Support Israel's right to defend herself! <http://www.i-charity.com/go/israel>
Read what the Arab leaders say to their people on <http://www.memri.org/>
Lisp: its not just for geniuses anymore. |
From: Bruno H. <ha...@il...> - 2000-12-20 15:15:32
|
Sam writes:
> test form:
>
> (with-open-file (out "c:/tmp/registry.txt" :direction :output)
>   (with-dir-key-open (dkey :win32 "HKEY_CLASSES_ROOT")
>     (dir-key-dump-tree dkey "" :out out :collect nil)))

OK, I see. So what needs to be proofread is dirkey.d.

> Without GCs, I get a segfault after writing 360,448 bytes (in seconds)
>
> with GCs, I get a normal termination after 177,304 GCs which creates a
> 3,480,718 byte file (in 4+ hours).

Good result. There are two kinds of GC related bugs:
  a) random memory corruption,
  b) GC is called at unexpected places, and you didn't save the objects
     on the STACK.
Since the crash went away with GCs, this practically excludes a). So it
must be b).

I proofread the first half of dirkey.d and made the following fixes:

- registry_value_to_object, REG_MULTI_SZ:
  - you were not saving 'ret' and 'tail' during allocate_cons() and
    n_char_to_string.
  - Cdr(tail) = n_char_to_string(buffer+ii,len,O(misc_encoding));
    must be split into two statements because the GC caused in
    n_char_to_string can relocate the 'tail' cons.
  - Cdr(tail) = NIL;
    is not needed because fresh conses are always (nil . nil).
  - the strlen call was a possible buffer overrun.
- parse_registry_path
  - the strncmp call would have the effect of treating "HKEY_LOC" like
    "HKEY_LOCAL_MACHINE". Need to compare the lengths as well.
- DIR-KEY-OPEN
  - you were not saving direction_arg, ret_handle, path during
    allocate_dir_key().
  - you were not saving dkey during string_concat(3).
  - you were not saving dkey during funcall of FINALIZE.
- MAKE_OBJECT_LIST
  - you were calling alloca (which MUST be a macro) with a
    side-effecting argument, very dangerous
  - you were not switching off begin/end_system_call during
    allocate_cons. (Needed because allocate_cons may trigger GC, and GC
    needs to make system calls, and begin_system_call can not be nested.)
  - you were not saving tail and ret during allocate_cons and
    asciz_to_string

I'm unsure about the line
    var DWORD len = maxlen; /* or maxlen+1 ?? */
because I don't know what RegEnumKey/RegEnumValue/RegQueryInfoKey expect.

I hope you got the idea what needs to be done and can continue on the
second half (everything after MAKE_OBJECT_LIST).

Bruno |
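Editor's sketch of the parse_registry_path item in the list above. The original dirkey.d code is not shown in the thread, so the "prefix only" function below is just one plausible shape of the bug, and both function names and their signatures are assumptions. What it illustrates is the stated fix: a `strncmp` limited to the length of the input accepts any prefix of the root name, so the lengths must be compared too.

```c
#include <assert.h>
#include <string.h>

/* One plausible shape of the bug: only path_len bytes are compared, so a
   short input such as "HKEY_LOC" matches any root name it is a prefix of. */
int root_matches_prefix_only(const char *path, size_t path_len,
                             const char *root) {
    return strncmp(path, root, path_len) == 0;
}

/* Fixed shape: the first path_len bytes of path must be exactly root,
   which requires the two lengths to agree before comparing the bytes. */
int root_matches(const char *path, size_t path_len, const char *root) {
    size_t root_len = strlen(root);
    return path_len == root_len && strncmp(path, root, root_len) == 0;
}
```

Here `path_len` is meant as the length of the first path component (the part before the first backslash), so a full path like "HKEY_LOCAL_MACHINE\Software" is checked with `path_len` 18.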
From: Sam S. <sd...@gn...> - 2000-12-20 15:27:07
|
> * In message <u1y...@xc...>
> * On the subject of "Re: segfault"
> * Sent on 19 Dec 2000 10:08:09 -0500
> * I write:
>
> > * In message <149...@ho...>
> > * On the subject of "Re: segfault"
> > * Sent on Thu, 14 Dec 2000 19:01:44 +0100 (CET)
> > * Honorable Bruno Haible <ha...@il...> writes:
> >
> > Ouch. This is hard. I'd try to insert many (gc) calls in order to get
> > an idea which function corrupts memory.
>
> test form:
>
> (with-open-file (out "c:/tmp/registry.txt" :direction :output)
>   (with-dir-key-open (dkey :win32 "HKEY_CLASSES_ROOT")
>     (dir-key-dump-tree dkey "" :out out :collect nil)))
>
> Without GCs, I get a segfault after writing 360,448 bytes (in seconds)
>
> with GCs, I get a normal termination after 177,304 GCs which creates a
> 3,480,718 byte file (in 4+ hours).

Okay, I looked at your changes in dirkey.d

IIUC, if a function can trigger a GC, it (and its callers!) may not have
any object variables - everything must be on the stack. right?

why didn't you tell me that right away?!

--
Sam Steingold (http://www.podval.org/~sds)
Support Israel's right to defend herself! <http://www.i-charity.com/go/israel>
Read what the Arab leaders say to their people on <http://www.memri.org/>
Single tasking: Just Say No. |
From: Bruno H. <ha...@il...> - 2000-12-20 16:06:07
|
Sam writes:
> IIUC, if a function can trigger a GC, it (and its callers!) may not have
> any object variables - everything must be on the stack. right?

Right. There are a few exceptions: If you know something is a fixnum,
you need not pushSTACK/popSTACK it, because GC does not "move" fixnums.
Similarly for subr_self: it is unprotected, but SUBRs are not moved
either.

> why didn't you tell me that right away?!

I'm sorry; I thought it was common knowledge on this list by now.

> [btw, please do not abuse `const':
>
> dirkey.i.c
> dirkey.d(272) : warning C4090: 'function' : different 'const' qualifiers
> dirkey.d(272) : warning C4024: 'strncpy' : different types for formal and actual parameter 1
> dirkey.d(273) : error C2166: l-value specifies const object

Oops, I was putting in as much const as possible. Looks like this was
one too much.

Bruno |
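Editor's sketch of a corollary of the rule confirmed above, taken from Bruno's earlier fix list: `Cdr(tail) = n_char_to_string(...)` had to be split into two statements even with `tail` saved. In C the operands of an assignment are unsequenced, so the compiler may compute the address of `Cdr(tail)` before making the call that triggers GC. The toy below is not CLISP code: the relocating heap, `allocate_cons`, `list_length` and the two `append_*` functions are assumptions for illustration, and the "address first" variant spells out explicitly one evaluation order the compiler is allowed to choose.

```c
#include <assert.h>
#include <stddef.h>

/* Toy relocating heap, NOT CLISP's collector: every allocation first moves
   all conses to the other semispace and poisons the old one. */
#define CELLS 8
typedef struct cons { int car; struct cons *cdr; } cons;

static cons space[2][CELLS];
static int cur = 0, used = 0;
static cons *STACK[4];
static int sp = 0;

static void pushSTACK(cons *c) { assert(sp < 4); STACK[sp++] = c; }
static cons *popSTACK(void)    { assert(sp > 0); return STACK[--sp]; }

/* Translate a pointer into the current space to the same slot in space `to`. */
static cons *moved(cons *p, int to) {
    return p ? &space[to][p - space[cur]] : NULL;
}

static void gar_col(void) {
    int to = 1 - cur;
    for (int i = 0; i < used; i++) {            /* copy cells, fix cdr links */
        space[to][i].car = space[cur][i].car;
        space[to][i].cdr = moved(space[cur][i].cdr, to);
    }
    for (int i = 0; i < sp; i++)                /* update protected roots */
        STACK[i] = moved(STACK[i], to);
    for (int i = 0; i < CELLS; i++) {           /* poison the dead space */
        space[cur][i].car = -1;
        space[cur][i].cdr = NULL;
    }
    cur = to;
}

static cons *allocate_cons(int car) {
    gar_col();                                  /* worst case: always moves */
    assert(used < CELLS);
    cons *c = &space[cur][used++];
    c->car = car;
    c->cdr = NULL;
    return c;
}

static int list_length(const cons *p) {
    int n = 0;
    for (; p; p = p->cdr) n++;
    return n;
}

static void reset_heap(void) { cur = 0; used = 0; sp = 0; }

/* Emulates "Cdr(tail) = alloc()" when the address of Cdr(tail) is computed
   BEFORE the call: the store lands in the dead copy of the tail cons. */
int append_address_first(void) {
    reset_heap();
    pushSTACK(allocate_cons(1));        /* STACK[0]: head and tail */
    cons **slot = &STACK[0]->cdr;       /* address taken before allocation */
    *slot = allocate_cons(2);           /* GC relocates tail; slot is stale */
    return list_length(popSTACK());
}

/* Split version: allocate first, then re-read tail from the STACK. */
int append_split(void) {
    reset_heap();
    pushSTACK(allocate_cons(1));
    cons *fresh = allocate_cons(2);     /* statement 1: may trigger GC */
    STACK[0]->cdr = fresh;              /* statement 2: tail is current */
    return list_length(popSTACK());
}
```

In this toy the address-first variant leaves a list of length 1 because the new cons was linked into the dead copy, while the split variant yields length 2. Protecting `tail` on the STACK is necessary but not sufficient. The value also has to be re-read after every call that can trigger GC.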
From: Sam S. <sd...@gn...> - 2000-12-20 15:31:45
|
> * In message <149...@ho...>
> * On the subject of "Re: segfault"
> * Sent on Tue, 19 Dec 2000 22:58:19 +0100 (CET)
> * Honorable Bruno Haible <ha...@il...> writes:
>
> Sam writes:
> > On the same subject, I get a segfault with the current CVS sources on
> > win32 with this form:
> >
> > [1]> (decode-universal-time 12345678900)
> >
> > *** - handle_fault error2 ! address = 0x0 not in [0x1A500000,0x1A5BB730) !
> > SIGSEGV cannot be cured. Fault address = 0x0.
>
> It isn't really the same subject. Here it is Win32 which crashes.

I thought a segfault is a segfault

> I've put in a workaround now.

I still get the same crash

[btw, please do not abuse `const':

dirkey.i.c
dirkey.d(272) : warning C4090: 'function' : different 'const' qualifiers
dirkey.d(272) : warning C4024: 'strncpy' : different types for formal and actual parameter 1
dirkey.d(273) : error C2166: l-value specifies const object
dirkey.d(404) : warning C4090: 'function' : different 'const' qualifiers
dirkey.d(404) : warning C4024: 'open_reg_key' : different types for formal and actual parameter 2
NMAKE : fatal error U1077: 'cl' : return code '0x2'
Stop.

I fixed this in my sources]

--
Sam Steingold (http://www.podval.org/~sds)
Support Israel's right to defend herself! <http://www.i-charity.com/go/israel>
Read what the Arab leaders say to their people on <http://www.memri.org/>
cogito cogito ergo cogito sum |
From: Bruno H. <ha...@il...> - 2000-12-20 16:07:52
|
Sam writes:
> > I've put in a workaround now.
>
> I still get the same crash

Does (sys::default-time-zone 1210131) work for you?
Does (sys::default-time-zone 1210132) work for you?
Where's the limit between "works" and "crash"?

Bruno |
From: Sam S. <sd...@gn...> - 2000-12-20 18:32:50
|
> * In message <149...@ho...>
> * On the subject of "Re: decode-universal-time segfault"
> * Sent on Wed, 20 Dec 2000 17:07:56 +0100 (CET)
> * Honorable Bruno Haible <ha...@il...> writes:
>
> Sam writes:
> > > I've put in a workaround now.
> >
> > I still get the same crash

oops - wrong binary! the bug appears to be fixed.
Thanks, and sorry about the false alarm...

> Does (sys::default-time-zone 1210131) work for you?
> Does (sys::default-time-zone 1210132) work for you?

[2]> (sys::default-time-zone 1210131)
5 ;
NIL
[3]> (sys::default-time-zone 1210132)
5 ;
NIL

> Where's the limit between "works" and "crash"?

huh?

--
Sam Steingold (http://www.podval.org/~sds)
Support Israel's right to defend herself! <http://www.i-charity.com/go/israel>
Read what the Arab leaders say to their people on <http://www.memri.org/>
There is an exception to every rule, including this one. |
From: Sam S. <sd...@gn...> - 2000-12-21 16:58:02
|
> * In message <149...@ho...>
> * On the subject of "Re: segfault"
> * Sent on Wed, 20 Dec 2000 16:14:24 +0100 (CET)
> * Honorable Bruno Haible <ha...@il...> writes:
>
> - registry_value_to_object, REG_MULTI_SZ:

you introduced a couple of bugs there (+= instead of = and an extra ""
at the end of the list).
this is very reassuring - others make mistakes too, not just me :-)

> I hope you got the idea what needs to be done and can continue on the
> second half (everything after MAKE_OBJECT_LIST).

I did. I fixed whatever I could.
It still crashes (with DK_DEBUG == 0).

--
Sam Steingold (http://www.podval.org/~sds)
Support Israel's right to defend herself! <http://www.i-charity.com/go/israel>
Read what the Arab leaders say to their people on <http://www.memri.org/>
The world will end in 5 minutes. Please log out. |
From: Sam S. <sd...@gn...> - 2000-12-21 17:40:39
|
> * In message <149...@ho...>
> * On the subject of "Re: segfault"
> * Sent on Wed, 20 Dec 2000 16:14:24 +0100 (CET)
> * Honorable Bruno Haible <ha...@il...> writes:
>
> I hope you got the idea what needs to be done and can continue on the
> second half (everything after MAKE_OBJECT_LIST).

done. thanks for your help!

--
Sam Steingold (http://www.podval.org/~sds)
Support Israel's right to defend herself! <http://www.i-charity.com/go/israel>
Read what the Arab leaders say to their people on <http://www.memri.org/>
Binaries die but source code lives forever. |