From: Sam S. <sd...@gn...> - 2004-11-25 21:14:27
> * Bruno Haible <oe...@py...t> [2004-11-25 12:51:51 +0100]:
>
> Sam wrote:
>> > Does it work better now?
>>
>> no, I still get the crash in CVS head:
>
> And now?

still no go:

(test-package 100000000)

*** - handle_fault error2 ! address = 0x44e004370 not in [0x333987000,0x443736860) !
SIGSEGV cannot be cured. Fault address = 0x44e004370.
Segmentation fault

(same stack as before)

> Can you please also check whether a memory image of 7 GB or 10 GB size
> can be successfully created and loaded?

sure.

-- 
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.honestreporting.com>
A year spent in artificial intelligence is enough to make one believe in God.
From: Bruno H. <br...@cl...> - 2004-11-24 21:25:50
Sam wrote:
> it would be nice if when such a hard limit is hit for the first time, a
> warning were issued, similar to the performance warnings for HASH-TABLE
> tests & rehashing.

It was not a hard limit, as your test results show. Rather, the lookup
quality degraded for hash tables with more than 2^24 elements.

> > Does it work better now?
>
> no, I still get the crash in CVS head:
>
>
> Starting program: /home/sds/src/clisp/current/build-O-x86_64/lisp.run -B . -M lispinit.mem -q -norc

Tried this on the sourceforge AMD64 machine, which has 8 GB of RAM (and
1 GB of swap space - ridiculous).

(test-package 30000000)
(room t)
(test-package 100000)
(room t)
(test-package 100000)
(room t)
...

showed that
1. the total memory size is near 4 GB, and
2. the size of the Varobject heap is also near 4 GB.

And when the total memory size crosses 4 GB, and I use the 'time' macro, I get

Bytes permanently allocated: 128520 (WRONG! 4 GB missing here)

and

*** - SYSTEM::DELTA4: negative difference: [15540080 1672] > [9625320 1418]

Looking at the stack

- #<SYSTEM::SIMPLE-ARITHMETIC-ERROR #x000448909FC8>
- 0
- 250000
- 249248
- #<COMPILED-CLOSURE SYSTEM::%TIME>
<10> #<COMPILED-CLOSURE SYSTEM::%TIME>
- 255                old-gccount
- 15540080           old-space2
- 1672               old-space1
- 140000             old-gc2
- 399                old-gc1
- 430000             old-run2
- 767                old-run1
- 101148             old-real2
- 1057               old-real1
- 255                new-gccount
- 9625320            new-space2
- 1418               new-space1
- 140000             new-gc2
- 399                new-gc1
- 680000             new-run2
- 767                new-run1
- 350396             new-real2
<11> #<SPECIAL-OPERATOR MULTIPLE-VALUE-CALL>
- 1057               new-real1
- #<COMPILED-CLOSURE SYSTEM::%TIME>
EVAL frame for form (MULTIPLE-VALUE-CALL #'SYSTEM::%TIME (SYSTEM::%%TIME) #:G389 #:G390 #:G391 #:G392 #:G393 #:G394 #:G395 #:G396 #:G397)

I can see two things:

- Your <NN> output lines, generated through the backtrace pointers,
  are misplaced. (The <11> line should be one line below.) Probably
  you compare the STACK / FRAME pointers incorrectly. Can you fix
  that, please?

- The erroneous DELTA4 arguments come from the new-space1, new-space2,
  old-space1, old-space2 variables, which come from the C function
  used_space().

So the cause of the SYSTEM::DELTA4 error is now found: It's the return
value of used_space() which should be 64-bit, not 32-bit, on this platform.

And about the crash: My simple guess is that some of the many uintL
(32-bit) variables in spvw*.d are not sufficient when the total heap
is bigger than 4 GB. Fortunately, this is easier to fix than to make
the fixnums 48-bit. Can you confirm, Sam, that allowing a heap bigger
than 4 GB is higher priority than having 48-bit fixnums?

Bruno
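The numbers in the DELTA4 error message can be checked by hand, and they look consistent with a value that lost exactly one carry out of bit 31. The following is only a sketch in C: it assumes that each space pair encodes a byte count as high * 2^24 + low (an assumption about how SYSTEM::%TIME splits the value, not something stated in this thread), and the helper names combine_space, as_32bit_counter and space_delta are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION (not stated in the thread): a "space" pair encodes a byte
   count as high * 2^24 + low. combine_space is a hypothetical helper. */
uint64_t combine_space(uint32_t high, uint32_t low)
{
    assert(low < (1u << 24));            /* low part must fit in 24 bits */
    return ((uint64_t)high << 24) | low;
}

/* Hypothetical stand-in for a byte count stored in a 32-bit variable:
   everything above bit 31 is silently dropped. */
uint32_t as_32bit_counter(uint64_t bytes)
{
    return (uint32_t)bytes;
}

/* Signed difference new - old, computed in 64 bits. */
int64_t space_delta(uint64_t new_bytes, uint64_t old_bytes)
{
    return (int64_t)(new_bytes - old_bytes);
}
```

Under that assumption, space_delta(combine_space(1418, 9625320), combine_space(1672, 15540080)) is -4267327624, and adding 2^32 gives 27639672. That is what one would expect if about 27.6 MB were really allocated while a 32-bit component of the total wrapped around once. It is consistent with the used_space() diagnosis above but does not prove it.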
From: Sam S. <sd...@gn...> - 2004-11-25 17:34:36
> * Bruno Haible <oe...@py...t> [2004-11-24 22:25:06 +0100]:
>
> - Your <NN> output lines, generated through the backtrace pointers,
>   are misplaced. (The <11> line should be one line below.) Probably
>   you compare the STACK / FRAME pointers incorrectly. Can you fix
>   that, please?

Does this patch fix the problem?

--- debug.d 25 Nov 2004 11:47:15 -0500 1.77
+++ debug.d 25 Nov 2004 12:31:22 -0500
@@ -1462,16 +1462,16 @@
   var p_backtrace_t bt = back_trace;
   while (!eq(FRAME_(0),nullobj) /* nullobj = stack end */
          && (frame_limit==0 || count<frame_limit)) {
-    while (bt_beyond_stack_p(bt,FRAME)) {
-      print_back_trace(stream_,bt,++count);
-      bt = bt->bt_next;
-    }
     if (frame_up_x != NULL) {
       var gcv_object_t* next_frame = (*frame_up_x)(FRAME);
       if (next_frame == FRAME) break;
       print_stackitem(stream_,FRAME = next_frame);
     } else
       FRAME = print_stackitem(stream_,FRAME);
+    while (bt_beyond_stack_p(bt,FRAME)) {
+      print_back_trace(stream_,bt,++count);
+      bt = bt->bt_next;
+    }
   }
   skipSTACK(1); /* drop *STANDARD-OUTPUT* */
   return count;

> And about the crash: My simple guess is that some of the many uintL
> (32-bit) variables in spvw*.d are not sufficient when the total heap
> is bigger than 4 GB. Fortunately, this is easier to fix than to make
> the fixnums 48-bit. Can you confirm, Sam, that allowing a heap bigger
> than 4 GB is higher priority than having 48-bit fixnums?

of course!
48-bit fixnums (actually, 56-bit fixnums, but who's counting? :-) are a
critical efficiency issue, while 4GB heap limit is a showstopper.

Thanks!

-- 
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.honestreporting.com>
A professor is someone who talks in someone else's sleep.
From: Bruno H. <br...@cl...> - 2004-11-25 19:08:12
Sam wrote:
> Does this patch fix the problem?

Probably not. It looks mostly like a no-op. Don't you have a test case?
It's easy to find one:

(defun test () (multiple-value-call #'list 10 20 30 40 50 60 70 (/ 0)))
(test)
(show-stack)

Also (compile 'test) signals an error; this is a serious bug as well,
probably in or near the 2001-10-10 patch. If constant folding leads to
an error, the compiler must not do the constant folding. Likewise for

(defun test () (* 1e30 1e30))
(compile 'test)

Bruno
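The rule stated above (fold a constant expression only if evaluating it cannot signal an error) can be sketched as follows. This is an illustration in C, not the CLISP compiler's code: fold_int_div and fold_float_mul are hypothetical helpers, and C's float is used as a stand-in for the 1e30 example on the assumption of IEEE single precision.

```c
#include <float.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helpers, not CLISP code. Each one tries to evaluate an
   operation on two constants ahead of time. It returns true and stores
   the result only when the operation is known to succeed. On false the
   caller should leave the call unfolded, so that any error is raised
   at run time instead of at compile time. */

bool fold_int_div(long a, long b, long *out)
{
    if (b == 0)
        return false;                  /* would signal division by zero */
    if (a == LONG_MIN && b == -1)
        return false;                  /* quotient does not fit in long */
    *out = a / b;
    return true;
}

bool fold_float_mul(float a, float b, float *out)
{
    float r = a * b;
    /* Refuse to fold when the product is not a finite float
       (overflow to infinity, or NaN). */
    if (!(r >= -FLT_MAX && r <= FLT_MAX))
        return false;
    *out = r;
    return true;
}
```

With these helpers, fold_int_div(1, 0, &q) and fold_float_mul(1e30f, 1e30f, &f) both decline to fold, which corresponds to leaving (/ 0) and (* 1e30 1e30) in the compiled code. A compiler could additionally warn at that point that a run-time error is expected.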
From: Sam S. <sd...@gn...> - 2004-11-26 20:16:28
> * Bruno Haible <oe...@py...t> [2004-11-25 20:07:27 +0100]:
>
> Sam wrote:
>> Does this patch fix the problem?
>
> Probably not. It looks mostly like a no-op. Don't you have a test case?
> It's easy to find one:
>
> (defun test () (multiple-value-call #'list 10 20 30 40 50 60 70 (/ 0)))
> (test)
> (show-stack)

looks good:

[1]> (defun test () (multiple-value-call #'list 10 20 30 40 50 60 70 (/ 0)))
TEST
[2]> (test)

*** - division by zero
The following restarts are available:
ABORT :R1 ABORT
Break 1 [3]> :bt1
<1> #<SYSTEM-FUNCTION SHOW-STACK>
<2> #<COMPILED-FUNCTION SYSTEM::DEBUG-BACKTRACE>
<3> #<COMPILED-FUNCTION SYSTEM::DEBUG-BACKTRACE-1>
<4> #<SYSTEM-FUNCTION SYSTEM::READ-EVAL-PRINT>
<5> #<COMPILED-FUNCTION SYSTEM::BREAK-LOOP-2-2>
<6> #<SYSTEM-FUNCTION SYSTEM::SAME-ENV-AS>
<7> #<COMPILED-FUNCTION SYSTEM::BREAK-LOOP-2>
<8> #<SYSTEM-FUNCTION SYSTEM::DRIVER>
<9> #<COMPILED-FUNCTION SYSTEM::BREAK-LOOP>
- #<SYSTEM::SIMPLE-DIVISION-BY-ZERO #x102F8225>
- NIL
- #<SYSTEM::SIMPLE-DIVISION-BY-ZERO #x102F8225>
<10> #<SYSTEM-FUNCTION INVOKE-DEBUGGER>
frame binding variables (~ = dynamically):
  | ~ SYSTEM::*PRIN-STREAM* <--> #<UNBOUND>
frame binding variables (~ = dynamically):
  | ~ *PRINT-READABLY* <--> NIL
frame binding variables (~ = dynamically):
  | ~ *PRINT-ESCAPE* <--> T
- #<SYSTEM::SIMPLE-DIVISION-BY-ZERO #x102F8225>
- 0
<11> #<SYSTEM-FUNCTION /> 0
EVAL frame for form (/ 0)
- NIL
- 70
- 60
- 50
- 40
- 30
- 20
- 10
<12> #<SPECIAL-OPERATOR MULTIPLE-VALUE-CALL>
- #<SYSTEM-FUNCTION LIST>
EVAL frame for form (MULTIPLE-VALUE-CALL #'LIST 10 20 30 40 50 60 70 (/ 0))
- NIL
frame binding environments
  VAR_ENV <--> NIL
  FUN_ENV <--> NIL
  BLOCK_ENV <--> NIL
  GO_ENV <--> NIL
  DECL_ENV <--> ((DECLARATION OPTIMIZE DECLARATION))
frame binding variables #<ADDRESS #x101004CC> binds (~ = dynamically):
  Next environment: NIL
APPLY frame for call (TEST)
- #<FUNCTION TEST NIL (DECLARE (SYSTEM::IN-DEFUN TEST)) (BLOCK TEST (MULTIPLE-VALUE-CALL #'LIST 10 20 30 40 50 60 70 (/ 0)))>
<13> #<FUNCTION TEST NIL (DECLARE (SYSTEM::IN-DEFUN TEST)) (BLOCK TEST (MULTIPLE-VALUE-CALL #'LIST 10 20 30 40 50 60 70 (/ 0)))> 0
EVAL frame for form (TEST)
- #<IO TERMINAL-STREAM>
<14> #<SYSTEM-FUNCTION SYSTEM::READ-EVAL-PRINT>
- #<IO TERMINAL-STREAM>
frame binding variables (~ = dynamically):
  | ~ SYSTEM::*ACTIVE-RESTARTS* <--> NIL
compiled tagbody frame for #(NIL)
- 87
- #(NIL NIL)
Printed 14 frames
Break 1 [3]>

> Also (compile 'test) signals an error; this is a serious bug as well,
> probably in or near the 2001-10-10 patch. If constant folding leads to
> an error, the compiler must not do the constant folding. Likewise for
>
> (defun test () (* 1e30 1e30))
> (compile 'test)

I checked in a fix for that:

[3]> (defun test-constant-folding () (* 1e30 1e30))
TEST-CONSTANT-FOLDING
[4]> (compile *)
WARNING in TEST-CONSTANT-FOLDING :
Run time error expected: floating point overflow
WARNING in TEST-CONSTANT-FOLDING :
Run time error expected: floating point overflow
TEST-CONSTANT-FOLDING ;
2 ;
2

why is the error reported twice?!

-- 
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.honestreporting.com>
My other CAR is a CDR.