From: Larry V. <re...@us...> - 2009-02-23 20:49:10
Attachments:
aver.txt
stepping.txt
|
Two hppa fixes attached: * stepping bugfix -- symbol-value possibly loaded wrongly on hppa, sbcl-1.0.25.55 sigbus, 1.0.25 doesnt. It sigbusses at point call.lisp:multiple-call-named with (debug 3) enabled that generates symbol-value loading of SB!IMPL::*STEPPING*. If the bugfix is valid, mips should be affected too. * aver-cleanup -- a patch previously sent but not committed (proper checking of instruction constants limits plus cleanups suggested by Nathan Froyd). Details stepping bugfix / hppa-debugging-mini-primer: Under gdb it is shown what is generated for loading an symbol-value compiled to an instruction as: (inst ldw (- (+ symbol-value-slot (truncate (static-symbol-offset 'sb!impl::*stepping*) n-word-bytes)) other-pointer-lowtag) null-tn stepping) => 0x60006b40: ldw 73(r6),r15 r6 = null-tn = 0x4e00000b 0x73 + $r6 = 0x4e00007e now 0x4e00007e is an address that isn't loadable (would hit sigbus) because it isn't word-aligned. The code that triggered this bug was in sb-introspect: >>> (declaim (optimize (debug 3))) (with-compilation-unit nil (eval '(defun four () 4))) <<< Compiling the above code using sbcl-1.0.25.55 would generate a fasl that contained the sigbussing instruction. During compilation of above file, the vop multiple-call-named is used. And because of (debug 3) it will enter the WHEN clause in that vop which contains this LDW instruction that sigbus during compilation. The strange thing is that 1.0.25 works fine (above test + all contrib compiles). If we compile the above code "(compile-file "t" :trace-file t) and looks in the trace file we will find: VOP MULTIPLE-CALL-NAMED t24[NL1] t28[Const7] t27[A0] ... LDW 109, #<TN t49[NULL]>, #<TN t56[A1]> COMB =, #<TN t49[NULL]>, #<TN t49[NULL]>, L17, NULLIFY, T ... Now NULL is the r6 register that contains as before 0x4e00000b. and (+ #x4e00000b 109) => #4E000078 which is an loadable address (word-aligned). My only guess is that the sb!impl::*stepping* symbol has changed place (in the heap) because of other unrelated patches and it miscalculates an address that just happens to be word-aligned but has never pointed to the symbol-value of the wanted symbol. So is sbcl-1.0.25 loading the symbol-value from the wrong address, lets find out. Gdb is entered and sbcl-1.0.25 loads up, then we hit break to enter gdb again. The trace file said offset 109 plus register r6. Lets look at the memory around that region (address 0x4e000078 == $r6 + 109): (gdb) x/4xw 0x4e00000b + 109 0x4e000078: 0x4c079407 0x4e000aa9 0x0000053e 0x4e362598 (gdb) x/40xw 0x4e00000b + 109 - 20 0x4e000064: 0x4e0007d1 z 0x0000053e 0x4c4a96d8 0x0dd395b0 0x4e000074: 0x4e00000b x 0x4c079407 0x4e000aa9 0x0000053e 0x4e000084: 0x4e362598 0x48da8650 0x4e00000b 0x4c07942f 0x4e000094: 0x4e000aa9 0x0000053e 0x7aa74490 0x3dd2d724 x = address LDW tries to load from z = symbol structure widetag. According to objdef.lisp, LDW is loading the symbol-name, which is: (gdb) x/4xs 0x4c079407 0x4c079407: "x*READ-ONLY-SPACE-FREE-POINTER*" To conclude the LDW is loading from the wrong symbol and also at the wrong slot, it is rather random from sbcl-version to sbcl-version. If analysis is correct this stepping patch should also be applied to mips (because we then are loading the symbol-value from the wrong address there too). The code currently looks the same between mips and hppa. To verify this on mips I would like to check sbcl 1.0.25 under mips for what address the load instruction uses: compile-file the above code to see if the LD instruction generated points to the symbol-value-slot of the *stepping* symbol structure. I'll launch qemu when I have time to compile 1.0.25 or if anyone want to check this. best regards, /larry |
From: Gábor M. <me...@re...> - 2009-02-28 22:20:32
|
On Lunes 23 Febrero 2009, Larry Valkama wrote: > Two hppa fixes attached: > > * stepping bugfix -- symbol-value possibly loaded wrongly > on hppa, sbcl-1.0.25.55 sigbus, 1.0.25 doesnt. > It sigbusses at point call.lisp:multiple-call-named with > (debug 3) enabled that generates symbol-value loading of > SB!IMPL::*STEPPING*. > If the bugfix is valid, mips should be affected too. > > * aver-cleanup -- a patch previously sent but not committed (proper > checking of instruction constants limits plus cleanups suggested by > Nathan Froyd). > > > Details stepping bugfix / hppa-debugging-mini-primer: > > Under gdb it is shown what is generated for loading an symbol-value > compiled to an instruction as: > > (inst ldw (- (+ symbol-value-slot > (truncate (static-symbol-offset 'sb!impl::*stepping*) > n-word-bytes)) > other-pointer-lowtag) > null-tn stepping) > > => > > 0x60006b40: ldw 73(r6),r15 > > r6 = null-tn = 0x4e00000b > 0x73 + $r6 = 0x4e00007e > > now 0x4e00007e is an address that isn't loadable (would hit sigbus) > because it isn't word-aligned. > > The code that triggered this bug was in sb-introspect: > > > (declaim (optimize (debug 3))) > > (with-compilation-unit nil > (eval '(defun four () 4))) > <<< > > Compiling the above code using sbcl-1.0.25.55 would generate a fasl > that contained the sigbussing instruction. > > During compilation of above file, the vop multiple-call-named is > used. And because of (debug 3) it will enter the WHEN clause in that > vop which contains this LDW instruction that sigbus during > compilation. > > The strange thing is that 1.0.25 works fine (above test + all contrib > compiles). If we compile the above code "(compile-file "t" > :trace-file t) and looks in the trace file we will find: > > VOP MULTIPLE-CALL-NAMED t24[NL1] t28[Const7] t27[A0] > ... > LDW 109, #<TN t49[NULL]>, #<TN t56[A1]> > COMB =, #<TN t49[NULL]>, #<TN t49[NULL]>, L17, NULLIFY, T > ... > > Now NULL is the r6 register that contains as before 0x4e00000b. > and (+ #x4e00000b 109) => #4E000078 > which is an loadable address (word-aligned). > > My only guess is that the sb!impl::*stepping* symbol has changed > place (in the heap) because of other unrelated patches and it > miscalculates an address that just happens to be word-aligned but has > never pointed to the symbol-value of the wanted symbol. > > So is sbcl-1.0.25 loading the symbol-value from the wrong address, > lets find out. Gdb is entered and sbcl-1.0.25 loads up, then we hit > break to enter gdb again. The trace file said offset 109 plus > register r6. Lets look at the memory around that region (address > 0x4e000078 == $r6 + 109): > > (gdb) x/4xw 0x4e00000b + 109 > 0x4e000078: 0x4c079407 0x4e000aa9 0x0000053e > 0x4e362598 > > (gdb) x/40xw 0x4e00000b + 109 - 20 > 0x4e000064: 0x4e0007d1 z 0x0000053e 0x4c4a96d8 0x0dd395b0 > 0x4e000074: 0x4e00000b x 0x4c079407 0x4e000aa9 0x0000053e > 0x4e000084: 0x4e362598 0x48da8650 0x4e00000b 0x4c07942f > 0x4e000094: 0x4e000aa9 0x0000053e 0x7aa74490 0x3dd2d724 > > x = address LDW tries to load from > z = symbol structure widetag. > According to objdef.lisp, LDW is loading the symbol-name, which is: > (gdb) x/4xs 0x4c079407 > 0x4c079407: "x*READ-ONLY-SPACE-FREE-POINTER*" > > To conclude the LDW is loading from the wrong symbol and also at the > wrong slot, it is rather random from sbcl-version to sbcl-version. > > If analysis is correct this stepping patch should also be applied to > mips (because we then are loading the symbol-value from the wrong > address there too). > The code currently looks the same between mips and hppa. > > To verify this on mips I would like to check sbcl 1.0.25 under mips > for what address the load instruction uses: compile-file the above > code to see if the LD instruction generated points to the > symbol-value-slot of the *stepping* symbol structure. I'll launch > qemu when I have time to compile 1.0.25 or if anyone want to check > this. > > best regards, > /larry I haven't had the time to check the correctness on mips yet, but 1.0.25.56 still builds. Am I missing something or should load-symbol-value be used here? |
From: Gábor M. <me...@re...> - 2009-03-02 13:26:51
|
On Sábado 28 Febrero 2009, Gábor Melis wrote: > On Lunes 23 Febrero 2009, Larry Valkama wrote: > > Two hppa fixes attached: > > > > * stepping bugfix -- symbol-value possibly loaded wrongly > > on hppa, sbcl-1.0.25.55 sigbus, 1.0.25 doesnt. > > It sigbusses at point call.lisp:multiple-call-named with > > (debug 3) enabled that generates symbol-value loading of > > SB!IMPL::*STEPPING*. > > If the bugfix is valid, mips should be affected too. > > > > * aver-cleanup -- a patch previously sent but not committed (proper > > checking of instruction constants limits plus cleanups suggested > > by Nathan Froyd). > > > > > > Details stepping bugfix / hppa-debugging-mini-primer: > > > > Under gdb it is shown what is generated for loading an symbol-value > > compiled to an instruction as: > > > > (inst ldw (- (+ symbol-value-slot > > (truncate (static-symbol-offset > > 'sb!impl::*stepping*) n-word-bytes)) > > other-pointer-lowtag) > > null-tn stepping) > > > > => > > > > 0x60006b40: ldw 73(r6),r15 > > > > r6 = null-tn = 0x4e00000b > > 0x73 + $r6 = 0x4e00007e > > > > now 0x4e00007e is an address that isn't loadable (would hit sigbus) > > because it isn't word-aligned. > > > > The code that triggered this bug was in sb-introspect: > > > > > > (declaim (optimize (debug 3))) > > > > (with-compilation-unit nil > > (eval '(defun four () 4))) > > <<< > > > > Compiling the above code using sbcl-1.0.25.55 would generate a fasl > > that contained the sigbussing instruction. > > > > During compilation of above file, the vop multiple-call-named is > > used. And because of (debug 3) it will enter the WHEN clause in > > that vop which contains this LDW instruction that sigbus during > > compilation. > > > > The strange thing is that 1.0.25 works fine (above test + all > > contrib compiles). If we compile the above code "(compile-file "t" > > > > :trace-file t) and looks in the trace file we will find: > > > > VOP MULTIPLE-CALL-NAMED t24[NL1] t28[Const7] t27[A0] > > ... > > LDW 109, #<TN t49[NULL]>, #<TN t56[A1]> > > COMB =, #<TN t49[NULL]>, #<TN t49[NULL]>, L17, NULLIFY, T > > ... > > > > Now NULL is the r6 register that contains as before 0x4e00000b. > > and (+ #x4e00000b 109) => #4E000078 > > which is an loadable address (word-aligned). > > > > My only guess is that the sb!impl::*stepping* symbol has changed > > place (in the heap) because of other unrelated patches and it > > miscalculates an address that just happens to be word-aligned but > > has never pointed to the symbol-value of the wanted symbol. > > > > So is sbcl-1.0.25 loading the symbol-value from the wrong address, > > lets find out. Gdb is entered and sbcl-1.0.25 loads up, then we hit > > break to enter gdb again. The trace file said offset 109 plus > > register r6. Lets look at the memory around that region (address > > 0x4e000078 == $r6 + 109): > > > > (gdb) x/4xw 0x4e00000b + 109 > > 0x4e000078: 0x4c079407 0x4e000aa9 0x0000053e > > 0x4e362598 > > > > (gdb) x/40xw 0x4e00000b + 109 - 20 > > 0x4e000064: 0x4e0007d1 z 0x0000053e 0x4c4a96d8 0x0dd395b0 > > 0x4e000074: 0x4e00000b x 0x4c079407 0x4e000aa9 0x0000053e > > 0x4e000084: 0x4e362598 0x48da8650 0x4e00000b 0x4c07942f > > 0x4e000094: 0x4e000aa9 0x0000053e 0x7aa74490 0x3dd2d724 > > > > x = address LDW tries to load from > > z = symbol structure widetag. > > According to objdef.lisp, LDW is loading the symbol-name, which is: > > (gdb) x/4xs 0x4c079407 > > 0x4c079407: "x*READ-ONLY-SPACE-FREE-POINTER*" > > > > To conclude the LDW is loading from the wrong symbol and also at > > the wrong slot, it is rather random from sbcl-version to > > sbcl-version. > > > > If analysis is correct this stepping patch should also be applied > > to mips (because we then are loading the symbol-value from the > > wrong address there too). > > The code currently looks the same between mips and hppa. > > > > To verify this on mips I would like to check sbcl 1.0.25 under mips > > for what address the load instruction uses: compile-file the above > > code to see if the LD instruction generated points to the > > symbol-value-slot of the *stepping* symbol structure. I'll launch > > qemu when I have time to compile 1.0.25 or if anyone want to check > > this. > > > > best regards, > > /larry > > I haven't had the time to check the correctness on mips yet, but > 1.0.25.56 still builds. > > Am I missing something or should load-symbol-value be used here? I have checked that indeed on mips too the address is miscalculated: * (defun foo () (declare (optimize (debug 3))) (print 'hej) (print 'ho)) FOO * (- (sb-vm::get-lisp-obj-address 'sb-impl::*stepping*) (sb-vm::get-lisp-obj-address 'nil)) 460 * (disassemble 'foo) ; disassembly for FOO ... ;;; [4] (SB-INT:NAMED-LAMBDA FOO NIL (DECLARE (OPTIMIZE #)) ...) ... ; 630: 6D00898E LW $A1, $NULL[109] ; 634: 02003411 BEQ $NULL, $A1, L0 Checking genesis/static-symbols.h also reveals that $NULL[109] points somewhere into read-only-space-free-pointer. But I've no idea why, despite this, single stepping works?! * (sb-impl::with-stepping-enabled (foo)) ; Evaluating call: ; (PRINT 'HEJ) ; With arguments: ; HEJ 1] And it remains working if I just do (load-symbol-value stepping sb!impl::*stepping*) ... |
From: Larry V. <re...@us...> - 2009-03-02 18:25:19
|
Gábor Melis skrev: > On Lunes 23 Febrero 2009, Larry Valkama wrote: >> Two hppa fixes attached: >> >> * stepping bugfix -- symbol-value possibly loaded wrongly >> on hppa, sbcl-1.0.25.55 sigbus, 1.0.25 doesnt. >> It sigbusses at point call.lisp:multiple-call-named with >> (debug 3) enabled that generates symbol-value loading of >> SB!IMPL::*STEPPING*. >> If the bugfix is valid, mips should be affected too. >> >> * aver-cleanup -- a patch previously sent but not committed (proper >> checking of instruction constants limits plus cleanups suggested by >> Nathan Froyd). >> >> >> Details stepping bugfix / hppa-debugging-mini-primer: >> >> Under gdb it is shown what is generated for loading an symbol-value >> compiled to an instruction as: >> >> (inst ldw (- (+ symbol-value-slot >> (truncate (static-symbol-offset 'sb!impl::*stepping*) >> n-word-bytes)) >> other-pointer-lowtag) >> null-tn stepping) >> >> => >> >> 0x60006b40: ldw 73(r6),r15 >> >> r6 = null-tn = 0x4e00000b >> 0x73 + $r6 = 0x4e00007e >> >> now 0x4e00007e is an address that isn't loadable (would hit sigbus) >> because it isn't word-aligned. >> >> The code that triggered this bug was in sb-introspect: >> >> >> (declaim (optimize (debug 3))) >> >> (with-compilation-unit nil >> (eval '(defun four () 4))) >> <<< >> >> Compiling the above code using sbcl-1.0.25.55 would generate a fasl >> that contained the sigbussing instruction. >> >> During compilation of above file, the vop multiple-call-named is >> used. And because of (debug 3) it will enter the WHEN clause in that >> vop which contains this LDW instruction that sigbus during >> compilation. >> >> The strange thing is that 1.0.25 works fine (above test + all contrib >> compiles). If we compile the above code "(compile-file "t" >> :trace-file t) and looks in the trace file we will find: >> >> VOP MULTIPLE-CALL-NAMED t24[NL1] t28[Const7] t27[A0] >> ... >> LDW 109, #<TN t49[NULL]>, #<TN t56[A1]> >> COMB =, #<TN t49[NULL]>, #<TN t49[NULL]>, L17, NULLIFY, T >> ... >> >> Now NULL is the r6 register that contains as before 0x4e00000b. >> and (+ #x4e00000b 109) => #4E000078 >> which is an loadable address (word-aligned). >> >> My only guess is that the sb!impl::*stepping* symbol has changed >> place (in the heap) because of other unrelated patches and it >> miscalculates an address that just happens to be word-aligned but has >> never pointed to the symbol-value of the wanted symbol. >> >> So is sbcl-1.0.25 loading the symbol-value from the wrong address, >> lets find out. Gdb is entered and sbcl-1.0.25 loads up, then we hit >> break to enter gdb again. The trace file said offset 109 plus >> register r6. Lets look at the memory around that region (address >> 0x4e000078 == $r6 + 109): >> >> (gdb) x/4xw 0x4e00000b + 109 >> 0x4e000078: 0x4c079407 0x4e000aa9 0x0000053e >> 0x4e362598 >> >> (gdb) x/40xw 0x4e00000b + 109 - 20 >> 0x4e000064: 0x4e0007d1 z 0x0000053e 0x4c4a96d8 0x0dd395b0 >> 0x4e000074: 0x4e00000b x 0x4c079407 0x4e000aa9 0x0000053e >> 0x4e000084: 0x4e362598 0x48da8650 0x4e00000b 0x4c07942f >> 0x4e000094: 0x4e000aa9 0x0000053e 0x7aa74490 0x3dd2d724 >> >> x = address LDW tries to load from >> z = symbol structure widetag. >> According to objdef.lisp, LDW is loading the symbol-name, which is: >> (gdb) x/4xs 0x4c079407 >> 0x4c079407: "x*READ-ONLY-SPACE-FREE-POINTER*" >> >> To conclude the LDW is loading from the wrong symbol and also at the >> wrong slot, it is rather random from sbcl-version to sbcl-version. >> >> If analysis is correct this stepping patch should also be applied to >> mips (because we then are loading the symbol-value from the wrong >> address there too). >> The code currently looks the same between mips and hppa. >> >> To verify this on mips I would like to check sbcl 1.0.25 under mips >> for what address the load instruction uses: compile-file the above >> code to see if the LD instruction generated points to the >> symbol-value-slot of the *stepping* symbol structure. I'll launch >> qemu when I have time to compile 1.0.25 or if anyone want to check >> this. >> >> best regards, >> /larry > > I haven't had the time to check the correctness on mips yet, but > 1.0.25.56 still builds. > > Am I missing something or should load-symbol-value be used here? > Not sure, load-symbol-value is defined in hppa/macros.lisp and is used by nlx.lisp. A check showed that that too also goes wrong: compilation of nlx.lisp: ... 0x514a3900: break a,0 0x514a3904: # 71ffec8 0x514a3908: # 1fee801 0x514a390c: stw r13,4(r4) 0x514a3910: copy r5,r26 0x514a3914: stw r26,3c(r4) r6=0x4e00000b , $r6+91=0x4e000066 VOP save-dynamic-state 0x514a3918: ldw 91(r6),r14 r14=0 0x514a391c: break a,0 <--- sigill's, inserted in define-vop to hook up gdb 0x514a3920: copy sp,r16 (end of vop, returning in register r14) 0x514a3924: stw r14,38(r4) 0x514a3928: stw r15,14(r4) 0x514a392c: stw r16,10(r4) 0x514a3930: copy r3,r26 0x514a3934: stw r26,40(r4) 0x514a3938: addi 20,r4,r25 0x514a393c: ldw a9(r6),r14 0x514a3940: break b,0 ... 0x4e000020: 0x0000053e 0x4e000027 0x79abf308 0x4e00000b 0x4e000030: 0x50000017 0x4e00000b 0x0000053e 0x0000004a 0x4e000040: 0x00000000 0x4e00000b 0x50000027 0x4e00000b 0x4e000050: w 0x0000053e v 0x0000004a h 0x00000000 p 0x4e00000b 0x4e000060: n 0x5000003f pa 0x4e00000b 0x0000053e 0x4b0005b0 0x4e000070: 0x00000000 0x4e00000b 0x50000057 0x4e00000b 0x4e000080: 0x0000053e 0x4e000530 0x00000000 0x4e00000b 0x4e000090: 0x5000007f 0x4e00000b 0x0000053e 0x00000000 w = widetag, 53e: symbol structure v = value h = hash p = plist n = name: x/4xs 0x5000003f => 0x5000003f: "4*CORE-STRING*" pa=package We load return value (reg r14) with 0x4e000066 that points to the wrong symbol and slot (should be symbol *current-catch-block* and value slot). It seems ppc, sparc, mips, alpha and hppa all computes load-symbol-value the same way.. can they really all be wrong ? sound more that I'm wrong then. But again hppa above showes clearly that hppa does it wrongly. best regards, /larry |
From: Larry V. <re...@us...> - 2009-03-07 06:32:20
|
> Gábor Melis skrev: >> On Lunes 23 Febrero 2009, Larry Valkama wrote: >>> Two hppa fixes attached: >>> >>> * stepping bugfix -- symbol-value possibly loaded wrongly >>> on hppa, sbcl-1.0.25.55 sigbus, 1.0.25 doesnt. >>> It sigbusses at point call.lisp:multiple-call-named with >>> (debug 3) enabled that generates symbol-value loading of >>> SB!IMPL::*STEPPING*. >>> If the bugfix is valid, mips should be affected too. >>> >>> * aver-cleanup -- a patch previously sent but not committed (proper >>> checking of instruction constants limits plus cleanups suggested by >>> Nathan Froyd). >>> >>> >>> Details stepping bugfix / hppa-debugging-mini-primer: >>> >>> Under gdb it is shown what is generated for loading an symbol-value >>> compiled to an instruction as: >>> >>> (inst ldw (- (+ symbol-value-slot >>> (truncate (static-symbol-offset 'sb!impl::*stepping*) >>> n-word-bytes)) >>> other-pointer-lowtag) >>> null-tn stepping) >>> >>> => >>> >>> 0x60006b40: ldw 73(r6),r15 >>> >>> r6 = null-tn = 0x4e00000b >>> 0x73 + $r6 = 0x4e00007e >>> >>> now 0x4e00007e is an address that isn't loadable (would hit sigbus) >>> because it isn't word-aligned. >>> >>> The code that triggered this bug was in sb-introspect: >>> >>> >>> (declaim (optimize (debug 3))) >>> >>> (with-compilation-unit nil >>> (eval '(defun four () 4))) >>> <<< >>> >>> Compiling the above code using sbcl-1.0.25.55 would generate a fasl >>> that contained the sigbussing instruction. >>> >>> During compilation of above file, the vop multiple-call-named is >>> used. And because of (debug 3) it will enter the WHEN clause in that >>> vop which contains this LDW instruction that sigbus during >>> compilation. >>> >>> The strange thing is that 1.0.25 works fine (above test + all contrib >>> compiles). If we compile the above code "(compile-file "t" >>> :trace-file t) and looks in the trace file we will find: >>> >>> VOP MULTIPLE-CALL-NAMED t24[NL1] t28[Const7] t27[A0] >>> ... >>> LDW 109, #<TN t49[NULL]>, #<TN t56[A1]> >>> COMB =, #<TN t49[NULL]>, #<TN t49[NULL]>, L17, NULLIFY, T >>> ... >>> >>> Now NULL is the r6 register that contains as before 0x4e00000b. >>> and (+ #x4e00000b 109) => #4E000078 >>> which is an loadable address (word-aligned). >>> >>> My only guess is that the sb!impl::*stepping* symbol has changed >>> place (in the heap) because of other unrelated patches and it >>> miscalculates an address that just happens to be word-aligned but has >>> never pointed to the symbol-value of the wanted symbol. >>> >>> So is sbcl-1.0.25 loading the symbol-value from the wrong address, >>> lets find out. Gdb is entered and sbcl-1.0.25 loads up, then we hit >>> break to enter gdb again. The trace file said offset 109 plus >>> register r6. Lets look at the memory around that region (address >>> 0x4e000078 == $r6 + 109): >>> >>> (gdb) x/4xw 0x4e00000b + 109 >>> 0x4e000078: 0x4c079407 0x4e000aa9 0x0000053e >>> 0x4e362598 >>> >>> (gdb) x/40xw 0x4e00000b + 109 - 20 >>> 0x4e000064: 0x4e0007d1 z 0x0000053e 0x4c4a96d8 0x0dd395b0 >>> 0x4e000074: 0x4e00000b x 0x4c079407 0x4e000aa9 0x0000053e >>> 0x4e000084: 0x4e362598 0x48da8650 0x4e00000b 0x4c07942f >>> 0x4e000094: 0x4e000aa9 0x0000053e 0x7aa74490 0x3dd2d724 >>> >>> x = address LDW tries to load from >>> z = symbol structure widetag. >>> According to objdef.lisp, LDW is loading the symbol-name, which is: >>> (gdb) x/4xs 0x4c079407 >>> 0x4c079407: "x*READ-ONLY-SPACE-FREE-POINTER*" >>> >>> To conclude the LDW is loading from the wrong symbol and also at the >>> wrong slot, it is rather random from sbcl-version to sbcl-version. >>> >>> If analysis is correct this stepping patch should also be applied to >>> mips (because we then are loading the symbol-value from the wrong >>> address there too). >>> The code currently looks the same between mips and hppa. >>> >>> To verify this on mips I would like to check sbcl 1.0.25 under mips >>> for what address the load instruction uses: compile-file the above >>> code to see if the LD instruction generated points to the >>> symbol-value-slot of the *stepping* symbol structure. I'll launch >>> qemu when I have time to compile 1.0.25 or if anyone want to check >>> this. >>> >>> best regards, >>> /larry >> I haven't had the time to check the correctness on mips yet, but >> 1.0.25.56 still builds. >> >> Am I missing something or should load-symbol-value be used here? >> > > Not sure, load-symbol-value is defined in hppa/macros.lisp and is used > by nlx.lisp. > > A check showed that that too also goes wrong: > > compilation of nlx.lisp: > ... > 0x514a3900: break a,0 > 0x514a3904: # 71ffec8 > 0x514a3908: # 1fee801 > 0x514a390c: stw r13,4(r4) > 0x514a3910: copy r5,r26 > 0x514a3914: stw r26,3c(r4) r6=0x4e00000b , $r6+91=0x4e000066 > VOP save-dynamic-state > 0x514a3918: ldw 91(r6),r14 r14=0 > 0x514a391c: break a,0 <--- sigill's, inserted in define-vop to > hook up gdb > 0x514a3920: copy sp,r16 > (end of vop, returning in register r14) > 0x514a3924: stw r14,38(r4) > 0x514a3928: stw r15,14(r4) > 0x514a392c: stw r16,10(r4) > 0x514a3930: copy r3,r26 > 0x514a3934: stw r26,40(r4) > 0x514a3938: addi 20,r4,r25 > 0x514a393c: ldw a9(r6),r14 > 0x514a3940: break b,0 > ... > > 0x4e000020: 0x0000053e 0x4e000027 0x79abf308 0x4e00000b > 0x4e000030: 0x50000017 0x4e00000b 0x0000053e 0x0000004a > 0x4e000040: 0x00000000 0x4e00000b 0x50000027 0x4e00000b > 0x4e000050: w 0x0000053e v 0x0000004a h 0x00000000 p 0x4e00000b > 0x4e000060: n 0x5000003f pa 0x4e00000b 0x0000053e 0x4b0005b0 > 0x4e000070: 0x00000000 0x4e00000b 0x50000057 0x4e00000b > 0x4e000080: 0x0000053e 0x4e000530 0x00000000 0x4e00000b > 0x4e000090: 0x5000007f 0x4e00000b 0x0000053e 0x00000000 > > w = widetag, 53e: symbol structure > v = value > h = hash > p = plist > n = name: x/4xs 0x5000003f => 0x5000003f: "4*CORE-STRING*" > pa=package > > We load return value (reg r14) with 0x4e000066 that points to the wrong > symbol and slot (should be symbol *current-catch-block* and value slot). > > It seems ppc, sparc, mips, alpha and hppa all computes load-symbol-value > the same way.. can they really all be wrong ? sound more that I'm wrong > then. But again hppa above showes clearly that hppa does it wrongly. > > best regards, > /larry > A new patch that fixes the bugs in nlx.lisp and cleansup call.lisp regz, /larry diff --git a/src/compiler/hppa/call.lisp b/src/compiler/hppa/call.lisp index 1edc572..db0771e 100644 --- a/src/compiler/hppa/call.lisp +++ b/src/compiler/hppa/call.lisp @@ -774,13 +774,8 @@ default-value-8 (insert-step-instrumenting (callable-tn) ;; Conditionally insert a conditional trap: (when step-instrumenting - ;; Get the symbol-value of SB!IMPL::*STEPPING* - (loadw stepping null-tn - (+ symbol-value-slot - (truncate (static-symbol-offset 'sb!impl::*stepping*) - n-word-bytes)) - other-pointer-lowtag) - ;; If it's not NIL, trap. + (load-symbol-value stepping sb!impl::*stepping*) + ;; If symbol-value is not NIL, trap. ;(inst comb := stepping null-tn step-done-label) (inst comb := null-tn null-tn step-done-label :nullify t) ;; CONTEXT-PC will be pointing here when the diff --git a/src/compiler/hppa/macros.lisp b/src/compiler/hppa/macros.lisp index 0a5e991..6349953 100644 --- a/src/compiler/hppa/macros.lisp +++ b/src/compiler/hppa/macros.lisp @@ -50,11 +50,11 @@ (inst ldo offset null-tn ,reg :unsigned t)))))) (defmacro load-symbol-value (reg symbol) - `(inst ldw - (+ (static-symbol-offset ',symbol) - (ash symbol-value-slot word-shift) - (- other-pointer-lowtag)) - null-tn ,reg)) + `(loadw ,reg null-tn + (+ symbol-value-slot + (truncate (static-symbol-offset ',symbol) + n-word-bytes)) + other-pointer-lowtag)) (defmacro store-symbol-value (reg symbol) `(inst stw ,reg (+ (static-symbol-offset ',symbol) |
From: Gábor M. <me...@re...> - 2009-03-09 22:05:20
|
On Sábado 07 Marzo 2009, Larry Valkama wrote: > > Gábor Melis skrev: > >> On Lunes 23 Febrero 2009, Larry Valkama wrote: > >>> Two hppa fixes attached: > >>> > >>> * stepping bugfix -- symbol-value possibly loaded wrongly > >>> on hppa, sbcl-1.0.25.55 sigbus, 1.0.25 doesnt. > >>> It sigbusses at point call.lisp:multiple-call-named with > >>> (debug 3) enabled that generates symbol-value loading of > >>> SB!IMPL::*STEPPING*. > >>> If the bugfix is valid, mips should be affected too. > >>> > >>> * aver-cleanup -- a patch previously sent but not committed > >>> (proper checking of instruction constants limits plus cleanups > >>> suggested by Nathan Froyd). > >>> > >>> > >>> Details stepping bugfix / hppa-debugging-mini-primer: > >>> > >>> Under gdb it is shown what is generated for loading an > >>> symbol-value compiled to an instruction as: > >>> > >>> (inst ldw (- (+ symbol-value-slot > >>> (truncate (static-symbol-offset > >>> 'sb!impl::*stepping*) n-word-bytes)) > >>> other-pointer-lowtag) > >>> null-tn stepping) > >>> > >>> => > >>> > >>> 0x60006b40: ldw 73(r6),r15 > >>> > >>> r6 = null-tn = 0x4e00000b > >>> 0x73 + $r6 = 0x4e00007e > >>> > >>> now 0x4e00007e is an address that isn't loadable (would hit > >>> sigbus) because it isn't word-aligned. > >>> > >>> The code that triggered this bug was in sb-introspect: > >>> > >>> > >>> (declaim (optimize (debug 3))) > >>> > >>> (with-compilation-unit nil > >>> (eval '(defun four () 4))) > >>> <<< > >>> > >>> Compiling the above code using sbcl-1.0.25.55 would generate a > >>> fasl that contained the sigbussing instruction. > >>> > >>> During compilation of above file, the vop multiple-call-named is > >>> used. And because of (debug 3) it will enter the WHEN clause in > >>> that vop which contains this LDW instruction that sigbus during > >>> compilation. > >>> > >>> The strange thing is that 1.0.25 works fine (above test + all > >>> contrib compiles). If we compile the above code "(compile-file > >>> "t" > >>> > >>> :trace-file t) and looks in the trace file we will find: > >>> > >>> VOP MULTIPLE-CALL-NAMED t24[NL1] t28[Const7] t27[A0] > >>> ... > >>> LDW 109, #<TN t49[NULL]>, #<TN t56[A1]> > >>> COMB =, #<TN t49[NULL]>, #<TN t49[NULL]>, L17, NULLIFY, T > >>> ... > >>> > >>> Now NULL is the r6 register that contains as before 0x4e00000b. > >>> and (+ #x4e00000b 109) => #4E000078 > >>> which is an loadable address (word-aligned). > >>> > >>> My only guess is that the sb!impl::*stepping* symbol has changed > >>> place (in the heap) because of other unrelated patches and it > >>> miscalculates an address that just happens to be word-aligned but > >>> has never pointed to the symbol-value of the wanted symbol. > >>> > >>> So is sbcl-1.0.25 loading the symbol-value from the wrong > >>> address, lets find out. Gdb is entered and sbcl-1.0.25 loads up, > >>> then we hit break to enter gdb again. The trace file said offset > >>> 109 plus register r6. Lets look at the memory around that region > >>> (address 0x4e000078 == $r6 + 109): > >>> > >>> (gdb) x/4xw 0x4e00000b + 109 > >>> 0x4e000078: 0x4c079407 0x4e000aa9 0x0000053e > >>> 0x4e362598 > >>> > >>> (gdb) x/40xw 0x4e00000b + 109 - 20 > >>> 0x4e000064: 0x4e0007d1 z 0x0000053e 0x4c4a96d8 0x0dd395b0 > >>> 0x4e000074: 0x4e00000b x 0x4c079407 0x4e000aa9 0x0000053e > >>> 0x4e000084: 0x4e362598 0x48da8650 0x4e00000b 0x4c07942f > >>> 0x4e000094: 0x4e000aa9 0x0000053e 0x7aa74490 0x3dd2d724 > >>> > >>> x = address LDW tries to load from > >>> z = symbol structure widetag. > >>> According to objdef.lisp, LDW is loading the symbol-name, which > >>> is: (gdb) x/4xs 0x4c079407 > >>> 0x4c079407: "x*READ-ONLY-SPACE-FREE-POINTER*" > >>> > >>> To conclude the LDW is loading from the wrong symbol and also at > >>> the wrong slot, it is rather random from sbcl-version to > >>> sbcl-version. > >>> > >>> If analysis is correct this stepping patch should also be applied > >>> to mips (because we then are loading the symbol-value from the > >>> wrong address there too). > >>> The code currently looks the same between mips and hppa. > >>> > >>> To verify this on mips I would like to check sbcl 1.0.25 under > >>> mips for what address the load instruction uses: compile-file the > >>> above code to see if the LD instruction generated points to the > >>> symbol-value-slot of the *stepping* symbol structure. I'll launch > >>> qemu when I have time to compile 1.0.25 or if anyone want to > >>> check this. > >>> > >>> best regards, > >>> /larry > >> > >> I haven't had the time to check the correctness on mips yet, but > >> 1.0.25.56 still builds. > >> > >> Am I missing something or should load-symbol-value be used here? > > > > Not sure, load-symbol-value is defined in hppa/macros.lisp and is > > used by nlx.lisp. > > > > A check showed that that too also goes wrong: > > > > compilation of nlx.lisp: > > ... > > 0x514a3900: break a,0 > > 0x514a3904: # 71ffec8 > > 0x514a3908: # 1fee801 > > 0x514a390c: stw r13,4(r4) > > 0x514a3910: copy r5,r26 > > 0x514a3914: stw r26,3c(r4) r6=0x4e00000b , $r6+91=0x4e000066 > > VOP save-dynamic-state > > 0x514a3918: ldw 91(r6),r14 r14=0 > > 0x514a391c: break a,0 <--- sigill's, inserted in define-vop > > to hook up gdb > > 0x514a3920: copy sp,r16 > > (end of vop, returning in register r14) > > 0x514a3924: stw r14,38(r4) > > 0x514a3928: stw r15,14(r4) > > 0x514a392c: stw r16,10(r4) > > 0x514a3930: copy r3,r26 > > 0x514a3934: stw r26,40(r4) > > 0x514a3938: addi 20,r4,r25 > > 0x514a393c: ldw a9(r6),r14 > > 0x514a3940: break b,0 > > ... > > > > 0x4e000020: 0x0000053e 0x4e000027 0x79abf308 0x4e00000b > > 0x4e000030: 0x50000017 0x4e00000b 0x0000053e 0x0000004a > > 0x4e000040: 0x00000000 0x4e00000b 0x50000027 0x4e00000b > > 0x4e000050: w 0x0000053e v 0x0000004a h 0x00000000 p 0x4e00000b > > 0x4e000060: n 0x5000003f pa 0x4e00000b 0x0000053e 0x4b0005b0 > > 0x4e000070: 0x00000000 0x4e00000b 0x50000057 0x4e00000b > > 0x4e000080: 0x0000053e 0x4e000530 0x00000000 0x4e00000b > > 0x4e000090: 0x5000007f 0x4e00000b 0x0000053e 0x00000000 > > > > w = widetag, 53e: symbol structure > > v = value > > h = hash > > p = plist > > n = name: x/4xs 0x5000003f => 0x5000003f: "4*CORE-STRING*" > > pa=package > > > > We load return value (reg r14) with 0x4e000066 that points to the > > wrong symbol and slot (should be symbol *current-catch-block* and > > value slot). > > > > It seems ppc, sparc, mips, alpha and hppa all computes > > load-symbol-value the same way.. can they really all be wrong ? > > sound more that I'm wrong then. But again hppa above showes clearly > > that hppa does it wrongly. > > > > best regards, > > /larry > > A new patch that fixes the bugs in nlx.lisp and cleansup call.lisp > regz, /larry > > > diff --git a/src/compiler/hppa/call.lisp > b/src/compiler/hppa/call.lisp index 1edc572..db0771e 100644 > --- a/src/compiler/hppa/call.lisp > +++ b/src/compiler/hppa/call.lisp > @@ -774,13 +774,8 @@ default-value-8 > (insert-step-instrumenting (callable-tn) > ;; Conditionally insert a conditional trap: > (when step-instrumenting > - ;; Get the symbol-value of SB!IMPL::*STEPPING* > - (loadw stepping null-tn > - (+ symbol-value-slot > - (truncate (static-symbol-offset > 'sb!impl::*stepping*) > - n-word-bytes)) > - other-pointer-lowtag) > - ;; If it's not NIL, trap. > + (load-symbol-value stepping sb!impl::*stepping*) > + ;; If symbol-value is not NIL, trap. > ;(inst comb := stepping null-tn step-done-label) > (inst comb := null-tn null-tn step-done-label > > :nullify t) > > ;; CONTEXT-PC will be pointing here when the > diff --git a/src/compiler/hppa/macros.lisp > b/src/compiler/hppa/macros.lisp index 0a5e991..6349953 100644 > --- a/src/compiler/hppa/macros.lisp > +++ b/src/compiler/hppa/macros.lisp > @@ -50,11 +50,11 @@ > (inst ldo offset null-tn ,reg :unsigned t)))))) > > (defmacro load-symbol-value (reg symbol) > - `(inst ldw > - (+ (static-symbol-offset ',symbol) > - (ash symbol-value-slot word-shift) > - (- other-pointer-lowtag)) > - null-tn ,reg)) > + `(loadw ,reg null-tn > + (+ symbol-value-slot > + (truncate (static-symbol-offset ',symbol) > + n-word-bytes)) > + other-pointer-lowtag)) > > (defmacro store-symbol-value (reg symbol) > `(inst stw ,reg (+ (static-symbol-offset ',symbol) As far as I can tell the two new load-symbol-value definition is equivalent. Let's look at the macroexpansion of (load-symbol-value stepping sb!impl:*stepping*): Old: (INST LW STEPPING NULL-TN (+ (STATIC-SYMBOL-OFFSET '*STEPPING*) (ASH SYMBOL-VALUE-SLOT WORD-SHIFT) (- OTHER-POINTER-LOWTAG))) New: (INST LW STEPPING NULL-TN (- (ASH (+ SYMBOL-VALUE-SLOT (TRUNCATE (STATIC-SYMBOL-OFFSET '*STEPPING*) N-WORD-BYTES)) 2) OTHER-POINTER-LOWTAG)) The old definition (not only the macroexpansion) is more similar to code for other architectures and seriously IMHO, cleaner. Otherwise, they are the same. Is there a failure related to NLXs that this patch fixes? |