From: Larry V. <re...@us...> - 2009-02-23 20:49:10
Two hppa fixes attached:

 * stepping bugfix -- symbol-value possibly loaded wrongly on hppa: sbcl-1.0.25.55 gets a SIGBUS, 1.0.25 doesn't. The SIGBUS happens in call.lisp:multiple-call-named when (debug 3) is enabled, because that generates a load of the symbol-value of SB!IMPL::*STEPPING*. If the bugfix is valid, mips should be affected too.

 * aver-cleanup -- a patch previously sent but not committed (proper checking of instruction constant limits, plus cleanups suggested by Nathan Froyd).

Details: stepping bugfix / hppa-debugging mini-primer

Under gdb we can see what the following symbol-value load is compiled to:

  (inst ldw (- (+ symbol-value-slot
                  (truncate (static-symbol-offset 'sb!impl::*stepping*)
                            n-word-bytes))
               other-pointer-lowtag)
            null-tn stepping)

  =>  0x60006b40: ldw 73(r6),r15

  r6 = null-tn = 0x4e00000b
  0x73 + $r6 = 0x4e00007e

0x4e00007e is not a loadable address (loading from it raises SIGBUS) because it isn't word-aligned.

The code that triggered this bug was in sb-introspect:

>>>
(declaim (optimize (debug 3)))
(with-compilation-unit nil
  (eval '(defun four () 4)))
<<<

Compiling the above code with sbcl-1.0.25.55 generates a fasl that contains the sigbussing instruction. During compilation of the file the vop multiple-call-named is used, and because of (debug 3) it enters the WHEN clause in that vop, which contains the LDW instruction that raises the SIGBUS during compilation.

The strange thing is that 1.0.25 works fine (the above test and all contribs compile). If we compile the above code with (compile-file "t" :trace-file t) and look in the trace file, we find:

  VOP MULTIPLE-CALL-NAMED t24[NL1] t28[Const7] t27[A0]
  ...
  LDW 109, #<TN t49[NULL]>, #<TN t56[A1]>
  COMB =, #<TN t49[NULL]>, #<TN t49[NULL]>, L17, NULLIFY, T
  ...

NULL is the r6 register, which as before contains 0x4e00000b, and (+ #x4e00000b 109) => #x4E000078, which is a loadable (word-aligned) address.
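The alignment argument can be checked with a few lines of arithmetic. This is an illustrative Python sketch, not SBCL code: it only assumes that an hppa ldw needs a 4-byte-aligned effective address (which is what the SIGBUS suggests), reads the "73" printed by gdb as 0x73 as the text does, and takes the r6 value and both displacements from the gdb session and trace file above.

```python
# Illustrative check of the hppa alignment argument, using values quoted above.
NULL_TN = 0x4E00000B  # r6 (null-tn) as seen in gdb
N_WORD_BYTES = 4      # assumption: ldw needs a 4-byte-aligned effective address


def effective_address(base, displacement):
    """Address that `ldw displacement(base),target` reads, kept to 32 bits."""
    return (base + displacement) & 0xFFFFFFFF


def word_aligned(addr):
    return addr % N_WORD_BYTES == 0


cases = [
    ("sbcl-1.0.25.55, gdb: ldw 73(r6),r15", 0x73),
    ("sbcl-1.0.25, trace: LDW 109", 109),
]
for label, disp in cases:
    addr = effective_address(NULL_TN, disp)
    verdict = "word-aligned" if word_aligned(addr) else "misaligned (SIGBUS)"
    print(f"{label}: {addr:#010x} {verdict}")
# prints:
# sbcl-1.0.25.55, gdb: ldw 73(r6),r15: 0x4e00007e misaligned (SIGBUS)
# sbcl-1.0.25, trace: LDW 109: 0x4e000078 word-aligned
```

Alignment only tells whether the load traps; it says nothing about whether the aligned address is the slot the code actually wanted.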
My only guess is that the sb!impl::*stepping* symbol has changed place (in the heap) because of other, unrelated patches, and that the code miscalculates an address that just happens to be word-aligned but has never pointed to the symbol-value of the wanted symbol. So is sbcl-1.0.25 loading the symbol-value from the wrong address? Let's find out.

We start sbcl-1.0.25 under gdb, let it load, then hit break to get back into gdb. The trace file said offset 109 plus register r6, so let's look at the memory around that address (0x4e000078 == $r6 + 109):

  (gdb) x/4xw 0x4e00000b + 109
  0x4e000078: 0x4c079407 0x4e000aa9 0x0000053e 0x4e362598

  (gdb) x/40xw 0x4e00000b + 109 - 20
  0x4e000064: 0x4e0007d1 z 0x0000053e 0x4c4a96d8 0x0dd395b0
  0x4e000074: 0x4e00000b x 0x4c079407 0x4e000aa9 0x0000053e
  0x4e000084: 0x4e362598 0x48da8650 0x4e00000b 0x4c07942f
  0x4e000094: 0x4e000aa9 0x0000053e 0x7aa74490 0x3dd2d724

  x = address LDW tries to load from
  z = symbol structure widetag

According to objdef.lisp, LDW is loading the symbol-name, which is:

  (gdb) x/4xs 0x4c079407
  0x4c079407: "x*READ-ONLY-SPACE-FREE-POINTER*"

To conclude: the LDW loads from the wrong symbol and from the wrong slot, and which symbol and slot it hits is fairly random from one sbcl version to the next.

If the analysis is correct, this stepping patch should also be applied to mips (because we would then be loading the symbol-value from the wrong address there too); the code currently looks the same on mips and hppa. To verify this on mips, I would like to check which address the load instruction uses in sbcl 1.0.25 on mips: compile-file the above code and see whether the generated LD instruction points to the symbol-value slot of the *stepping* symbol structure. I'll launch qemu when I have time to compile 1.0.25, unless someone else wants to check this.

best regards,
/larry
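The slot identification in the memory dump can be reproduced mechanically. The helper below is hypothetical Python, not part of SBCL: it copies the 16 words shown by x/40xw, walks back from an address to the nearest word equal to the symbol header marked z (0x0000053e), and names the slot. It assumes the 32-bit symbol layout header, value, hash, plist, name, package; that slot order is an assumption here and should be checked against objdef.lisp in the tree being debugged.

```python
# Hypothetical helper: decode the gdb dump above (sbcl-1.0.25 on hppa).
# The 16 words are copied from `x/40xw 0x4e00000b + 109 - 20`.
N_WORD_BYTES = 4
DUMP_START = 0x4E000064
WORDS = [
    0x4E0007D1, 0x0000053E, 0x4C4A96D8, 0x0DD395B0,  # 0x4e000064
    0x4E00000B, 0x4C079407, 0x4E000AA9, 0x0000053E,  # 0x4e000074
    0x4E362598, 0x48DA8650, 0x4E00000B, 0x4C07942F,  # 0x4e000084
    0x4E000AA9, 0x0000053E, 0x7AA74490, 0x3DD2D724,  # 0x4e000094
]

SYMBOL_HEADER = 0x0000053E  # the word marked "z" in the dump
# ASSUMPTION: slot order of a 32-bit SBCL symbol of this era (check objdef.lisp).
SYMBOL_SLOTS = ["header", "value", "hash", "plist", "name", "package"]


def word_at(addr):
    """Return the dumped word at a word-aligned address inside the dump."""
    offset = addr - DUMP_START
    index = offset // N_WORD_BYTES
    if offset % N_WORD_BYTES or not 0 <= index < len(WORDS):
        raise ValueError(f"{addr:#010x} is not a word address in the dump")
    return WORDS[index]


def locate(addr):
    """Return (header_address, slot_name) of the symbol containing addr."""
    header = addr
    # Walk back one word at a time until we reach a symbol header word.
    while word_at(header) != SYMBOL_HEADER:
        header -= N_WORD_BYTES
    return header, SYMBOL_SLOTS[(addr - header) // N_WORD_BYTES]


addr = 0x4E000078  # $r6 + 109, the "x" word
header, slot = locate(addr)
print(f"{addr:#010x}: {slot} slot of the symbol at {header:#010x}, holds {word_at(addr):#010x}")
# prints: 0x4e000078: name slot of the symbol at 0x4e000068, holds 0x4c079407
```

Scanning backwards for the header value is only a heuristic: a data word that happens to equal 0x0000053e would fool it, so it is meant for sanity-checking a small dump like this one, not for walking the heap.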