From: Sergiy K. <se...@cs...> - 2004-04-01 06:18:44
|
Hi,

This code in libvm.C fails to correctly process a null pointer dereference on PPC64 Linux:

    int isNullPtrExn = (signum == SIGSEGV) && (isVmSignal(ip, jtoc))
    #ifdef RVM_FOR_32_ADDR
        && ((faultingAddress & 0xffff0000) == 0xffff0000)
    #elif defined RVM_FOR_64_ADDR
        && ((faultingAddress & 0xffffffffffff0000) == 0xffffffffffff0000)
    #endif
        ;
    int isTrap = signum == SIGTRAP;
    int isRecoverable = isNullPtrExn | isTrap;

The faulting address reported on PPC64 Linux is 0x0000000000000380 (vs 0xfffffffc on PPC32 Linux). After masking, that sets isRecoverable to 0 and RVM crashes with 'UNKNOWN ERROR'. So any NullPointerException in Java code causes RVM to crash.

Obviously this masking is wrong for PPC64 Linux, but what should be there instead? Any ideas?

Sergiy |
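To make the failure concrete, here is a minimal sketch of the mask test applied to the two addresses mentioned above. The helper names in_top_64k_64 and in_top_64k_32 are hypothetical and are not taken from libvm.C; they only mirror the two #ifdef branches.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the RVM_FOR_64_ADDR branch:
 * true only when the address lies in the top 64 KB of a 64-bit space. */
int in_top_64k_64(uint64_t faultingAddress) {
    return (faultingAddress & 0xffffffffffff0000ULL) == 0xffffffffffff0000ULL;
}

/* Hypothetical helper mirroring the RVM_FOR_32_ADDR branch. */
int in_top_64k_32(uint32_t faultingAddress) {
    return (faultingAddress & 0xffff0000u) == 0xffff0000u;
}
```

With these, 0xfffffffc passes the 32-bit test and 0xffffffffffffffe8 would pass the 64-bit test, while the reported 0x0000000000000380 does not. That is consistent with the mask being fine and the reported address being the thing to question.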
From: Eliot M. <mo...@cs...> - 2004-04-01 06:36:57
|
What I'm wondering, Sergiy, is how a null pointer dereference results in any address other than a negative one (i.e., one in very high memory). The only accesses that should cause such a trap are to object headers or scalar fields (arrays will access the length field in the header first, at the least), all of which are at negative offset from the pointer -- unless you've done something odd to the object model. Perhaps the trap handler is not getting the actual faulting address or something? It would not surprise me if we don't have the fault information right for PPC 64 Linux. The offsets would tend to be different from those for PPC 32 Linux, etc. -- Eliot |
From: David P G. <gr...@us...> - 2004-04-01 06:55:48
|
There's some code that Perry wrote a long time ago in the PowerPC version of libvm.C that intentionally causes a hardware trap as part of booting at a known (bad) faulting address. This then allows us to be sure that we know how to interpret the signal handling register save structures. It appears that this code is only enabled on AIX. If I were working on PPC/Linux (32 or 64) I would enable this code and make sure that it works. It is much easier to debug a known segfault invoked from the C code of the bootimage runner than to deal with a null pointer exception from the Java code. Look for getFaultingAddress in libvm.C --dave |
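The idea of a boot-time probe can be sketched as follows. This is not Perry's code and does not use getFaultingAddress; it is a generic POSIX illustration that reads the address from siginfo_t's si_addr field, and the names probe_handler and probe_faulting_address are made up for the example. It assumes the probed address is unmapped.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static sigjmp_buf probe_env;            /* where the handler jumps back to */
static volatile uintptr_t probe_seen;   /* address the kernel reported */

/* Record the reported faulting address and unwind out of the fault. */
static void probe_handler(int signum, siginfo_t *info, void *context) {
    (void)signum;
    (void)context;
    probe_seen = (uintptr_t)info->si_addr;
    siglongjmp(probe_env, 1);
}

/* Hypothetical probe: touch a known bad address on purpose and check that
 * the handler is told the same address.
 * Returns 1 on a match, 0 on a mismatch or no fault, -1 on setup failure. */
int probe_faulting_address(uintptr_t bad) {
    struct sigaction sa, old_segv, old_bus;
    volatile int faulted = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = probe_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old_segv) != 0) return -1;
    if (sigaction(SIGBUS, &sa, &old_bus) != 0) {
        sigaction(SIGSEGV, &old_segv, NULL);
        return -1;
    }

    if (sigsetjmp(probe_env, 1) == 0) {
        (void)*(volatile char *)bad;    /* intentional fault */
    } else {
        faulted = 1;                    /* we came back via the handler */
    }

    /* Put the previous handlers back before reporting the result. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted && probe_seen == bad;
}
```

Calling probe_faulting_address(0x380) early in startup would be one way to use it. Note that a probe at a low address says nothing about how high ("negative") addresses are reported, so probing both kinds of address is probably worthwhile.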
From: Sergiy K. <se...@cs...> - 2004-04-01 07:17:34
|
OK. I remember dealing with this in 2.0.3 64-bit AIX port. |
From: Sergiy K. <se...@cs...> - 2004-04-01 07:56:58
|
I just enabled the test and added Linux specific case. It works fine and passes the test during booting. Any other suggestions? Sergiy |
From: Chris H. <hof...@cs...> - 2004-04-01 17:34:57
|
Sergiy,

Well, like Eliot, I wonder just what was happening in the Java program that triggered the null pointer exception. Did it actually have a null pointer? As a simple example, if the null pointer was supposed to come from an object field but your field access routine was wrong, then you could have any old garbage.

I'd try having Jikes generate the machine code for the routine causing the null ptr exception and looking over the instructions just before it occurs to see if they make sense.

Does it happen with both the Base and Opt compilers? If it's only one, that would suggest that it's a compiler issue, not a trap handler issue.

If it happens with the Base compiler, I have some rather ugly code that could be used to print out the register values after each bytecode's instructions are executed.

Chris

Sergiy Kyrylkov wrote:
> I just enabled the test and added Linux specific case. It works fine
> and passes the test during booting.
>
> Any other suggestions?

-- Chris Hoffmann -- Dept. of Computer Science/UMass at Amherst http://www-ali.cs.umass.edu/~hoffmann |
From: Sergiy K. <se...@cs...> - 2004-04-01 19:11:20
|
I just wrote a simple program with the following two lines, which throw a NullPointerException:

    Object m = null;
    m.getClass();

I get a crash again, with a reportedly positive faulting address later in the C code. I also ran it in gdb, and everything looks right in terms of machine code:

    0x0000000060226a68: li r3,0
    0x0000000060226a6c: stdu r3,-8(r12)
    0x0000000060226a70: ld r3,0(r12)
    0x0000000060226a74: addi r12,r12,8
    0x0000000060226a78: std r3,32(r1)
    0x0000000060226a7c: ld r3,32(r1)
    0x0000000060226a80: stdu r3,-8(r12)
    0x0000000060226a84: ld r3,0(r12)
    0x0000000060226a88: ld r4,-24(r3)
    0x0000000060226a8c: ld r5,64(r4)
    0x0000000060226a90: mtlr r5
    0x0000000060226a94: ld r3,0(r12)
    0x0000000060226a98: std r12,48(r1)
    0x0000000060226a9c: blrl

The null pointer dereference happens at 0x0000000060226a88 with r3 being 0, so the faulting address should obviously be 0xffffffffffffffe8.

It looks like the problem may be with

    pt_regs *save = getLinuxSavedRegisters(signum, arg3);

and the getcontext() implementation on PPC64 Linux.

Sergiy |
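The expected address can be checked with a little arithmetic. The sketch below assumes the usual base-plus-displacement addressing for a load like ld r4,-24(r3), with the displacement sign-extended to 64 bits; effective_address is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: effective address of a base+displacement load,
 * with the 16-bit displacement sign-extended to 64 bits.
 * Unsigned arithmetic wraps modulo 2^64, as the hardware address does. */
uint64_t effective_address(uint64_t base, int16_t disp) {
    return base + (uint64_t)(int64_t)disp;
}
```

For r3 = 0 and a displacement of -24, this gives 0xffffffffffffffe8, which is the address the trap handler ought to see.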
From: David P G. <gr...@us...> - 2004-04-01 19:25:01
|
What version of Jikes RVM are you using? I'm assuming it is not the cvs head ppc64, because we stopped using the explicit stack pointer and store-with-update sequence in the PPC baseline compiler almost a year ago. This looks to me like an old code sequence for pushing null on the stack.

--dave

    0x0000000060226a68: li r3,0
    0x0000000060226a6c: stdu r3,-8(r12) |
From: Sergiy K. <se...@cs...> - 2004-04-01 19:44:05
|
This problem currently exists on both 2.0.3 and CVS head. The MC sequence was from 2.0.3, but CVS head has the same behavior and exactly the same strange faulting address 0x0000000000000380 for NullPointerException. Sergiy |
From: Eliot M. <mo...@cs...> - 2004-04-01 20:57:12
|
Sergiy -- Again, I suspect the code that examines the saved register context and fault information is not right. The code appears to cast arg3 to a ucontext*, but that could be wrong, and/or ucontext* might need a different definition. Where is there a PPC64/Linux machine I can log in to and poke around? -- Eliot |
From: Sergiy K. <se...@cs...> - 2004-04-01 21:02:19
|
nu.cs.unm.edu with 64-bit toolchain at /opt/ppc64 and a fresh snapshot of CVS head with enabled faulting address test for Linux at /tmp/head |
From: Sergiy K. <se...@cs...> - 2004-04-01 21:23:38
|
Eliot,

I found the correct value:

    ......................
    Program received signal SIGSEGV, Segmentation fault.
    0x0000000060226a88 in ?? ()
    (gdb) c
    Continuing.

    Breakpoint 1, getLinuxSavedContext(int, void*) (signum=11, arg3=0x1ffffffeeb0)
        at libvm.C:209
    209 return &((ucontext_t*)arg3)->uc_mcontext;
    (gdb) x/128xg arg3
    0x1ffffffeeb0: 0x0000000000000000 0x0000000000000000
    0x1ffffffeec0: 0x000001ffffffd628 0x0000000000000000
    ........................
    ........................
    ........................
    0x1fffffff0e0: 0xffffffffffffffe8 0x0000000000200000

Now the question is how do we extract it correctly?

Sergiy |
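The same search can be done programmatically while investigating. The sketch below scans a block of 64-bit words for a known value and reports its byte offset; find_word_offset is a hypothetical debugging helper, not part of libvm.C. In the dump above the value sits at 0x1fffffff0e0, which is 0x230 bytes past arg3 = 0x1ffffffeeb0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debugging helper: return the byte offset of the first
 * 64-bit word equal to `needle`, or -1 if it does not occur. */
long find_word_offset(const uint64_t *words, size_t nwords, uint64_t needle) {
    for (size_t i = 0; i < nwords; i++) {
        if (words[i] == needle) {
            return (long)(i * sizeof words[0]);
        }
    }
    return -1;
}
```

An offset found this way is only a hint about which field holds the value in this one dump. The handler itself should presumably go through the proper struct fields rather than a hard-coded offset, since the layout depends on the kernel and libc headers.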
From: Eliot M. <mo...@cs...> - 2004-04-01 21:58:01
|
>>>>> "Sergiy" == Sergiy Kyrylkov <se...@cs...> writes:

Sergiy> I found the correct value:
Sergiy> [gdb session and dump of arg3 trimmed]
Sergiy> Now the question is how do we extract it correctly?

I took a quick look, and it appears that the ucontext, sigcontext, and ptrace .h files (the latter ultimately in /usr/include/asm-ppc64, but "redirected" multiple levels) hold the key. I think that casting arg3 properly and following links / extracting fields in it properly will work. And perhaps the get_context functions actually work now. Can you dig into all that a little bit more?

-- Eliot |
From: Sergiy K. <se...@cs...> - 2004-04-01 22:03:34
|
Yes, I will try to figure it out, but getcontext does not seem to work. Sergiy |
From: Eliot M. <mo...@cs...> - 2004-04-01 17:40:28
|
For now, PPC64 is Base compiler only, so that answers that question :-). But yes, it could be helpful to look at the code near the fault. -- E |
From: Sergiy K. <se...@cs...> - 2004-04-01 20:56:56
|
The fact that Perry's code for faulting address works also suggests that this problem appears only with SIGSEGV associated with negative addresses which is quite interesting. Sergiy |
From: Eliot M. <mo...@cs...> - 2004-04-01 21:01:01
|
>>>>> "Sergiy" == Sergiy Kyrylkov <se...@cs...> writes:

Sergiy> The fact that Perry's code for faulting address works also
Sergiy> suggests that this problem appears only with SIGSEGV associated
Sergiy> with negative addresses which is quite interesting.

Well, my recollection is that the faulting address occurs in more than one place in the data structures. Perhaps it is reliably filled in in one place, and not always in the other, and we're reading the "wrong" one? Anyway, it sounds like time to sit down with gdb, poke around in the structure, and see if the real address is there, only somewhere else!

-- Eliot |
From: Sergiy K. <se...@cs...> - 2004-04-01 07:15:35
|
At a very high level siginfo.h is the same for both ppc and ppc64. I will have to look deeper in Linux kernel sources. Sergiy |