Thread: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux | Jikes RVM

[Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Sergiy K. <se...@cs...> - 2004-04-01 06:18:44

Hi,

This code in libvm.C fails to correctly process null pointer dereference on
PPC64 Linux:

    int isNullPtrExn = (signum == SIGSEGV) && (isVmSignal(ip, jtoc))
#ifdef RVM_FOR_32_ADDR
        && ((faultingAddress & 0xffff0000) == 0xffff0000)
#elif defined RVM_FOR_64_ADDR
        && ((faultingAddress & 0xffffffffffff0000) == 0xffffffffffff0000)
#endif
        ;
    int isTrap = signum == SIGTRAP;
    int isRecoverable = isNullPtrExn | isTrap;

The faulting address reported on PPC64 Linux is 0x0000000000000380 (vs.
0xfffffffc on PPC32 Linux).  It fails the mask test, so isRecoverable is set
to 0 and the RVM crashes with 'UNKNOWN ERROR'.  As a result, any
NullPointerException in Java code crashes the RVM.  Obviously this masking is
wrong for PPC64 Linux, but what should be there instead?
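
For illustration, the mask test can be reproduced in isolation.  This is a
minimal sketch in C, not the libvm.C code itself: the function names are
hypothetical, and only the two masks and the addresses come from this thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for the RVM_FOR_64_ADDR branch of the check:
 * true only if the top 48 bits of the faulting address are all ones. */
int passes_null_mask_64(uint64_t faulting_address) {
    const uint64_t mask = UINT64_C(0xffffffffffff0000);
    return (faulting_address & mask) == mask;
}

/* Hypothetical stand-in for the RVM_FOR_32_ADDR branch:
 * true only if the top 16 bits of the faulting address are all ones. */
int passes_null_mask_32(uint32_t faulting_address) {
    const uint32_t mask = UINT32_C(0xffff0000);
    return (faulting_address & mask) == mask;
}
```

With these helpers, 0x0000000000000380 fails the 64-bit test, while a small
negative offset from null such as 0xffffffffffffffe8 passes it, as does
0xfffffffc in the 32-bit test.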

Any ideas?

Sergiy

[Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Eliot M. <mo...@cs...> - 2004-04-01 06:36:57

What I'm wondering, Sergiy, is how a null pointer dereference results in any
address other than a negative one (i.e., one in very high memory). The only
accesses that should cause such a trap are to object headers or scalar
fields (arrays will access the length field in the header first, at the
least), all of which are at negative offset from the pointer -- unless
you've done something odd to the object model. Perhaps the trap handler is
not getting the actual faulting address or something? It would not surprise
me if we don't have the fault information right for PPC 64 Linux. The
offsets would tend to be different from those for PPC 32 Linux, etc.

-- Eliot

Re: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: David P G. <gr...@us...> - 2004-04-01 06:55:48

There's some code that Perry wrote a long time ago in the PowerPC version 
of libvm.C that intentionally causes a hardware trap as part of booting at 
a known (bad) faulting address.  This then allows us to be sure that we 
know how to interpret the signal handling register save structures.

It appears that this code is only enabled on AIX.  If I were working on 
PPC/Linux (32 or 64) I would enable this code and make sure that it works. 
It is much easier to debug a known segfault invoked from the C code of the 
bootimage runner than to deal with a null pointer exception from 
the Java code.

Look for getFaultingAddress in libvm.C
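
The idea of such a boot-time check can be sketched as follows.  This is not
Perry's code and not what getFaultingAddress does in libvm.C; it is a
portable approximation that assumes a POSIX system with SA_SIGINFO, and it
reads the address from siginfo_t's si_addr rather than from the saved
register structures.  All names in it are hypothetical.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS and sigaction under strict -std modes */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;
static volatile uintptr_t reported_address;

/* Records the address the kernel reports and jumps back to the probe. */
static void probe_handler(int signum, siginfo_t *info, void *context) {
    (void)signum;
    (void)context;
    reported_address = (uintptr_t)info->si_addr;
    siglongjmp(probe_env, 1);
}

/* Returns 1 if the handler saw the address we deliberately faulted on,
 * 0 if it saw a different one (or no fault occurred), -1 on setup failure. */
int probe_faulting_address(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;

    /* An inaccessible page gives us a known bad address to touch. */
    unsigned char *guard = mmap(NULL, (size_t)page, PROT_NONE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guard == MAP_FAILED) return -1;
    volatile unsigned char *target = guard + 0x80;

    struct sigaction probe, previous;
    memset(&probe, 0, sizeof probe);
    probe.sa_sigaction = probe_handler;
    probe.sa_flags = SA_SIGINFO;
    sigemptyset(&probe.sa_mask);
    if (sigaction(SIGSEGV, &probe, &previous) != 0) {
        munmap(guard, (size_t)page);
        return -1;
    }

    int result;
    reported_address = 0;
    if (sigsetjmp(probe_env, 1) == 0) {
        unsigned char sink = *target; /* expected to raise SIGSEGV */
        (void)sink;
        result = 0;                   /* no fault: the probe did not work */
    } else {
        result = (reported_address == (uintptr_t)target);
    }

    sigaction(SIGSEGV, &previous, NULL); /* restore the old handler */
    munmap(guard, (size_t)page);
    return result;
}
```

A runner could call probe_faulting_address() once at startup and refuse to
continue unless it returns 1.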

--dave

RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Sergiy K. <se...@cs...> - 2004-04-01 07:17:34

OK.  I remember dealing with this in 2.0.3 64-bit AIX port.

RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Sergiy K. <se...@cs...> - 2004-04-01 07:56:58

I just enabled the test and added a Linux-specific case.  It works fine and
passes the test during booting.
 
Any other suggestions?
 
Sergiy


Re: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Chris H. <hof...@cs...> - 2004-04-01 17:34:57

Sergiy,

Well, like Eliot, I wonder just what was happening in the Java program 
that triggered the null pointer exception. Did it actually have a null 
pointer? As a simple example, if the null pointer was supposed to come 
from an object field but your field access routine was wrong, then you 
could have any old garbage.

I'd try having Jikes generate the machine code for the routine causing 
the null ptr exception and looking over the instructions just before it 
occurs to see if they make sense.

Does it happen with both the Base and Opt compilers? If it's only one, 
that would suggest that it's a compiler issue, not a trap handler issue.

If it happens with the Base compiler, I have some rather ugly code that 
could be used to print out the register values after each bytecode's 
instructions are executed.

Chris

-- 
Chris Hoffmann -- Dept. of Computer Science/UMass at Amherst
http://www-ali.cs.umass.edu/~hoffmann

RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Sergiy K. <se...@cs...> - 2004-04-01 19:11:20

I just wrote a simple program with the following two lines, which throw
NullPointerException:

Object m = null;
m.getClass();

I get a crash again, with a reportedly positive faulting address, later in
the C code.

I also ran it in gdb and everything looks right in terms of machine code:

0x0000000060226a68:     li      r3,0
0x0000000060226a6c:     stdu    r3,-8(r12)
0x0000000060226a70:     ld      r3,0(r12)
0x0000000060226a74:     addi    r12,r12,8
0x0000000060226a78:     std     r3,32(r1)
0x0000000060226a7c:     ld      r3,32(r1)
0x0000000060226a80:     stdu    r3,-8(r12)
0x0000000060226a84:     ld      r3,0(r12)
0x0000000060226a88:     ld      r4,-24(r3)
0x0000000060226a8c:     ld      r5,64(r4)
0x0000000060226a90:     mtlr    r5
0x0000000060226a94:     ld      r3,0(r12)
0x0000000060226a98:     std     r12,48(r1)
0x0000000060226a9c:     blrl

The null pointer dereference happens at 0x0000000060226a88 (ld r4,-24(r3))
with r3 being 0.  So the faulting address should be 0 - 24 =
0xffffffffffffffe8.
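
The expected address follows from 64-bit wraparound arithmetic.  The sketch
below is only an illustration of that calculation; effective_address is a
hypothetical helper, not part of Jikes RVM.

```c
#include <stdint.h>

/* Base register plus a signed displacement, computed modulo 2^64.
 * The displacement is sign-extended before the addition. */
uint64_t effective_address(uint64_t base, int16_t displacement) {
    return base + (uint64_t)(int64_t)displacement;
}
```

For a base of 0 and a displacement of -24 this yields 0xffffffffffffffe8,
which is the value the trap handler should see for the load above.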

It looks like the problem may be with

pt_regs *save = getLinuxSavedRegisters(signum, arg3);

and the getcontext() implementation on PPC64 Linux.

Sergiy

RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: David P G. <gr...@us...> - 2004-04-01 19:25:01

What version of Jikes RVM are you using?  I'm assuming it is not the CVS 
head ppc64, because we stopped using the explicit stack pointer and 
store-with-update sequence in the PPC baseline compiler almost a year ago.  
This looks to me like an old code sequence for pushing null on the stack.

--dave

0x0000000060226a68:     li      r3,0
0x0000000060226a6c:     stdu    r3,-8(r12)

RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Sergiy K. <se...@cs...> - 2004-04-01 19:44:05

This problem currently exists on both 2.0.3 and CVS head.  The machine-code
sequence was from 2.0.3, but CVS head has the same behavior and exactly the
same strange faulting address, 0x0000000000000380, for NullPointerException.
 
Sergiy 


RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Eliot M. <mo...@cs...> - 2004-04-01 20:57:12

Sergiy --

Again, I suspect the code that examines the saved register context and
fault information is not right.

The code appears to cast arg3 to a ucontext*, but that could be wrong,
and/or ucontext* might need a different definition.

Where is there a PPC64/Linux machine I can log in to and poke around?

-- Eliot

RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Sergiy K. <se...@cs...> - 2004-04-01 21:02:19

nu.cs.unm.edu with 64-bit toolchain at /opt/ppc64 and a fresh snapshot of
CVS head with enabled faulting address test for Linux at /tmp/head

RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Sergiy K. <se...@cs...> - 2004-04-01 21:23:38

Eliot,

I found the correct value:

......................
Program received signal SIGSEGV, Segmentation fault.
0x0000000060226a88 in ?? ()
(gdb) c
Continuing.

Breakpoint 1, getLinuxSavedContext(int, void*) (signum=11,
arg3=0x1ffffffeeb0)
    at libvm.C:209
209        return &((ucontext_t*)arg3)->uc_mcontext;
(gdb) x/128xg arg3
0x1ffffffeeb0:  0x0000000000000000      0x0000000000000000
0x1ffffffeec0:  0x000001ffffffd628      0x0000000000000000
........................
........................
........................
0x1fffffff0e0:  0xffffffffffffffe8      0x0000000000200000

Now the question is how do we extract it correctly?
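
One way to pin down where the value lives is to scan the context the same
way the gdb dump does and note the offset of the hit.  The helpers below are
a hypothetical debugging sketch, not code from libvm.C.  In the dump above,
the value sits at 0x1fffffff0e0 - 0x1ffffffeeb0 = 0x230 bytes from arg3
(64-bit word 70); that offset is specific to this kernel and C library and
should not be hard-coded.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first 64-bit word equal to needle, or -1. */
long find_word(const uint64_t *words, size_t count, uint64_t needle) {
    for (size_t i = 0; i < count; i++) {
        if (words[i] == needle) return (long)i;
    }
    return -1;
}

/* Byte distance from the start of the dump to the address of a hit. */
uint64_t byte_offset(uint64_t dump_base, uint64_t hit_address) {
    return hit_address - dump_base;
}
```

Scanning the 128 words at arg3 for 0xffffffffffffffe8 in a debugging build
would report the same slot that gdb shows, which can then be matched against
the field layout in the ucontext and sigcontext headers.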

Sergiy

RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Eliot M. <mo...@cs...> - 2004-04-01 21:58:01

I took a quick look, and it appears that the ucontext, sigcontext, and
ptrace .h files (the latter ultimately in /usr/include/asm-ppc64, but
"redirected" through multiple levels) hold the key. I think that casting arg3
properly and following the links / extracting the fields in it properly will
work. And perhaps the get_context functions actually work now. Can
you dig into all that a little bit more?

-- Eliot

RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Sergiy K. <se...@cs...> - 2004-04-01 22:03:34

Yes, I will try to figure it out, but getcontext does not seem to work.

Sergiy 

Re: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Eliot M. <mo...@cs...> - 2004-04-01 17:40:28

For now, PPC64 is Base compiler only, so that answers that question
:-). But yes, it could be helpful to look at the code near the fault.

-- E

RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Sergiy K. <se...@cs...> - 2004-04-01 20:56:56

The fact that Perry's faulting-address code works also suggests that this
problem appears only with SIGSEGV associated with negative addresses, which
is quite interesting.
 
Sergiy


RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Eliot M. <mo...@cs...> - 2004-04-01 21:01:01

Well, my recollection is that the faulting address occurs in more than one
place in the data structures. Perhaps it is reliably filled in in one place
but not always in the other, and we're reading the "wrong" one?

Anyway, sounds like time to sit down with gdb, poke around in the
structure, and see if the real address is there, only somewhere else!

-- Eliot

RE: [Jikesrvm-core] cTrapHandler does not work for null pointer dereference on PPC64 Linux

From: Sergiy K. <se...@cs...> - 2004-04-01 07:15:35

At a very high level siginfo.h is the same for both ppc and ppc64.  I will
have to look deeper in Linux kernel sources.

Sergiy
