|
From: Richard Frith-M. <ri...@br...> - 2006-01-16 07:13:05
|
I have a large body of code making complex use of JNI called from servlets in tomcat. Because this uses a lot of JNI, and was developed on the Sun implementation of java, I have been unable to find another implementation which will run it ... SableVM and Kaffe lack a lot of the JNI functions required. I have infrequent crashes in Java which are almost certainly due to errors in the JNI code ... most likely C library code writing to memory it shouldn't, and corrupting the Java system ... so valgrind seems the only realistic hope to catch that kind of thing. Unfortunately, valgrind doesn't seem to work with Sun's Java implementation. After much searching the 'resolved' bug #69508 seems to describe what is happening ... the last two comments in the bug report talk about Java re-execing to put its own directory at the front of LD_LIBRARY_PATH and valgrind doing the same ... getting into an infinite loop. I guess the bug being marked as 'resolved' refers to the original problem (a 'stack size too small' report) ... but it doesn't actually explain how to get valgrind/java to work together. has this actually been done? Is there a known workaround? I guess it ought to be possible to get valgrind to intercept the system calls that java uses and fool it into thinking its directory is the first in LD_LIBRARY_PATH? Or perhaps do some clever mixing/matching of java and valgrind libraries putting them both in the same directory I don't know enough about dynamic library loading to know whether that's really possible? I'd really appreciate some help on this. |
|
From: Tom H. <to...@co...> - 2006-01-16 08:55:33
|
In message <ABD...@br...>
Richard Frith-Macdonald <ri...@br...> wrote:
> Unfortunately, valgrind doesn't seem to work with Sun's Java
> implementation. After much searching the 'resolved' bug #69508 seems
> to describe what is happening ... the last two comments in the bug
> report talk about Java re-execing to put its own directory at the
> front of LD_LIBRARY_PATH and valgrind doing the same ... getting into
> an infinite loop.
Possibly - if you use --trace-children=yes do you see a whole stream
of valgrind startup messages appearing?
> I guess the bug being marked as 'resolved' refers to the original
> problem (a 'stack size too small' report) ... but it doesn't actually
> explain how to get valgrind/java to work together. has this actually
> been done? Is there a known workaround?
I'm not sure when the bug was resolved, but we no longer change
the value of LD_LIBRARY_PATH as it was only ever needed to allow
us to replace libpthread with our own one and we haven't done that
since the 2.2.x releases.
I think there was a delay in removing the LD_LIBRARY_PATH code, but
the 3.x releases should all be fine - none of them will try and
change it.
What version are you using?
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Richard Frith-M. <ri...@br...> - 2006-01-16 10:42:19
|
On 16 Jan 2006, at 08:55, Tom Hughes wrote: > In message <ABD...@br...> > Richard Frith-Macdonald <ri...@br...> wrote: > >> Unfortunately, valgrind doesn't seem to work with Sun's Java >> implementation. After much searching the 'resolved' bug #69508 seems >> to describe what is happening ... the last two comments in the bug >> report talk about Java re-execing to put its own directory at the >> front of LD_LIBRARY_PATH and valgrind doing the same ... getting into >> an infinite loop. > > Possibly - if you use --trace-children=yes do you see a whole stream > of valgrind startup messages appearing? No ... I guess I misdiagnosed the problem ... what I actually got was 1. a valgrind log file saying that there were too many errors and logging had stopped. 2. a system where the tomcat log was empty, and attempts to connect to it by apache were failing. This made me assume it was looping, as starting tomcat normally (ie without valgrind) results in a working tomcat setup visible in my web browser. >> I guess the bug being marked as 'resolved' refers to the original >> problem (a 'stack size too small' report) ... but it doesn't actually >> explain how to get valgrind/java to work together. has this actually >> been done? Is there a known workaround? > > I'm not sure when the bug was resolved, but we no longer change > the value of LD_LIBRARY_PATH as it was only ever needed to allow > us to replace libpthread with our own one and we haven't done that > since the 2.2.x releases. > > I think there was a delay in removing the LD_LIBRARY_PATH code, but > the 3.x releases should all be fine - none of them will try and > change it. > > What version are you using? Valgrind 3.1.0 built from source, Java jdk1.5.0_06 from Sun, Tomcat 4.1.29 on a Debian AMD64 'sid' system. I just tried adding '--error-limit=no' to the command line to start tomcat. It generates a bigger log file ... but the log file has stopped growing, though the valgrind/tomcat process looks like it is in a loop of some sort (ie a 'ps' shows it in an 'R' state). The log file stopped growing at 10:13, but it's now 10:40 and 'ps' shows the the process has now used over 28 minutes of CPU time. I don't know if it's any use at all, but the end of the log (the whole log is nearly 300KB ... too big to post to a list) looks like this ... ==7476== Invalid write of size 4 ==7476== at 0x70E3E710: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5DE0: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5DE0: ??? ==7476== by 0x70DD332C: ??? ==7476== by 0x4BF2BA4: JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*) (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4DC8EB8: os::os_exception_wrapper(void (*) (JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*) (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4BF29B4: JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*) (in /usr/local/jdk1.5.0_06/jre/lib/amd64/ server/libjvm.so) ==7476== by 0x4C1FF54: jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) (in /usr/local/jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== Address 0x7FEFF7B88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E332F0: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5DE0: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5DE0: ??? ==7476== by 0x70DD332C: ??? ==7476== by 0x4BF2BA4: JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*) (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4DC8EB8: os::os_exception_wrapper(void (*) (JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*) (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4BF29B4: JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*) (in /usr/local/jdk1.5.0_06/jre/lib/amd64/ server/libjvm.so) ==7476== by 0x4C1FF54: jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) (in /usr/local/jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E3574D: ??? ==7476== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC49: ??? ==7476== by 0x70E3577F: ??? ==7476== Address 0x7FEFFCA88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC50: ??? ==7476== by 0x70E3577F: ??? ==7476== Address 0x7FEFFBA88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC57: ??? ==7476== by 0x70E3577F: ??? ==7476== Address 0x7FEFFAA88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC5E: ??? ==7476== by 0x70E3577F: ??? ==7476== Address 0x7FEFF9A88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC65: ??? ==7476== by 0x70E3577F: ??? ==7476== Address 0x7FEFF8A88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC6C: ??? ==7476== by 0x70E3577F: ??? ==7476== Address 0x7FEFF7A88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E354CD: ??? ==7476== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC49: ??? ==7476== by 0x70E35503: ??? ==7476== Address 0x7FEFFCA60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC50: ??? ==7476== by 0x70E35503: ??? ==7476== Address 0x7FEFFBA60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC57: ??? ==7476== by 0x70E35503: ??? ==7476== Address 0x7FEFFAA60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC5E: ??? ==7476== by 0x70E35503: ??? ==7476== Address 0x7FEFF9A60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC65: ??? ==7476== by 0x70E35503: ??? ==7476== Address 0x7FEFF8A60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC6C: ??? ==7476== by 0x70E35503: ??? ==7476== Address 0x7FEFF7A60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E3524D: ??? ==7476== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC49: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFFCA88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC50: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFFBA88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC57: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFFAA88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC5E: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFF9A88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC65: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFF8A88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC6C: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFF7A88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E31840: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFF7A78 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E31F80: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFF7A48 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E33B4D: ??? ==7476== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC49: ??? ==7476== by 0x70E33B7F: ??? ==7476== Address 0x7FEFFCA60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC50: ??? ==7476== by 0x70E33B7F: ??? ==7476== Address 0x7FEFFBA60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC57: ??? ==7476== by 0x70E33B7F: ??? ==7476== Address 0x7FEFFAA60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC5E: ??? ==7476== by 0x70E33B7F: ??? ==7476== Address 0x7FEFF9A60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC65: ??? ==7476== by 0x70E33B7F: ??? ==7476== Address 0x7FEFF8A60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC6C: ??? ==7476== by 0x70E33B7F: ??? ==7476== Address 0x7FEFF7A60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid read of size 8 ==7476== at 0x70E3F260: ??? ==7476== Address 0x8 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Conditional jump or move depends on uninitialised value(s) ==7476== at 0x4E31AEB: SharedRuntime::continuation_for_implicit_exception(JavaThread*, unsigned char*, SharedRuntime::ImplicitExceptionKind) (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4DCA625: JVM_handle_linux_signal (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4DC812D: signalHandler(int, siginfo*, void*) (in / usr/local/jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x432B79F: (within /lib/libpthread-2.3.5.so) |
|
From: Tom H. <to...@co...> - 2006-01-16 10:53:02
|
In message <88F...@br...>
Richard Frith-Macdonald <ri...@br...> wrote:
> On 16 Jan 2006, at 08:55, Tom Hughes wrote:
>
>> Possibly - if you use --trace-children=yes do you see a whole stream
>> of valgrind startup messages appearing?
>
> No ... I guess I misdiagnosed the problem ... what I actually got was
> 1. a valgrind log file saying that there were too many errors and
> logging had stopped.
> 2. a system where the tomcat log was empty, and attempts to connect
> to it by apache were failing.
> This made me assume it was looping, as starting tomcat normally (ie
> without valgrind) results in a working tomcat setup visible in my web
> browser.
The LD_LIBRARY_PATH loop means that Java is not really running much
at all so I wouldn't expect to see much in the way of valgrind errors
reports in that case.
> I just tried adding '--error-limit=no' to the command line to start
> tomcat. It generates a bigger log file ... but the log file has
> stopped growing, though the valgrind/tomcat process looks like it is
> in a loop of some sort (ie a 'ps' shows it in an 'R' state).
> The log file stopped growing at 10:13, but it's now 10:40 and 'ps'
> shows the the process has now used over 28 minutes of CPU time.
Well what does strace say it is doing? What about --trace-syscalls=yes?
At end of the day my experience of trying to look at these problems
in the past is that the Sun JVM does truly disgusting things that are
often very hard to support in valgrind. It tends to stress just about
everything to the absolute limit especially things like threading and
signal handling.
My favourite one was where it seemed to be fiddling with a memory
page (either munmap or mprotect, I can't remember which) which it
didn't seem to have allocated... It seemed to just be making
assumptions about where certain things would be in memory for some
reason.
I assume you are using --smc-check=all or have turned the JIT compiler
off?
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Richard Frith-M. <ri...@br...> - 2006-01-16 11:18:38
|
On 16 Jan 2006, at 10:52, Tom Hughes wrote: > In message <88F...@br...> > Richard Frith-Macdonald <ri...@br...> wrote: > >> I just tried adding '--error-limit=no' to the command line to start >> tomcat. It generates a bigger log file ... but the log file has >> stopped growing, though the valgrind/tomcat process looks like it is >> in a loop of some sort (ie a 'ps' shows it in an 'R' state). >> The log file stopped growing at 10:13, but it's now 10:40 and 'ps' >> shows the the process has now used over 28 minutes of CPU time. > > Well what does strace say it is doing? What about --trace- > syscalls=yes? I hadn't thought of that ... looks like a signal handling problem of some sort ... after adding '--trace-syscalls=yes' I get 'SYSCALL[8149,1]( 15) rt_sigreturn ( ) --> [pre-success] Success(0x0)' repeated indefinitely at the end of the log file. The log file to that point is about 900KB. > At end of the day my experience of trying to look at these problems > in the past is that the Sun JVM does truly disgusting things that are > often very hard to support in valgrind. It tends to stress just about > everything to the absolute limit especially things like threading and > signal handling. I suppose threading/signal handling are understandable given the nature of the beast. Sounds like you don't have much hope of it working though:-( > My favourite one was where it seemed to be fiddling with a memory > page (either munmap or mprotect, I can't remember which) which it > didn't seem to have allocated... It seemed to just be making > assumptions about where certain things would be in memory for some > reason. > > I assume you are using --smc-check=all or have turned the JIT compiler > off? Yes ... '/usr/local/bin/valgrind -v --trace-children=yes --smc- check=all --trace-syscalls=yes --error-limit=no --logfile=/tmp/vg' followed by the normal tomcat startup command. |
|
From: Tom H. <to...@co...> - 2006-01-16 11:44:45
|
In message <BBC...@br...>
Richard Frith-Macdonald <ri...@br...> wrote:
> On 16 Jan 2006, at 10:52, Tom Hughes wrote:
>
>> In message <88F...@br...>
>> Richard Frith-Macdonald <ri...@br...> wrote:
>>
>>> I just tried adding '--error-limit=no' to the command line to start
>>> tomcat. It generates a bigger log file ... but the log file has
>>> stopped growing, though the valgrind/tomcat process looks like it is
>>> in a loop of some sort (ie a 'ps' shows it in an 'R' state).
>>> The log file stopped growing at 10:13, but it's now 10:40 and 'ps'
>>> shows the the process has now used over 28 minutes of CPU time.
>>
>> Well what does strace say it is doing? What about --trace-
>> syscalls=yes?
>
> I hadn't thought of that ... looks like a signal handling problem of
> some sort ... after adding '--trace-syscalls=yes' I get
> 'SYSCALL[8149,1]( 15) rt_sigreturn ( ) --> [pre-success] Success(0x0)'
> repeated indefinitely at the end of the log file. The log file to
> that point is about 900KB.
Looks like a signal handling issue then - add --trace-signals=yes
and see what that gets you.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: David E. <tw...@us...> - 2006-01-16 10:14:57
|
On Mon, 2006-01-16 at 07:12 +0000, Richard Frith-Macdonald wrote:
> I have a large body of code making complex use of JNI called from
> servlets in tomcat.
>
> Because this uses a lot of JNI, and was developed on the Sun
> implementation of java, I have been unable to find another
> implementation which will run it ... SableVM and Kaffe lack a lot of
> the JNI functions required.
>
> I have infrequent crashes in Java which are almost certainly due to
> errors in the JNI code ... most likely C library code writing to
> memory it shouldn't, and corrupting the Java system ... so valgrind
> seems the only realistic hope to catch that kind of thing.
Just thinking out loud here... (It was a long time since I did JNI.)
Would it be possible to create unit tests or similar that exerciced the
JNI code without any Java involved? Then valgrind could run on the unit
tests...
--
Regards,
-\- David Eriksson -/-
SynCE - http://synce.sourceforge.net
ScummVM - http://scummvm.sourceforge.net
Desquirr - http://desquirr.sourceforge.net
|
|
From: Richard Frith-M. <ri...@br...> - 2006-01-16 10:51:27
|
On 16 Jan 2006, at 10:14, David Eriksson wrote: > On Mon, 2006-01-16 at 07:12 +0000, Richard Frith-Macdonald wrote: >> I have a large body of code making complex use of JNI called from >> servlets in tomcat. >> >> Because this uses a lot of JNI, and was developed on the Sun >> implementation of java, I have been unable to find another >> implementation which will run it ... SableVM and Kaffe lack a lot of >> the JNI functions required. >> >> I have infrequent crashes in Java which are almost certainly due to >> errors in the JNI code ... most likely C library code writing to >> memory it shouldn't, and corrupting the Java system ... so valgrind >> seems the only realistic hope to catch that kind of thing. > > Just thinking out loud here... (It was a long time since I did JNI.) > > Would it be possible to create unit tests or similar that exerciced > the > JNI code without any Java involved? Then valgrind could run on the > unit > tests... Thanks for the suggestion, but the code is large,complex and involves callbacks into the java environment from the external libraries. Any non-java unit tests would need to simulate the presence of java and be hugely time consuming to write. It would probably be quicker to re-implement the system entirely in C/Objective-C. |
|
From: Richard Frith-M. <ri...@br...> - 2006-01-16 12:27:16
|
On 16 Jan 2006, at 11:44, Tom Hughes wrote: > In message <BBC...@br...> > Richard Frith-Macdonald <ri...@br...> wrote: > >> On 16 Jan 2006, at 10:52, Tom Hughes wrote: >> >>> In message <88F...@br...> >>> Richard Frith-Macdonald <ri...@br...> wrote: >>> >>>> I just tried adding '--error-limit=no' to the command line to start >>>> tomcat. It generates a bigger log file ... but the log file has >>>> stopped growing, though the valgrind/tomcat process looks like >>>> it is >>>> in a loop of some sort (ie a 'ps' shows it in an 'R' state). >>>> The log file stopped growing at 10:13, but it's now 10:40 and 'ps' >>>> shows the the process has now used over 28 minutes of CPU time. >>> >>> Well what does strace say it is doing? What about --trace- >>> syscalls=yes? >> >> I hadn't thought of that ... looks like a signal handling problem of >> some sort ... after adding '--trace-syscalls=yes' I get >> 'SYSCALL[8149,1]( 15) rt_sigreturn ( ) --> [pre-success] Success >> (0x0)' >> repeated indefinitely at the end of the log file. The log file to >> that point is about 900KB. > > Looks like a signal handling issue then - add --trace-signals=yes > and see what that gets you. OK ... segmentation violation ... but I don't know where/why. Does the following (the last six lines are the ones that repeat forever) make any sense? SYSCALL[29972,1](202) sys_futex ( 0x141E5244, 1, 1, 0x508CB10, 0x141E5240 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x141E5240, 0, 2, 0x0, 0x141E4200 ) --> [async] ... SYSCALL[29972,1](202) ... [async] --> Success(0x1) SYSCALL[29972,1](202) sys_futex ( 0x141E5240, 1, 1, 0x508CB10, 0x141E5240 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x141E5240, 1, 1, 0x0, 0x141E4200 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x477FFE0, 0, 2, 0x0, 0x477FFE0 ) -- > [async] ... SYSCALL[29972,1](202) ... [async] --> Success(0x1) SYSCALL[29972,1](202) sys_futex ( 0x477FFE0, 1, 1, 0x508CB10, 0x141E5240 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x477FFE0, 1, 1, 0x50C6070, 0x477FFE0 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6]( 96) sys_gettimeofday ( 0x147C6010, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,6]( 96) sys_gettimeofday ( 0x147C6010, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x141E5244, 0, 9, 0x0, 0x141E4200 ) --> [async] ... SYSCALL[29972,1](202) ... [async] --> Success(0x1) ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70E354CD: ??? ==29972== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC49: ??? ==29972== by 0x70E35503: ??? ==29972== Address 0x7FEFFCA60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC50: ??? ==29972== by 0x70E35503: ??? ==29972== Address 0x7FEFFBA60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC57: ??? ==29972== by 0x70E35503: ??? ==29972== Address 0x7FEFFAA60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC5E: ??? ==29972== by 0x70E35503: ??? ==29972== Address 0x7FEFF9A60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC65: ??? ==29972== by 0x70E35503: ??? ==29972== Address 0x7FEFF8A60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC6C: ??? ==29972== by 0x70E35503: ??? ==29972== Address 0x7FEFF7A60 is not stack'd, malloc'd or (recently) free'd SYSCALL[29972,1](202) sys_futex ( 0x141E5244, 1, 1, 0x508CB10, 0x141E5240 ) --> [async] ... SYSCALL[29972,10](202) ... [async] --> Failure(0x6E) SYSCALL[29972,10](202) sys_futex ( 0x14BC9E78, 1, 1, 0x14BC9DD8, 0x14BC9E78 ) --> [async] ... SYSCALL[29972,10](202) ... [async] --> Success(0x0) SYSCALL[29972,10]( 96) sys_gettimeofday ( 0x14BC9E90, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,10]( 96) sys_gettimeofday ( 0x14BC9E00, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,10](228) sys_clock_gettime( 0, 0x14BC9DD8 )[sync] --> Success(0x0) SYSCALL[29972,10](202) sys_futex ( 0x15415C64, 0, 1, 0x14BC9DD8, 0x14BC9E78 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x141E5240, 0, 2, 0x0, 0x141E4200 ) --> [async] ... SYSCALL[29972,1](202) ... [async] --> Success(0x1) SYSCALL[29972,1](202) sys_futex ( 0x141E5240, 1, 1, 0x508CB10, 0x141E5240 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x141E5240, 1, 1, 0x0, 0x141E4200 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x477FFE0, 0, 2, 0x0, 0x477FFE0 ) -- > [async] ... SYSCALL[29972,1](202) ... [async] --> Success(0x1) SYSCALL[29972,1](202) sys_futex ( 0x477FFE0, 1, 1, 0x508CB10, 0x141E5240 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x477FFE0, 1, 1, 0x50C6070, 0x477FFE0 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6]( 96) sys_gettimeofday ( 0x147C6010, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,6]( 96) sys_gettimeofday ( 0x147C6010, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x141E5244, 0, 11, 0x0, 0x141E4200 ) --> [async] ... SYSCALL[29972,1](202) ... [async] --> Success(0x1) ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70E3524D: ??? ==29972== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC49: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFFCA88 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC50: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFFBA88 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC57: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFFAA88 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC5E: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFF9A88 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC65: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFF8A88 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC6C: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFF7A88 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70E31840: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFF7A78 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70E31F80: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFF7A48 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70E33B4D: ??? ==29972== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC49: ??? ==29972== by 0x70E33B7F: ??? ==29972== Address 0x7FEFFCA60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC50: ??? ==29972== by 0x70E33B7F: ??? ==29972== Address 0x7FEFFBA60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC57: ??? ==29972== by 0x70E33B7F: ??? ==29972== Address 0x7FEFFAA60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC5E: ??? ==29972== by 0x70E33B7F: ??? ==29972== Address 0x7FEFF9A60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC65: ??? ==29972== by 0x70E33B7F: ??? ==29972== Address 0x7FEFF8A60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC6C: ??? ==29972== by 0x70E33B7F: ??? ==29972== Address 0x7FEFF7A60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid read of size 8 ==29972== at 0x70E3F260: ??? ==29972== Address 0x8 is not stack'd, malloc'd or (recently) free'd --29972-- signal 11 arrived ... si_code=1, EIP=0x70E3F260, eip=0x405C4F34B --29972-- SIGSEGV: si_code=1 faultaddr=0x8 tid=1 ESP=0x7FEFFDB10 seg=0x0-0x3FFFFFF --29972-- delivering signal 11 (SIGSEGV):1 to thread 1 --29972-- push_signal_frame (thread 1): signal 11 ==29972== ==29972== Conditional jump or move depends on uninitialised value(s) ==29972== at 0x4E31AEB: SharedRuntime::continuation_for_implicit_exception(JavaThread*, unsigned char*, SharedRuntime::ImplicitExceptionKind) (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==29972== by 0x4DCA625: JVM_handle_linux_signal (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==29972== by 0x4DC812D: signalHandler(int, siginfo*, void*) (in / usr/local/jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==29972== by 0x432B79F: (within /lib/libpthread-2.3.5.so) SYSCALL[29972,1]( 15) rt_sigreturn ( )--29972-- VG_(signal_return) (thread 1): isRT=1 valid magic; RIP=0x70E3F260 --> [pre-success] Success(0x0) --29972-- signal 11 arrived ... si_code=1, EIP=0x70E3F260, eip=0x405C4F34B --29972-- SIGSEGV: si_code=1 faultaddr=0x8 tid=1 ESP=0x7FEFFDB10 seg=0x0-0x3FFFFFF --29972-- delivering signal 11 (SIGSEGV):1 to thread 1 --29972-- push_signal_frame (thread 1): signal 11 SYSCALL[29972,1]( 15) rt_sigreturn ( )--29972-- VG_(signal_return) (thread 1): isRT=1 valid magic; RIP=0x70E3F260 --> [pre-success] Success(0x0) --29972-- signal 11 arrived ... si_code=1, EIP=0x70E3F260, eip=0x405C4F34B --29972-- SIGSEGV: si_code=1 faultaddr=0x8 tid=1 ESP=0x7FEFFDB10 seg=0x0-0x3FFFFFF --29972-- delivering signal 11 (SIGSEGV):1 to thread 1 --29972-- push_signal_frame (thread 1): signal 11 SYSCALL[29972,10](202) ... [async] --> Failure(0x6E) SYSCALL[29972,10](202) sys_futex ( 0x14BC9E78, 1, 1, 0x14BC9DD8, 0x14BC9E78 ) --> [async] ... SYSCALL[29972,10](202) ... [async] --> Success(0x0) SYSCALL[29972,10]( 96) sys_gettimeofday ( 0x14BC9E90, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,10]( 96) sys_gettimeofday ( 0x14BC9E00, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,10](228) sys_clock_gettime( 0, 0x14BC9DD8 )[sync] --> Success(0x0) SYSCALL[29972,10](202) sys_futex ( 0x48071D4, 0, 1, 0x14BC9DD8, 0x14BC9E78 ) --> [async] ... SYSCALL[29972,1]( 15) rt_sigreturn ( )--29972-- VG_(signal_return) (thread 1): isRT=1 valid magic; RIP=0x70E3F260 --> [pre-success] Success(0x0) --29972-- signal 11 arrived ... si_code=1, EIP=0x70E3F260, eip=0x405C4F34B --29972-- SIGSEGV: si_code=1 faultaddr=0x8 tid=1 ESP=0x7FEFFDB10 seg=0x0-0x3FFFFFF --29972-- delivering signal 11 (SIGSEGV):1 to thread 1 --29972-- push_signal_frame (thread 1): signal 11 |
|
From: Tom H. <to...@co...> - 2006-01-16 16:47:42
|
In message <861...@br...>
Richard Frith-Macdonald <ri...@br...> wrote:
> On 16 Jan 2006, at 11:44, Tom Hughes wrote:
>
>> Looks like a signal handling issue then - add --trace-signals=yes
>> and see what that gets you.
>
> OK ... segmentation violation ... but I don't know where/why.
> Does the following (the last six lines are the ones that repeat
> forever) make any sense?
OK. I can reproduce this. I think the problem is that the JVM is
installing a SEGV handler which tries to resume at a different
address by updating the value of RIP in the signal context.
Now valgrind doesn't handle that - it copies most of the register
values back but not the RIP value. I've just tried making it copy
the RIP value back but it didn't seem to help and it was still
looping.
I presume the reason for that is that although the RIP for the
simulated CPU has been updated when the signal handler returns
the real CPU will restart execution from in the JITed code that
valgrind has generated and then fault again.
The signal handler probably needs to jump back to the scheduler
if RIP has changed so that the scheduler can restart execution
from the right address but Julian may have a better idea how to
fix this...
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Richard Frith-M. <ri...@br...> - 2006-01-16 17:01:11
|
On 16 Jan 2006, at 16:47, Tom Hughes wrote: > In message <861...@br...> > Richard Frith-Macdonald <ri...@br...> wrote: > >> On 16 Jan 2006, at 11:44, Tom Hughes wrote: >> >>> Looks like a signal handling issue then - add --trace-signals=yes >>> and see what that gets you. >> >> OK ... segmentation violation ... but I don't know where/why. >> Does the following (the last six lines are the ones that repeat >> forever) make any sense? > > OK. I can reproduce this. I think the problem is that the JVM is > installing a SEGV handler which tries to resume at a different > address by updating the value of RIP in the signal context. > > Now valgrind doesn't handle that - it copies most of the register > values back but not the RIP value. I've just tried making it copy > the RIP value back but it didn't seem to help and it was still > looping. > > I presume the reason for that is that although the RIP for the > simulated CPU has been updated when the signal handler returns > the real CPU will restart execution from in the JITed code that > valgrind has generated and then fault again. > > The signal handler probably needs to jump back to the scheduler > if RIP has changed so that the scheduler can restart execution > from the right address but Julian may have a better idea how to > fix this... That's getting into the internals of valgrind and beyond my current knowhow ... but I understand the gist of your reply. I guess this is to be taken as a valgrind bug then ... would you like me to file a bug report? |
|
From: Julian S. <js...@ac...> - 2006-01-17 15:11:40
|
> OK. I can reproduce this. I think the problem is that the JVM is > installing a SEGV handler which tries to resume at a different > address by updating the value of RIP in the signal context. Gaah. How horrible. > Now valgrind doesn't handle that - it copies most of the register > values back but not the RIP value. I've just tried making it copy > the RIP value back but it didn't seem to help and it was still > looping. > > I presume the reason for that is that although the RIP for the > simulated CPU has been updated when the signal handler returns > the real CPU will restart execution from in the JITed code that > valgrind has generated and then fault again. This is based on inadequate (zero) investigation on my part, so could be completely bogus, but how I think it works is: - vex-generated code gets a segfault - Valgrind's signal handler runs. It sees this is a synch signal and longjmps back to the scheduler. - scheduler sees a signal is pending for this thread, builds frame for thread, sets RIP to enter handler - thread is rescheduled, handler runs (on sim'd cpu of course). - handler messes with RIP in sigcontext. - handler returns. This makes it jump to the stub code in the frame created by m_sigframe, which in turn causes it to do a sys_rt_sigreturn. - So we're now in PRE(sys_rt_sigreturn) in syswap-amd64-linux.c. That does some incomprehensible messing around (I never really understood it) but it does call VG_(sigframe_destroy)(tid, True). This copies the sigcontext back into the thread state. - When the thread is rescheduled, it will continue at the new RIP value. So it should work (ha ha ha). I'm sure Jeremy had it working, or something very similar, so that he could do self-hosting on 2.4.X, but that did get partially broken at the time the 3.0 code line was created, and now we don't rely on signal cleverness to do self hosting. One thing that does occur to me is that by default the simulated registers may not be up to date at the exception point, and that could cause problems. What happens if you run with --vex-iropt-precise-memory-exns=yes ? J |
|
From: Richard Frith-M. <ri...@br...> - 2006-01-17 15:49:58
|
On 17 Jan 2006, at 15:11, Julian Seward wrote: > One thing that does occur to me is that by default the simulated > registers may not be up to date at the exception point, and that > could cause problems. What happens if you run with > --vex-iropt-precise-memory-exns=yes ? No difference ... it ends up looping handling a sigsegv |
|
From: Dalibor T. <ro...@ka...> - 2006-01-18 03:41:11
|
Richard Frith-Macdonald <richard <at> brainstorm.co.uk> writes: > > I have a large body of code making complex use of JNI called from > servlets in tomcat. > > Because this uses a lot of JNI, and was developed on the Sun > implementation of java, I have been unable to find another > implementation which will run it ... SableVM and Kaffe lack a lot of > the JNI functions required. Have you tried with Kaffe 1.1.6 or 1.1.7-rc1? That version should implement all of the JNI 1.4 APIs, as far as I remember (don't have the list handy atm). Kaffe also works under velgrind, try KAFFE_DEBUG=valgrind kaffe yourClass. cheers, dalibor topic |
|
From: Richard Frith-M. <ri...@br...> - 2006-01-18 17:11:28
|
On 18 Jan 2006, at 03:29, Dalibor Topic wrote: > Richard Frith-Macdonald <richard <at> brainstorm.co.uk> writes: > >> >> I have a large body of code making complex use of JNI called from >> servlets in tomcat. >> >> Because this uses a lot of JNI, and was developed on the Sun >> implementation of java, I have been unable to find another >> implementation which will run it ... SableVM and Kaffe lack a lot of >> the JNI functions required. > > Have you tried with Kaffe 1.1.6 or 1.1.7-rc1? That version should > implement all > of the JNI 1.4 APIs, as far as I remember (don't have the list > handy atm). Kaffe > also works under velgrind, try KAFFE_DEBUG=valgrind kaffe yourClass. I haven't managed to get things to work with Kaffe ... at first I assumed it was a JNI problem, but I reduced things to a very simple test program and it wasn't even getting as far as loading the shared library but was actually failing to locate the java class which used it ... despite having CLASSPATH set up to point to it (and trying the -classpath command line argument). I've been looking at lots of kaffe debug output, and it doesn't seem to be initialising the internal classpath correctly (the latest debian unstable package for the amd64), and my current best guess is that kaffe has bugs in 64bit mode. The only way I'm likely to find out for sure is to build a version from source and run under gdb to see exactly what's going on when it starts up :-( |
|
From: Dalibor T. <ro...@ka...> - 2006-01-19 03:10:57
|
Richard Frith-Macdonald <richard <at> brainstorm.co.uk> writes: > I haven't managed to get things to work with Kaffe ... I am sorry to hear that. > The only way I'm likely to find out for sure is to build a > version from source and run under gdb to see exactly what's going on > when it starts up Yeah. You may want to chose jikes for the java compiler, since ecj crashes building 1.1.6-91 (i.e 1.1.7-rc1) on amd64 debian, according to the buildd logs from today :/ cheers, dalibor topic |