From: dreedy <sig...@hy...> - 2009-04-18 04:34:34
|
Hi,

I am running into issues on both Linux (ubuntu 2.6.27) and OSX (10.5.6) using Java 6. The result is the JVM crashes. I have attached 2 files:

1. hs_err_pid24271.log : The crash file produced on Linux
2. java_2009-04-17-082543_CaneBay.crash : The crash file generated on OSX

Both seem to point to the same problem:

j org.hyperic.sigar.Cpu.gather(Lorg/hyperic/sigar/Sigar;)V+0
j org.hyperic.sigar.Cpu.fetch(Lorg/hyperic/sigar/Sigar;)Lorg/hyperic/sigar/Cpu;+10
j org.hyperic.sigar.Sigar.getCpu()Lorg/hyperic/sigar/Cpu;+1
j org.hyperic.sigar.Sigar.getCpuPerc()Lorg/hyperic/sigar/CpuPerc;+8

Info on the machines that produced the issue:

OSX Machine
-------------------
Mac Pro, Quad-Core Intel Xeon, 8 cores

> uname -a
Darwin CaneBay.local 9.6.3 Darwin Kernel Version 9.6.3: Tue Jan 20 18:26:40 PST 2009; root:xnu-1228.10.33~1/RELEASE_I386 i386
> java -version
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)

Linux Machine
-------------------
$ uname -a
Linux mahobay 2.6.27-11-generic #1 SMP Thu Jan 29 19:24:39 UTC 2009 i686 GNU/Linux
$ java -version
java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) Client VM (build 11.0-b16, mixed mode, sharing)

Hoping you can let me know if there is a workaround (aside from removing SIGAR from the runtime). Please let me know if I can provide or assist with any more information. If it would be easier to work this offline, please let me know that as well.

Regards

Dennis |
From: Doug M. <do...@hy...> - 2009-04-18 15:29:36
|
Hi Dennis,

Are you able to reproduce the crash on these machines with the following:

% java -jar sigar.jar Top State.Name.eq=java

If not, I'd guess that concurrent threads are accessing the same Sigar object. If that is the case, you should either synchronize access to the Sigar object or have one per thread.

If that isn't the issue, is this new code using sigar or did you upgrade from an older version of sigar that caused this? |
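For readers following along, the "synchronize access" option can be sketched roughly as below. This is a minimal illustration, not SIGAR code: `Action`, `Guarded`, `UnsafeCounter` and `GuardedDemo` are hypothetical names, and `UnsafeCounter` is only a stand-in for an object that is not thread-safe (such as a shared Sigar instance), so that the sketch runs without the native library. It uses lambdas, so it assumes a newer Java than the Java 6 discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback type (not part of SIGAR). It is allowed to throw,
// because calls such as Sigar.getCpuPerc() declare a checked exception.
interface Action<T, R> {
    R apply(T target) throws Exception;
}

// Hypothetical helper: serializes every call made on one shared object.
final class Guarded<T> {
    private final T target;
    private final Object lock = new Object();

    Guarded(T target) {
        this.target = target;
    }

    // Runs the action while holding the lock, so only one thread at a
    // time ever touches the wrapped object.
    <R> R call(Action<T, R> action) throws Exception {
        synchronized (lock) {
            return action.apply(target);
        }
    }
}

// Stand-in for a native-backed object that is not thread-safe.
final class UnsafeCounter {
    int value;

    int increment() {
        return ++value;
    }
}

final class GuardedDemo {
    // Four threads each make 1000 guarded calls; returns the final count.
    static int run() {
        UnsafeCounter counter = new UnsafeCounter();
        Guarded<UnsafeCounter> shared = new Guarded<>(counter);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    try {
                        shared.call(UnsafeCounter::increment);
                    } catch (Exception e) {
                        throw new IllegalStateException(e);
                    }
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.value;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With SIGAR on the classpath, the same wrapper would presumably be used as `new Guarded<>(new Sigar())` followed by `shared.call(s -> s.getCpuPerc())`. The "one per thread" alternative needs no lock at all; something like `ThreadLocal.withInitial(Sigar::new)` would be one way to arrange it, at the cost of each thread keeping its own state.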
From: dreedy <sig...@hy...> - 2009-04-21 07:02:29
|
Hi Doug,

The catalyst that prompted this issue for me was getting a new Mac Pro. With this upgrade, integration tests that had been working suddenly stopped working. I then noticed that the tests also failed sporadically on Ubuntu. After looking around, everything seemed to point to the SIGSEGV happening in the SIGAR stack, and since I had also upgraded to 1.6.2 I thought that was the root cause.

I tried several things, the last being your observation that concurrent threads are accessing the same Sigar object. I have changed my approach to use an access lock, ensuring that there is no concurrent use of the Sigar object.

I had been creating a single Sigar object with a factory approach, having all components that use Sigar share the same instance. My thought here was that Sigar would be accumulating load averages, etc., and that the one-instance approach would be the right way to go. I now have a Sigar instance created for each request, with the lock to make sure that threads within each component do not cause concurrent access to the Sigar object.

So far everything seems to be working well, thanks for your help (again)

Dennis |
From: Doug M. <do...@hy...> - 2009-04-30 18:07:51
|
Hi Dennis,

Glad you were able to fix the problem. The Sigar object does store some data used to calculate cpu%, disk service time, etc. Only the calls to Sigar.get* would need to be synchronized; the returned objects are read-only copies of the data and are safe to access from multiple threads.

At one point we had a SynchronizedSigar class that implemented the SigarProxy interface that would take care of this for you. Maybe we should bring that back to life? |
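As an illustration of what a synchronizing wrapper behind an interface might look like, here is a minimal sketch built on `java.lang.reflect.Proxy`. It is not the original SynchronizedSigar class, whose implementation is not shown in this thread. `SynchronizedProxy`, `Probe`, `UnsafeProbe` and `SynchronizedProxyDemo` are hypothetical names, and `Probe` stands in for an interface such as SigarProxy so the example runs without the native library.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical helper, not the original SynchronizedSigar class.
final class SynchronizedProxy {
    private SynchronizedProxy() {
    }

    // Returns a proxy for iface that holds a single lock around every
    // method call before delegating to the wrapped target.
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        Object lock = new Object();
        InvocationHandler handler = (proxy, method, args) -> {
            synchronized (lock) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    // Rethrow whatever the target itself threw.
                    throw e.getCause();
                }
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
    }
}

// Stand-in interface and implementation (assumptions for this sketch).
interface Probe {
    int sample();
}

final class UnsafeProbe implements Probe {
    int calls;

    // Not thread-safe on its own: unsynchronized read-modify-write.
    public int sample() {
        return ++calls;
    }
}

final class SynchronizedProxyDemo {
    // Four threads each make 1000 calls through the proxy; returns the
    // number of calls the underlying object recorded.
    static int run() {
        UnsafeProbe raw = new UnsafeProbe();
        Probe shared = SynchronizedProxy.wrap(raw, Probe.class);
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    shared.sample();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return raw.calls;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Assuming Sigar implements SigarProxy, the equivalent call would be along the lines of `SynchronizedProxy.wrap(new Sigar(), SigarProxy.class)`. Only the calls themselves are serialized here; the objects they return are handed back outside the lock, which matches the point above that the returned copies are safe to read from multiple threads.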