|
From: Tu T. N. <ttn...@ra...> - 2014-10-03 13:35:58
|
I have one site that gets this issue multiple times a day. Each time it happens, the JVM number in the wrapper log gets bumped up, and theirs has gone up into the triple digits. A second site has this issue about 2-3 times a week. Running without the wrapper, just from the command line, they have now gone 2 weeks without a problem.

From: Mike Pilone [mailto:MP...@np...]
Sent: Friday, October 03, 2014 5:15 AM
To: wra...@li...
Subject: Re: [Wrapper-user] JVM restarts "Wrapper Process has not received any CPU time for 69 seconds"

I would say we have under 20 threads in the application, and the vast majority should be idle. When we do have a service get killed this way, it is almost always between 12AM and 3AM, which is when we run a lot of nightly background jobs and batch processing. However, the component that gets killed (usually 1 out of 12) varies, and the load of that component also varies, so we haven't been able to tie the issue to load on any one specific component at any particular time. We can sometimes go weeks without the issue and then have different processes killed 3 nights in a row. That's what makes debugging this so hard.

At one point I was thinking it might be a paging issue, where some process was swapped out and couldn't be swapped back in fast enough to answer the wrapper's ping, but I don't have any hard evidence other than that using swap on a VM guest is usually bad news. Hopefully with some more metrics I'll be able to find some correlations to other activities on the guest or host boxes.

-mike

NPR | Mike Pilone | Software Architect, Distribution | 1111 North Capital St., NE | Washington, DC 20002
<https://twitter.com/PRSS_NOC>

On Oct 2, 2014, at 6:10 PM, Tu T. Nguyen <ttn...@ra...> wrote:

Mike,

Does your Java process have many threads, and is it very busy? Our industrial application creates many threads to go out and read tags from Programmable Logic Controllers (PLC).
I'm just collecting data points on how we might try to reproduce this issue in house. For us it only seems to happen on Windows 2003. I'm surprised to see that it's happening to you on Linux. The other data point I am collecting is the VMware hardware versions of the guest machines.

Regards,
Tu Nguyen | FTPC Support Engineer | Office: 408.271.3464 | Mobile: 408.464.3252 | Rockwell Automation | Mission Viejo, CA (GMT -8)

From: Mike Pilone [mailto:MP...@np...]
Sent: Thursday, October 02, 2014 9:08 AM
To: wra...@li...
Subject: Re: [Wrapper-user] JVM restarts "Wrapper Process has not received any CPU time for 69 seconds"

It doesn't look like a memory issue with the wrapper, because we'll see it at different times on different nodes with different services, even though we tend to start all of our services at the same time on a given node. The logs also indicate that the wrapper detected a problem and is attempting to restart the service (not just getting killed). I attached the logs from our most recent occurrence, where we saw the same service get killed on two different guest VMs which are on two different physical hosts. The only thing the two physical nodes have in common is storage and network IO.

-mike

--
Mike Pilone | Software Architect, Distribution | 1111 North Capital St., NE | Washington, DC 20002 | 202-513-2679 office | 703-969-7493 cell | mp...@np...
<http://www.prss.org/> <http://www.nprss.org/> <https://www.facebook.com/pages/Public-Radio-Satellite-System-PRSS/225044460846999> <https://twitter.com/PRSS_NOC>

On Oct 2, 2014, at 11:06 AM, Tim Lammens <tim...@gm...> wrote:

Mike,

Are you sure the process is being killed by the wrapper and not by the Linux kernel?
A memory leak in glibc was causing our wrapper to consume a lot of memory (a bug in appending to a file), but the Linux OOM killer decided to kill the process which was protected by the wrapper. The memory shortage then prevented the process from being restarted by the wrapper.

Regards,
Tim

On Thu, Oct 2, 2014 at 4:22 PM, Mike Pilone <MP...@np...> wrote:

I can tell you that I'm using 3.5.24 and I see the problem on a Linux guest VM, as I described earlier. So while I always recommend updating to the latest version, I wouldn't have much confidence that it is going to fix this specific issue.

I'm wondering if the issue has something to do with an interaction between how the wrapper pings the JVM and VMware guests. I think the ping is done by sending a packet over a socket from the wrapper process to the JVM and getting a simple packet reply. Maybe when the VM guest or host is under load (or something else), the ping packet gets buffered or something. So it might not be a CPU issue but some kind of IO or networking/socket issue. I'd like to have more time to investigate, but when the issue only happens a couple of times a week at 2AM it is hard to get any good data.

-mike

--
Mike Pilone | Software Architect, Distribution | 1111 North Capital St., NE | Washington, DC 20002
<https://twitter.com/PRSS_NOC>

On Oct 1, 2014, at 8:21 PM, Tu T. Nguyen <ttn...@ra...> wrote:

What we are looking for is some justification that we can give our customer that upgrading might actually resolve the issue they are seeing. Based on what you see in the logs, is there anything that would point to an upgrade being likely to resolve the problem?

From: Leif Mortenson [mailto:lei...@ta...]
Sent: Wednesday, October 01, 2014 6:33 AM
To: Wrapper User List
Cc: Elaine M. Julius
Subject: Re: [Wrapper-user] JVM restarts "Wrapper Process has not received any CPU time for 69 seconds"

Tu,

Is the machine you are running on a physical or a virtual machine? We have seen cases where a loaded host causes the VM to freeze up when the guest itself does not show any load.

3.2.3 is also a VERY old version of the Wrapper. There have been a lot of improvements over the years in this area. Please try 3.5.25 and see how that works for you.

You mentioned log files, but there was nothing attached to your mail.

Cheers,
Leif

On Wed, Oct 1, 2014 at 10:23 PM, Tu T. Nguyen <ttn...@ra...> wrote:

Hello,

I have an issue where the wrapper restarts the JVM multiple times a day. We are using version 3.2.3 on Windows 2003 SP2. The duration always varies. We checked the CPU usage, and it's always very low when this happens. We have about 6 services that use the wrapper on this machine, and they restart at different times. The suggestion is that if there were a CPU bottleneck, they would all have this error at the same time. We also have other sites which use the same configuration but do not have this problem.

We have collected logs with wrapper debugging enabled. Please have a look; any help would be much appreciated!

Thank you,

------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance?
Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
Wrapper-user mailing list
Wra...@li...
https://lists.sourceforge.net/lists/listinfo/wrapper-user
|