|
From: Amresh D. <amr...@ya...> - 2007-10-09 18:36:35
|
Hi Leif,

We have managed to find out the cause of the UNKNOWN (127) signals.

The system logs indicated that there were out-of-memory errors and, as a result, the kernel was killing the Java process.

The error messages in the /var/log/messages file were:

OOM kill occurred on an x86_64 numa system! The numa=off boot option might help avoid this.

An extract from the Red Hat website has more details on this:

"Systems with processors featuring AMD64 and Intel® EM64T are typically configured as NUMA platforms, which means that the kernel constructs multiple memory nodes at boot-time rather than constructing a single memory node. The multiple node construct can result in memory exhaustion on one or more of the nodes before other nodes become exhausted. When memory exhaustion happens, the following could result:
1) The system will swap the exhausted nodes while there is available memory on other nodes, resulting in poor overall performance
2) Processes are killed due to Out-Of-Memory (OOM) errors even though there is available memory
3) Less than optimal performance due to excessive memory bandwidth when processes running on an exhausted node allocate memory on one or more different nodes"

Details here:

http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/release-notes/as-amd64/RELEASE-NOTES-U5-x86_64-en.html

To fix the problem we simply used the numa=off boot option.

What I found interesting was the SIGNAL 127 which was reported as UNKNOWN by the wrapper.
Is that something we can investigate and add to the list of known signals?

Thanks for your help and apologies for the delay in getting back to you on this.

Regards,

Amresh


----- Original Message ----
From: Leif Mortenson <leif@tanukisoftware.com>
To: wra...@li...
Sent: Friday, September 28, 2007 8:51:26 AM
Subject: Re: [Wrapper-user] JVM exited in response to signal UNKNOWN (127)


Amresh,
As long as you use the wrapper's shell script to control the wrapper
the user will see no difference in the way the shell script works.
The shell script uses a pid file regardless, but yes, it does also start
using an anchor file to control the shutdown.  That is how the script
communicates with the wrapper process.

The Wrapper communicates with the JVM using a backend socket
regardless of whether or not ignore signals is set.

Cheers,
Leif

Amresh Deshmukh wrote:
> Thanks for your reply Leif.
>
> I will try the wrapper.debug setting.
>
> We see the problem occurring more on one of our servers. It is not reproducible (predictably) though.
>
> Will also make sure that we have upgraded the sh script.
>
> With regards to IGNORE_SIGNALS I thought using that would mean we will have to start using an anchor file for stopping the JVM. Is that right?
>
> I will update you with what I find.
>
> Regards,
>
> Amresh
>
>
> ----- Original Message ----
> From: Leif Mortenson <leif@tanukisoftware.com>
> To: wra...@li...
> Sent: Friday, September 28, 2007 7:58:52 AM
> Subject: Re: [Wrapper-user] JVM exited in response to signal UNKNOWN (127)
>
>
> Amresh,
> What platform is this running on?  I had a problem at a customer
> several years ago on Solaris where the Wrapper would sometimes
> receive TERM signals from someplace.  The solution was to add a
> feature to ignore all system signals.
> That works for all signals except SIGKILL, which it appears you
> are receiving.
>
> How easy is this for you to reproduce?  If you set
> wrapper.debug=true then the Wrapper will add log data about
> which process sent the signals.  That might be useful to track
> down where the stray signals are coming from.
>
> In the case of my old customer, it was another user application
> which was using old PIDs to try to clean up its own process instances.
>
> To enable the ignore signals feature, simply edit the wrapper's
> shell script and uncomment the following line:
> #IGNORE_SIGNALS=true
>
> As you upgraded the Wrapper, make sure that you are also
> upgrading the shell script.
>
> Let me know how this works out.
> Cheers,
> Leif
>
>
> Amresh Deshmukh wrote:
>
>> We have been using the wrapper for the last few years.
>>
>> We were using version 3.1.2 of the wrapper and have recently upgraded to the latest version, 3.2.3.
>>
>> The reason for the upgrade was the error:
>>
>> "Critical error: wait for JVM process failed (No child processes)"
>>
>> Which, as I found, was fixed in version 3.2.0.
>>
>> After the upgrade we have seen occurrences of the following error in our log files.
>>
>> STATUS | wrapper | 2007/09/27 15:32:51 | JVM exited in response to signal UNKNOWN (127).
>> ERROR | wrapper | 2007/09/27 15:32:51 | JVM exited unexpectedly.
>> STATUS | wrapper | 2007/09/27 15:32:51 | JVM exited in response to signal SIGKILL (9).
>> ERROR | wrapper | 2007/09/27 15:32:51 | Unable to start a JVM
>> STATUS | wrapper | 2007/09/27 15:32:51 | <-- Wrapper Stopped
>>
>> We have put in a fix today for the "Unable to start a JVM" error using the suggestion in one of the posts.
>>
>> wrapper.on_exit.default=RESTART
>> wrapper.on_exit.0=SHUTDOWN
>>
>> We have also increased the restart delay to 30 seconds.
>>
>> wrapper.restart.delay=30
>>
>> The problem now is that the JVM restart results in the loss of cached data, which impacts performance at the time of batch processing.
>>
>>
>> We are running on a 64-bit Linux OS with a 32-bit jdk1.5.0_12 JVM (32-bit limitation due to a third-party library).
>> OS details:
>> Red Hat Enterprise Linux AS release 3 (Taroon Update 5)
>> Linux 2.4.21-32.0.1.ELsmp #1 SMP EDT x86_64
>>
>> It would be very helpful to know under what circumstances we could get an error like this. We are in the last week of UAT and due to go live next week, so any resolution/workaround for this would be highly appreciated.
>>
>>
>> Amresh
>>
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2005.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Wrapper-user mailing list
> Wra...@li...
> https://lists.sourceforge.net/lists/listinfo/wrapper-user
|
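The signal numbers in the log excerpt quoted in this thread can be checked against the platform's own signal table. What follows is only a minimal sketch, assuming Python 3 on Linux; describe_signal and scan are hypothetical helper names and are not part of the Wrapper. It extracts the "JVM exited in response to signal" lines and looks each number up. On Linux, 9 resolves to SIGKILL while 127 has no name at all. The sketch does not explain where the 127 came from.

```python
import re
import signal

# Matches Wrapper log lines of the form quoted in the thread, for example:
# STATUS | wrapper | 2007/09/27 15:32:51 | JVM exited in response to signal SIGKILL (9).
SIGNAL_LINE = re.compile(r"JVM exited in response to signal (\S+) \((\d+)\)")


def describe_signal(number):
    """Return the platform's name for a signal number, or None if it has none."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a signal defined on this platform.
        return None


def scan(lines):
    """Yield (reported_name, number, platform_name) for each signal-exit line."""
    for line in lines:
        match = SIGNAL_LINE.search(line)
        if match:
            number = int(match.group(2))
            yield match.group(1), number, describe_signal(number)


if __name__ == "__main__":
    sample = [
        "STATUS | wrapper | 2007/09/27 15:32:51 | JVM exited in response to signal UNKNOWN (127).",
        "ERROR | wrapper | 2007/09/27 15:32:51 | JVM exited unexpectedly.",
        "STATUS | wrapper | 2007/09/27 15:32:51 | JVM exited in response to signal SIGKILL (9).",
    ]
    for reported, number, platform_name in scan(sample):
        print(reported, number, platform_name or "not a defined signal on this platform")
```

A None result for a number such as 127 is a hint to look beyond the signal list, for example in /var/log/messages, which is where the OOM kill messages were found in this case.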