From: Georg S. <geo...@ti...> - 2003-08-14 17:10:29
Leif,

thanks for the detailed explanation. I have understood all the points
you made.

I cannot agree with you on the first point. The practical effect is
that, if the process runs into an error condition that does not go away
through restarting, the crashing-and-restarting process will loop
forever, which has to be avoided under all circumstances. It may take
several hours until the service people react and fix the situation.
During this time the endless loop may do a lot of harm by opening more
and more files and database connections without closing them, which may
exhaust the available file handles as well as the maximum number of
database connections, all in all causing a lot more damage than simply
staying down after 5 retries. Moreover, according to the documentation,
the retry count is reset once the process has been running successfully
for more than a specified time, so a restart limit would not stop an
application that fails only once per day.

For the second issue, a distinction has to be made between the reaction
of the process controlled by the wrapper and that of the wrapper itself.
Even if the wrapper restarts a process that has received a TERM, the
wrapper itself can still react to this signal by shutting down. But this
is not the point. The wrapper should NOT go down when receiving a TERM;
it should stay up. Otherwise it would itself be vulnerable to a
forgotten "nohup" or, as in my situation, to TERM signals from unknown
sources. Instead, it should be possible to shut down the wrapper by, for
instance, creating a "well-known" file. Many of the processes in our
manufacturing environment catch all signals they can, because they must
not be interrupted during certain periods of time, and they are shut
down in this way.

> Processes should always respond to a kill TERM by exiting gracefully.

Not always. For instance, if the foreground process of a Bourne shell
gets a Ctrl-C, the shell sends a TERM to all processes it has started in
the background.
The background processes should, of course, ignore this TERM in this
situation. (I only know this because there is a related JDK 1.4 bug.)

A tool like the wrapper is only required in "hostile" environments. If
everybody behaves as expected, one does not need a watchdog. You are
right, the wrapper behaves as defined, but that is the very problem.

Best regards
Georg

Leif Mortenson wrote:

> Georg Schmid wrote:
>
>> a few hours ago I downloaded the Wrapper 3.0.4 in order to improve
>> the uptime of my JBoss servers. Everything works fine so far, but now
>> I am not sure anymore, whether Wrapper can solve my problems.
>>
>> 1) Restarting after OutOfMemory exceptions works after defining the
>> appropriate filter. I also want the wrapper to restart the server at
>> most 5 times. When I tested this by specifying 8 MB max. heap, the
>> server was restarted continuously (had to kill it after jvm 7 came up).
>
> The wrapper.max_failed_invocations property only controls the number
> of restarts due to a failed startup. If the JVM is restarted due to a
> call to WrapperManager.restart() or due to a filter, as in your case,
> then there is no limit to the number of times that the JVM will be
> restarted. If you think about it, this is good behavior. If your
> application runs out of memory once per day, you would want it to
> always be restarted. Not restart for 5 days and then give up and quit.
>
> In this case, the Wrapper is working as expected. In general, if your
> application is running out of memory, that is a bug in your
> application. The Wrapper makes it possible to work around such
> problems temporarily, but they really should be fixed at some point.
>
>> 2) The second problem is, that the (locally running, nohup-started)
>> servers go down, when the network goes down. They seem to get a
>> "normal" TERM, just like pressing Ctrl-C or issuing a kill (without
>> -9).
>> JBoss shuts down cleanly, working its shutdown hook, but Wrapper
>> does not restart the process, as I expected. I introduced a filter,
>> that checks for "Shutting down the JVM" string and then triggers a
>> RESTART. The filter matches, but no restart occurs, instead the
>> wrapper stops.
>
> If your OS is sending a TERM signal to the JVM and Wrapper, then it is
> correct that the Wrapper is quitting. Catching the message using a
> filter and attempting to restart will not work because the Wrapper is
> already trying to shut itself down. Processes should always respond
> to a kill TERM by exiting gracefully.
>
> Please explain if I am missing something. But it seems like the
> Wrapper is behaving correctly here. If it were to respond to a TERM
> signal by restarting the JVM then there would be no way to kill the
> Wrapper other than a kill -9, which would of course be bad.
>
>> 3) I could not find a parameter to control the ping frequency (not
>> the timeout). The ping slows down the application, and for a server,
>> that should be running for half a year a ping every 30 seconds or
>> similar would be good enough.
>
> There is already a feature request to be able to control the ping
> time. But how are you seeing that the ping is slowing down your
> application. Theoretically that is true as there is additional work
> being done. But it is very insignificant. The longer the ping
> interval, the less responsive the Wrapper will be to a hung JVM.
> Pinging once every 5 seconds with debug output off has never caused
> any noticeable performance issues in my experience.
>
> Cheers,
> Leif

--
Georg Schmid
Special Applications Section Manager   mailto:geo...@ti...
Freising Wafer Fab (FFAB) Make IT      phone: +49 8161 804595
Texas Instruments Deutschland GmbH     fax:   +49 8161 803350
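
The restart policy argued for under the first point (stay down after 5
restarts, but clear the count once the process has run successfully for
long enough) can be sketched roughly as follows. This is only an
illustration in Python and does not describe how the Wrapper is
implemented: the names RestartBudget, allow_restart and simulate, and
the 300-second reset threshold, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RestartBudget:
    """Hypothetical restart limiter; not part of the Wrapper's API."""

    max_restarts: int = 5       # give up after this many consecutive restarts
    reset_after: float = 300.0  # assumed: seconds of uptime that clear the count
    consecutive: int = 0        # restarts requested since the last healthy run

    def allow_restart(self, uptime: float) -> bool:
        """Record one crash after `uptime` seconds; True means restart."""
        if uptime >= self.reset_after:
            # The process ran long enough to count as successful,
            # so earlier crashes are forgotten.
            self.consecutive = 0
        self.consecutive += 1
        return self.consecutive <= self.max_restarts


def simulate(uptimes, budget):
    """Count how many restarts are granted for a sequence of crash uptimes."""
    granted = 0
    for uptime in uptimes:
        if not budget.allow_restart(uptime):
            break  # budget exhausted: the process stays down
        granted += 1
    return granted
```

With these assumed values, a process that crashes every few seconds is
restarted 5 times and then left down, while a process that crashes once
per day is restarted every time, because each long run clears the count.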