Menu

#45 General Query - Process shutting down with pid/timeout

Any
open
rzo
None
4
2021-11-29
2021-11-17
Arushi Rai
No

Hi @rzo,
I am troubleshooting an escalation for my service where it is shutting down abruptly and here is what I see in the wrapper logs:

INFO|wrapper|21-08-03 01:55:42|stopping process with pid/timeout 1988 135000
INFO|wrapper|21-08-03 01:55:42|Shutting down Wrapper
INFO|1988/0|21-08-03 01:55:43|wrapper manager received Message:STOP::USER
INFO|wrapper|21-08-03 01:55:43|waitpid 1988 0
INFO|wrapper|21-08-03 01:55:43|exit code posix process: 0 application(status/signal): 0/0
INFO|wrapper|21-08-03 01:55:43|shutdown wrapper due to exit code rule
INFO|wrapper|21-08-03 01:55:43|killing 1988
INFO|wrapper|21-08-03 01:55:44|process exit code: 0

Other shutdown exit code that I see:

INFO|wrapper|21-08-03 02:01:30|waitpid 1618 15
INFO|wrapper|21-08-03 02:01:30|exit code posix process: 15 application(status/signal): 0/15
INFO|wrapper|21-08-03 02:01:30|killing 1618
INFO|wrapper|21-08-03 02:01:30|shutdown wrapper due to exit code rule
INFO|wrapper|21-08-03 02:01:31|process exit code: 0

I unfortunately do not have logs like journalctl logs from this instance to see if there was any service event executed and so I am trying to understand the wrapper logs more closely.

In what scenarios does wrapper says 'stopping process with pid/timeout 1988 135000'?
Here are the service timeouts that I have set:

wrapper.startup.timeout = 120
wrapper.shutdown.timeout = 120
wrapper.ping.timeout = 60

Is there a way to tell if this was user initiated or wrapper/service itself?

There is a monitoring service which when detects that my service is down would try to restart with command - systemctl restart myservice

I tested my service internally and ran same command systemctl restart myservice, I see:

INFO|wrapper|21-11-17 15:51:02|Shutting down Wrapper
INFO|wrapper|21-11-17 15:51:02|stopping process with pid/timeout 3606 135000
INFO|3606/0|21-11-17 15:51:02|wrapper manager received Message:STOP::WRAPPER SHUTDOWN
INFO|wrapper|21-11-17 15:51:02|waitpid 3606 0
INFO|wrapper|21-11-17 15:51:02|exit code posix process: 0 application(status/signal): 0/0
INFO|wrapper|21-11-17 15:51:02|killing 3606
INFO|wrapper|21-11-17 15:51:03|shutdown wrapper due to exit code rule
INFO|wrapper|21-11-17 15:51:03|process exit code: 0

what is the difference in -
wrapper manager received Message:STOP::USER
wrapper manager received Message:STOP::WRAPPER SHUTDOWN

Versions:
OS: Photon VM (Linux)
Java: OpenJDK Java 11
YAJSW: 12.15

Appreciate any feedback on this query!

Discussion

  • Arushi Rai

    Arushi Rai - 2021-11-17

    Please let me know if you would like me to share complete log file and any wrapper configuration.
    Thank you!

     
  • rzo

    rzo - 2021-11-18

    hello,

    STOP::USER means that the daemon was stopped by the user. for example by invoking systemctl stop

    STOP::WRAPPER means that the daemon is stopped by the wrapper. This could be for example because the application process crashed, or terminated or because a filter such as wrapper.filter.trigger has triggered or the application invoked WRAPPER_MANAGER.stop(), ....

    according to your log file the process received the signal 15 == SIGTERM:

    exit code posix process: 15 application(status/signal): 0/15

    Hope this helps. If you have more questions pls post.

    -- Ron

     
    • Arushi Rai

      Arushi Rai - 2021-11-18

      Thank you @rzo for your quick response. This definitely helps clarify the events.
      So I have been testing service stop and restart in my internal setup and for both
      systemctl stop myservice
      systemctl restart myservice

      I see in logs:

      INFO|wrapper|21-11-18 19:54:13|Shutting down Wrapper
      INFO|wrapper|21-11-18 19:54:13|stopping process with pid/timeout 5280 135000
      INFO|5280/0|21-11-18 19:54:13|wrapper manager received Message:STOP::WRAPPER SHUTDOWN
      INFO|wrapper|21-11-18 19:54:14|waitpid 5280 0
      INFO|wrapper|21-11-18 19:54:14|exit code posix process: 0 application(status/signal): 0/0
      INFO|wrapper|21-11-18 19:54:14|shutdown wrapper due to exit code rule
      INFO|wrapper|21-11-18 19:54:14|killing 5280
      INFO|wrapper|21-11-18 19:54:14|process exit code: 0
      

      I did encounter STOP::USER twice with restart but unable to produce that consistently.
      I would expect all events be STOP::USER.
      I do not have any wrapper.filter.trigger set for my service.

      Some more wrapper configuration:

      # Wrapper Posix Daemon Properties
      #********************************************************************
      # Directory in which to create and execute daemon init scripts. Default: /etc/init.d
      wrapper.daemon.dir =/etc/init.d
      
      # Directory where to store the wrapper pid file. Default: /var/run
      wrapper.daemon.pid.dir =${service_base}
      
      # Directory in which to create K... and S... links. 
      # Default: <wrapper.daemon.dir>/rcX.d
      # For Ubuntu set to /etc/rcX.d
      # The following grooy script should set it correctly for most distros
      wrapper.daemon.run_level_dir=${if (new File('\/etc\/rc0.d').exists()) return '\/etc\/rcX.d' else return '\/etc\/init.d\/rcX.d'}
      
      # Setting the runlevels and priorities for automatic startup and stop of the daemon.
      # Similar syntax as the update_rc.d command
      # Default: equivalent to default of update_rc.d
      wrapper.daemon.update_rc = start 002 3 4 5 . stop 002 0 1 2 6
      
      #********************************************************************
      # Exit Code Properties
      # Restart on non zero exit code
      #********************************************************************
      wrapper.on_exit.0=SHUTDOWN
      wrapper.on_exit.default=RESTART
      

      Should I be setting this property since now I am using 12.15:

      wrapper.daemon.system = systemd

       
  • rzo

    rzo - 2021-11-20

    hello,

    i tested this on centos and it seems that when stopped by systemctl it sometimes logs "USER" and sometimes "WRAPPER". Setting in the wrapper conf, wrapper.debug = true, reveals some more information. There seems to be a race condition for the stop reason within the wrapper. I will have to look into this. but this has low priority, since the functionality of stopping is not affected.

    Note that the wrapper has no way of "knowing" why/by whom it was stopped. If the application crashed the wrapper will notice this and either shut down or restart the application, depending on the configuration. When invoking systemctl stop, systemd sends a SIGTERM signal to the java application. In this case the wrapper process. This executes a shutdown hook which tries to gracefully stop the application and then terminates.

    Setting SuccessExitStatus=143 is, I think, not necessary. At least not on centos. the status is correctly set to inactive.

    you can also check /var/log/messages to see if the wrapper was stopped by systemd.

    -- Ron

     
  • Arushi Rai

    Arushi Rai - 2021-11-29

    Hi @rzo,
    I have set wrapper.debug=true in my internal setup and here is the log snippet for action
    systemctl stop myservice:

    INFO|wrapper|21-11-29 14:55:25|Shutting down Wrapper
    INFO|wrapper|21-11-29 14:55:25|stop from Thread Thread-2 reason: WRAPPER SHUTDOWN
    INFO|wrapper|21-11-29 14:55:25|set state RUNNING->STATE_USER_STOP
    INFO|wrapper|21-11-29 14:55:25|stopping process with pid/timeout 7735 135000
    INFO|wrapper|21-11-29 14:55:25|stop from Thread Thread-3 reason: USER
    INFO|7735/0|21-11-29 14:55:25|Controller State: LOGGED_ON -> USER_STOP
    INFO|wrapper|21-11-29 14:55:25|process not in state RUNNING -> Delaying stop
    INFO|7735/0|21-11-29 14:55:25|controller sending a stop command
    INFO|7735/0|21-11-29 14:55:26|Controller State: USER_STOP -> WAITING_CLOSED
    INFO|7735/0|21-11-29 14:55:26|session closed -> waiting
    INFO|7735/0|21-11-29 14:55:27|gobler terminated OUTPUT 7735
    INFO|7735/0|21-11-29 14:55:27|gobler terminated ERROR 7735
    INFO|wrapper|21-11-29 14:55:27|waitpid 7735 0
    INFO|wrapper|21-11-29 14:55:27|exit code posix process: 0 application(status/signal): 0/0
    INFO|7735/0|21-11-29 14:55:27|process terminated
    INFO|7735/0|21-11-29 14:55:27|Controller State: WAITING_CLOSED -> PROCESS_KILLED
    INFO|wrapper|21-11-29 14:55:27|giving up after 0 retries
    INFO|wrapper|21-11-29 14:55:27|set state STATE_USER_STOP->IDLE
    INFO|wrapper|21-11-29 14:55:27|shutdown wrapper due to exit code rule
    INFO|wrapper|21-11-29 14:55:27|stop from Thread yajsw.controller-1 reason: WRAPPER SHUTDOWN
    INFO|7735/0|21-11-29 14:55:27|Controller State: PROCESS_KILLED -> WAITING_CLOSED
    INFO|7735/0|21-11-29 14:55:27|session closed -> waiting
    INFO|wrapper|21-11-29 14:55:27|killing 7735
    INFO|wrapper|21-11-29 14:55:27|process exit code: 0
    INFO|wrapper|21-11-29 14:55:27|removed pid file /opt/myservice/wrapper.myservice.pid
    INFO|wrapper|21-11-29 14:55:27|set state IDLE->STATE_SHUTDOWN
    INFO|wrapper|21-11-29 14:55:27|set state IDLE->STATE_SHUTDOWN
    

    Let me know if this helps understand the behavior a bit more.
    I will be gathering the same from customer's env along with journalctl logs.
    Thank you for your assistance on this ticket.

     

Log in to post a comment.