Menu

YAJSW 10.08 strange behavior: infinite reconnect loop caused by wrong WRAPPER_MSG_KEY received.

bob
2017-05-03
2017-05-03
  • bob

    bob - 2017-05-03

    Related Topic: https://sourceforge.net/p/yajsw/discussion/810311/thread/ff5805cc/

    Problem Description:
    In Normal case, when application starting, the org.rzo.yajsw.app.WrapperManagerImpl#reconnect will make connection to Controller and try to authorize the connection by wrapper.key.
    However, in my application, what i found is the Controller regularly received some wrong message(a 'null' string message);
    And the org.rzo.yajsw.controller.jvm.ControllerHandler#channelRead method treat it as a org.rzo.yajsw.Constants#WRAPPER_MSG_KEY message;

    because the string 'null' is serailzied as bytes: [110, 117, 108, 108]; and the MessageDecoder will parse it as a
    WRAPPER_MSG_KEY(110) packet(with message = 'ull');

    So, the Controller treat it as wrong key and try to close netty connection; however, the client-side handler (org.rzo.yajsw.app.WrapperManagerImpl.WrapperHandler) will try to reconnect if channel is inactive. so a infinite loop is created.(this explain the phenomenon of Related Topic: https://sourceforge.net/p/yajsw/discussion/810311/thread/ff5805cc/)

    the demo code to show a 'null' packet decoded as org.rzo.yajsw.Constants#WRAPPER_MSG_KEY message:

        public static void main(String[] args) {
            byte[] content = "null".getBytes();
            System.out.println(Arrays.toString(content));
    
            // org.rzo.yajsw.controller.jvm.MessageEncoder#encode
            ByteBuf buffer = Unpooled.buffer(content.length + 1);
            buffer.writeByte(content[0]);
            buffer.writeBytes(content, 1, content.length - 1);
            buffer.writeByte((byte) 0);
    
           // org.rzo.yajsw.controller.jvm.MessageDecoder#decode
            byte code = buffer.readByte();
            buffer.writerIndex(buffer.writerIndex());
            String msg = buffer.toString(Charset.defaultCharset());
    
            // code == org.rzo.yajsw.Constants#WRAPPER_MSG_KEY
            System.out.println(code + " = " + msg);
        }
    

    What i doing now is skip the problem by simply drop the null message(and add more context logging to track the root cause):

       //  org.rzo.yajsw.controller.jvm.ControllerHandler#channelRead
    
        @Override
        public void channelRead(ChannelHandlerContext ctx, Object in)
                throws Exception
        {
            if (_controller.getState() == JVMController.STATE_USER_STOP)
            {
                // set the channel if not set
                _controller._channel.set(ctx.channel());
                _controller.stop(JVMController.STATE_USER_STOP, "INTERNAL");
                return;
            }
            Message message = (Message) in;
            switch (message.getCode())
            {
            case WRAPPER_MSG_KEY:
                // check if JVM sent us correct key
                if (_controller._key.equals(message.getMessage()))
                {
                    // we set the channel not in channelConnected,
                    _controller._channel.set(ctx.channel());
                    _controller.setState(JVMController.STATE_LOGGED_ON);
                    _controller.startupOK();
                    ctx.channel().writeAndFlush(
                            new Message(Constants.WRAPPER_MSG_OKKEY, ""
                                    + _controller._wrappedProcess.getAppPid()));
                    if (_controller.getDebug() > 2)
                        _controller.getLog().info("Correct key[" + ctx.channel() + "]");
                }
                // if not: announce it and close session
                else
                {
                    if ("ull".equals(message.getMessage())) {
                        if (_controller.getDebug() > 2) {
                            _controller.getLog().info("Problem Key[" + ctx.channel() + "] -> " + message.getMessage() + ", expected:" + _controller._key);
                        }
                    }
                    else {
                        if (_controller.getDebug() > 0) {
                            _controller.getLog().info("Wrong Key[" + ctx.channel() + "]");
                        }
                        else if (_controller.getDebug() > 2) {
                            _controller.getLog().info("Wrong Key[" + ctx.channel() + "] -> " + message.getMessage() + ", expected:" + _controller._key);
                        }
                        ctx.channel().writeAndFlush(new Message(Constants.WRAPPER_MSG_BADKEY, null));
                        ctx.channel().close();
                    }
                }
                break;
                ...
         }
    }
    

    All in all, the problem is caused by wrong message sent by client(i am sure about this, both the normal WRAPPER_MSG_KEY message and 'null' message is sending by same connection, see the logging output below):

    INFO|wrapper|17-05-03 14:29:44|posix_spawn pid 23211
    INFO|wrapper|17-05-03 14:29:44|started process 23211
    INFO|wrapper|17-05-03 14:29:44|waiting for termination of process
    INFO|wrapper|17-05-03 14:29:44|created jva.pid file /home/xx/hidas-agent-1.2-SNAPSHOT/wrapper/agent.bootstrap.pid
    INFO|wrapper|17-05-03 14:29:44|started process with pid 23211
    INFO|wrapper|17-05-03 14:29:44|set state STARTING->RUNNING
    INFO|23211/0|17-05-03 14:29:54|[INFO] StandardFileSystemManager - Using "/home/xx/hidas-agent-1.2-SNAPSHOT/wrapper/tmp/vfs_cache" as temporary files store.
    INFO|23211/0|17-05-03 14:29:54|Unable to classify GarbageCollectorMXBean [G1 Young Generation]
    INFO|23211/0|17-05-03 14:29:54|Unable to classify GarbageCollectorMXBean [G1 Old Generation]
    INFO|23211/0|17-05-03 14:29:54|APP user name=root
    INFO|23211/0|17-05-03 14:29:54|APP working dir=/home/xx/hidas-agent-1.2-SNAPSHOT
    INFO|23211/0|17-05-03 14:29:54|APP java version=1.8.0_66
    INFO|23211/0|17-05-03 14:29:54|APP class path=...
    INFO|23211/0|17-05-03 14:29:54|APP library path=/home/xx/hidas-agent-1.2-SNAPSHOT/lib
    INFO|23211/0|17-05-03 14:29:54|Application args: no args
    INFO|23211/0|17-05-03 14:29:54|mainClass/jar/script: com.xx.agent.SimpleAgentBootstrap/null/null
    INFO|23211/0|17-05-03 14:29:54|external stop false
    INFO|23211/0|17-05-03 14:29:54|args:
    INFO|23211/0|17-05-03 14:29:54|monitor deadlock: start
    INFO|23211/0|17-05-03 14:29:54|monitor heap: start
    INFO|23211/0|17-05-03 14:29:54|terminated WrapperManager.init()
    INFO|23211/0|17-05-03 14:29:54|connecting to port 15003
    INFO|23211/0|17-05-03 14:29:55|WrapperManager: channel connected, sending key
    INFO|23211/0|17-05-03 14:29:55|Controller State: WAITING -> ESTABLISHED
    INFO|23211/0|17-05-03 14:29:55|Controller State: ESTABLISHED -> LOGGED_ON
    INFO|23211/0|17-05-03 14:29:55|Correct key[[id: 0x53461aab, L:/127.0.0.1:15003 - R:/127.0.0.1:33251]]
    INFO|23211/0|17-05-03 14:29:56|exit on main terminate -1
    
    INFO|23211/0|17-05-03 14:30:00|Problem Key[[id: 0x53461aab, L:/127.0.0.1:15003 - R:/127.0.0.1:33251]] -> ull, expected:-7847590594701025371
    INFO|23211/0|17-05-03 14:30:05|Problem Key[[id: 0x53461aab, L:/127.0.0.1:15003 - R:/127.0.0.1:33251]] -> ull, expected:-7847590594701025371
    INFO|23211/0|17-05-03 14:30:10|Problem Key[[id: 0x53461aab, L:/127.0.0.1:15003 - R:/127.0.0.1:33251]] -> ull, expected:-7847590594701025371
    
    ... repeated printing(per 5000ms) ...
    

    Finally, becuase the controller's connection always in reconnect state, the other command(such as stop/restart) just hang there without any response(https://sourceforge.net/p/yajsw/discussion/810311/thread/ff5805cc/).

     

    Last edit: bob 2017-05-03
  • rzo

    rzo - 2017-05-07

    hello,

    thanks for taking your time to check this out.

    However, I am wondering where the literal string "null" is being sent from. Analyzing the code this should never happen. The cycle of 5 sec. indicates that this could be the PING message. But, this message is only 2 bytes long: [0x67,0x0].

    Anyway I have adpated the code to ignore further KEY messages once a correct key has been received.

    Doing this I have noticed that the log setting for the communication between the wrapper and the app was wrong.

    If you want I could provide you with a download for the update, for testing.

    BTW: are you using the original yajsw code or did you change anything ?

    -- Ron

     
  • bob

    bob - 2017-05-23

    Hi rzo, sorry for responding so late.

    i am using source code packaged in the yajsw zip.
    i just change the class file i need then replace the class in wrapper.jar with the new compiled one.

    About why the "null" string is sent/received, i think it probably because there's bug in client, message decoder, or undering netty component, which read invalid message as "null"

     

    Last edit: bob 2017-05-23

Log in to post a comment.