#2 Sockets are not closed in a timely fashion

Status: open-later
Owner: Conrad Braam
Labels: None
Priority: 5
Updated: 2009-11-05
Created: 2009-11-04
Creator: Sensei TG
Private: No

When a master requests the connection to be closed (lingering is turned off), the Modbus simulator does not close the socket.

I am unsure how familiar you are with TCP internals, but to request a close a FIN+ACK packet is sent, placing the connection in a half-open state. Normally, when this state is detected, you would want to close the socket from the other end as well (or, in your case, linger for a while if lingering is turned on). However, after closing the socket from my end, your application waits 100 seconds before closing even when lingering is turned off. In any case, a lingering time of 100 seconds should be considered *extreme* and is not exactly "normal" TCP behaviour, but perhaps there is a reason (emulation-wise) for this?
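For illustration only (this is not code from the simulator), a slave-side receive loop over plain BSD sockets can detect the master's FIN, because recv() returns 0, and close its own end immediately instead of lingering:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical slave-side loop: `conn` is an already-accepted TCP socket. */
static void serve(int conn)
{
    char buf[260];                       /* a Modbus/TCP ADU fits in 260 bytes */
    for (;;) {
        ssize_t n = recv(conn, buf, sizeof buf, 0);
        if (n > 0) {
            /* ... parse the request and send the response ... */
            continue;
        }
        /* n == 0: the peer sent FIN (connection is half-open); n < 0: error.
         * Closing here sends our own FIN at once instead of ~100 s later. */
        close(conn);
        return;
    }
}
```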

Since our master can only hold one connection active at a time, it also takes 100 seconds before it can connect to the next slave, which led me to implement a lingering timeout in our network stack: if the remote does not close within a specified time interval, the connection is reset (by sending an RST+ACK packet). This forcefully invalidates the socket, which leads me to the next problem: your application does not detect this condition either. The netstat listing clearly shows that Windows has completely dropped the connection and that the socket is no longer active, yet your application still lists an active connection.
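The reset described above has a standard-sockets analogue (the stack in question is in-house, so this is only a sketch): setting SO_LINGER with a zero timeout makes close() abort the connection with an RST instead of a FIN, freeing the local socket immediately.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Abortive close: drop any queued data and send RST so the local socket
 * is released at once rather than waiting for the remote FIN. */
static void abort_connection(int sock)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
    close(sock);
}
```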

It seems it is impossible to run more than 10 sessions (the number of sockets) in 100 seconds (1 session per 10 seconds) unless the connections towards the Modbus simulator are kept open at all times.

Discussion

  • Conrad Braam
    Conrad Braam
    2009-11-04

    This is/was by design. A batching controller implementation I encountered behaved this way, so I copied it.
    Masters which close the connection and re-open a new one all the time will stop polling or notice huge lag in a WAN environment. Many other folk and I have found that this behaviour helps flag cases where you would be unnecessarily loading a low-power embedded slave. Slaves typically only have 8 or 16 listens maximum.

    If the open-mbus specification doc does have a say on this issue, I could look at adding a setting to allow the new behaviour, since it does make sense to. I need a volunteer, though.

     
  • Conrad Braam
    Conrad Braam
    2009-11-04

    • assigned_to: nobody --> zaphodikus
    • status: open --> open-wont-fix
     
  • Sensei TG
    Sensei TG
    2009-11-05

    I trust what you say, and if you feel this is the correct behaviour I'm not going to try to change your mind, but there are a few things I'd like to add for your consideration :)

    I work for a PLC manufacturer, where we have more than 10 years of experience working with Modbus. Of the hundreds of different devices I've used, I've not once seen this kind of behaviour. Lingering, yes, sometimes - but not for more than a few seconds at most. "Lingering" times of <1 ms are the "normal" case. Connections are also freed immediately upon reception of an RST by those slaves that do linger - which is also the behaviour dictated by the TCP specification.

    While your statement is in its nature true - closing and re-opening the connection is an unnecessary load - there are other issues to consider:

    One is that many masters (ours, for example) run slow processors with limited memory (in our case 50 MHz and only a few kB that I can allocate for Modbus functionality).

    Still, one of these devices will in many cases want to communicate with a large number of Modbus/TCP slaves. Keeping one socket permanently open for each of these slaves would require a large quantity of memory and would load the CPU significantly (for each incoming packet, all open sockets must be searched to find the one that the packet is intended for).

    As you said, slaves typically only have a small number of listens, the reason being that they run on limited hardware. A master running on similarly limited hardware has the same restrictions on the number of concurrent connections.

    With this logic you'll realize that it would be impossible for such a master to communicate (with any speed) with, say, 30 different slave devices if the slaves implement the behaviour seen in your program, because the master would *have to* close connections when it runs out of resources. I hope I am making sense :)

    Also, you'll not be able to find any information regarding this in the specification, because this behaviour relates to TCP and not to Modbus itself. TCP is an underlying layer and is regulated by its own specification. In that specification you'll find information regarding lingering, why it is needed and how it should be used: a socket should never linger unless more data is expected to be sent. It is, though, a rarely used feature, since most modern network stack implementations (Windows, for example) do not allow data to be received on a half-open connection.

     
  • Conrad Braam
    Conrad Braam
    2009-11-05

    Sorry, I now remember why I'm not doing a graceful close. I'm going to move it into my File#12 for now to make it an optional setting.
    It was by design, to highlight a problem I found on WANs: when the PHY layer does not give an ACK to the FIN, the master ends up waiting forever. This happens when the slave has been physically removed from the network by unplugging it from the far side of a hub. The long IP timeout was a huge penalty for the master to block on, so the simulator emulates that behaviour, and my master was modified not to use graceful closure but rather to allow for the case where the slave has been power-cycled (in which case it recovers quickly anyway by sending an out-of-sequence ACK after it sends ARPs). You may want to check whether you want to do the same kind of thing in your master.

    Humblest apologies; this was all done about 6 years ago, and I will hence put it into the FAQ too.

     
  • Conrad Braam
    Conrad Braam
    2009-11-05

    • status: open-wont-fix --> open-later
     
  • Sensei TG
    Sensei TG
    2009-11-05

    Ah, that makes perfect sense! :)

    Actually I'd call that an excellent testing scenario - one that is commonly forgotten.

    Indeed, the TCP specification does allow a socket to wait forever in the case where the connection is lost after an ACK to the initial FIN has been received (otherwise the FIN would be retransmitted until the socket times out). Once that ACK has been received, though, a TCP implementor may wait indefinitely for a FIN from the remote end. I've solved this in the TCP layer by implementing keep-alives, which will eventually kill the connection. For situations where closing speed is even more important, or where lingering is very common, I've implemented an additional mechanism that allows the application layer to set a close-socket timeout; if a FIN is not received within the specified time, an RST is sent out to force a reset of the connection (a standard-sockets sketch of the keep-alive idea follows at the end of this thread).

    I am guessing you are sending an out-of-sequence ACK in order to trigger a returning RST packet that would indicate the connection was lost due to power cycling - or did I get that wrong? It might be noteworthy that not all network stacks send out RSTs to non-existent connections. Lately there has been a security discussion on whether to allow RSTs through a router from LAN to WAN, and it is not uncommon for routers to block these packets.
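For reference, the keep-alive mechanism described above has a rough analogue in standard socket options. The stack discussed in the thread is in-house, so the Linux option names and values below are assumptions for illustration only: they make the OS probe an idle connection and drop it after a few unanswered probes, bounding how long a dead peer can hold the socket.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Illustrative keep-alive setup (Linux-specific TCP_KEEP* options). */
static void enable_keepalive(int sock)
{
    int on = 1, idle = 10, intvl = 5, cnt = 3;
    setsockopt(sock, SOL_SOCKET,  SO_KEEPALIVE,  &on,    sizeof on);
    setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof idle);  /* first probe after 10 s idle */
    setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof intvl); /* 5 s between probes */
    setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof cnt);   /* give up after 3 probes */
}
```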