socketReader should attempt to reconnect outgoing sockets
Brought to you by:
jackkane
When the outbound socket is lost, socketReader does not try to reconnect. Idle sockets can be closed by a VPN tunnel timeout, a TCP timeout, etc., so socketReader should attempt to re-establish the outgoing connection.
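A minimal sketch of the requested behavior, written in Python 3 (socketReader itself runs on Python 2.7): when a send fails with a socket error, drop the dead connection, wait, reconnect and retry. `ReconnectingSender`, its parameters and the injected `connect` callable are hypothetical and not part of socketReader. The 10-second default only mirrors the retry interval observed later in this ticket.

```python
import time


class ReconnectingSender:
    """Hypothetical sketch: resend data after re-opening a dropped connection.

    `connect` is any zero-argument callable returning a connected socket-like
    object with sendall() and close(), e.g. for a real link
    lambda: socket.create_connection((host, port), timeout=30).
    """

    def __init__(self, connect, retry_delay=10.0, max_attempts=None, sleep=time.sleep):
        self._connect = connect
        self._retry_delay = retry_delay
        self._max_attempts = max_attempts  # None means retry forever
        self._sleep = sleep
        self._sock = None

    def _drop(self):
        # Close the dead socket, ignoring errors such as ENOTCONN.
        if self._sock is not None:
            try:
                self._sock.close()
            except OSError:
                pass
            self._sock = None

    def send(self, data):
        """Send `data`, reconnecting as needed; returns the attempt count."""
        attempt = 0
        while True:
            attempt += 1
            try:
                if self._sock is None:
                    self._sock = self._connect()
                self._sock.sendall(data)
                return attempt
            except OSError:
                # Refused, reset, broken pipe, timeout: start a fresh connection.
                self._drop()
                if self._max_attempts is not None and attempt >= self._max_attempts:
                    raise
                self._sleep(self._retry_delay)
```

Injecting `connect` and `sleep` keeps the retry logic testable without a live HL7 endpoint; with `max_attempts=None` the sender keeps trying until the link comes back, which is what this ticket asks for.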
In the console:
Hl7Transmitter_test_orders_Thread-6 will be stopped...
Hl7Transmitter_test_orders_Thread-6: The socket has probably been closed already
In the log file (configured to INFO level):
2014-09-03 11:13:14,835 - SocketOps - ERROR - The socket we tried to send to is currently down.
2014-09-03 11:13:14,835 - SocketOps - ERROR - The socket we tried to send to is currently down.
2014-09-03 11:13:22,679 - SocketOps - ERROR - The socket we tried to receive from is currently down
2014-09-03 11:13:22,679 - SocketOps - ERROR - The socket we tried to receive from is currently down
2014-09-03 11:13:22,679 - SocketOps - ERROR - The socket we tried to receive from is currently down
This message is repeated many times over roughly 5 minutes before the log ends with:
2014-09-03 11:13:22,693 - Hl7Transmitter_test_orders_Thread-6 - ERROR - socketReader encountered a socket exception:
Traceback (most recent call last):
  File "/data/EDI/socketreader/Common/workerThreads.py", line 1322, in threadStop
    self.__clientSocket.fileno() # Just to check if the socket still exists. Throws an exception if not.
  File "/data/EDI/socketreader/Common/socketWrapper.py", line 77, in shutdown
    retval = self.__sock.shutdown(socket.SHUT_RDWR)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 107] Transport endpoint is not connected
2014-09-03 11:13:22,693 - Hl7Transmitter_test_orders_Thread-6 - PROD - First message still left in the queue: <hl7>
2014-09-03 11:13:22,693 - Hl7Transmitter_test_orders_Thread-6 - INFO - Thread stopped. Queue length: 2
socketReader had to be restarted in order to send the queued messages.
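The traceback ends with shutdown(socket.SHUT_RDWR) raising [Errno 107] Transport endpoint is not connected (ENOTCONN), which is what Linux reports when the connection is already gone. Below is a hedged Python 3 sketch of a close helper that treats that case as "already closed" rather than as a failure; `close_quietly` is a hypothetical name, not socketWrapper's actual shutdown method.

```python
import errno
import socket


def close_quietly(sock):
    """Shut down and close `sock`, tolerating a peer that is already gone.

    shutdown() raises OSError with errno ENOTCONN ([Errno 107] on Linux) when
    the connection has already been torn down; that case is ignored. Any other
    error is re-raised. The socket is closed in every case.
    """
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError as exc:
        if exc.errno != errno.ENOTCONN:
            raise
    finally:
        sock.close()
```

Because only ENOTCONN is swallowed, genuine failures still surface, and the thread can go on to reconnect instead of stopping with messages left in its queue.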
I set up a completely separate test environment. It looks like when the VPN goes down (100% packet loss), the sending process keeps retrying every 10 seconds, and each attempt puts the message into the socket's outbound queue (aka Send-Q). Once Send-Q is full, the operating system terminates the connection, which leads to the console message:
At DEBUG level (minus the 10-second sleep messages), the log has:
At this point no attempt is made to restart the sending thread.
I think the main "while running:" loop could be modified to check for dead threads and restart them. Since a thread is removed from threadList when it stops, the main body could keep track of which threads the configuration says should be running and create a new thread to replace any that has disappeared. In the event of a VPN outage, the new thread would keep attempting to connect until the VPN is fixed, at which point it can resume sending messages.
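A minimal Python 3 sketch of the proposed supervisor, assuming each configured connection can be described as a name plus a zero-argument callable that runs its worker. `supervise_once`, `supervise`, `wanted` and `running` are hypothetical. Instead of relying on threadList, the sketch keeps the wanted set from the configuration and uses `Thread.is_alive()` to spot threads that have exited.

```python
import threading
import time


def supervise_once(wanted, running):
    """One pass of a hypothetical supervisor check.

    wanted:  dict of name -> zero-argument callable that runs one worker
             (in socketReader this would be derived from the configuration).
    running: dict of name -> threading.Thread, updated in place.
    Returns the names of the threads that were (re)started on this pass.
    """
    started = []
    for name, target in wanted.items():
        thread = running.get(name)
        if thread is None or not thread.is_alive():
            # Missing or dead: start a replacement under the same name.
            thread = threading.Thread(target=target, name=name, daemon=True)
            thread.start()
            running[name] = thread
            started.append(name)
    return started


def supervise(wanted, running, is_running, interval=10.0):
    # Assumed shape of the main "while running:" loop with the check added.
    while is_running():
        supervise_once(wanted, running)
        time.sleep(interval)
```

One detail this sketch leaves out: the unsent messages (Queue length: 2 in the log) belong to the thread that stopped, so the queue would need to live outside the worker (for example in the supervisor) and be handed to the replacement thread, otherwise those messages are lost. Combined with a worker that retries its connect call, a restarted thread would keep trying until the VPN is back, as described above.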