A TCP outduct does not reconnect after stopping and then restarting it with the bpadmin command. For example:
bpadmin
x outduct tcp <machine name>:4556 stops the outduct, but
s outduct tcp <machine name>:4556 does not restart it (e.g., executing a bping does not get a response)
Anonymous
Bill,
* What operating system did this error occur on?
* Is this tcpcl or stcpcl?
Linux (either Ubuntu or CentOS), and it is using TCPCL.
The problem is that most of the TCP-based convergence-layer adapters (tcpcli, brsscla, brsccla – but not stcp) use multiple threads within the same task to implement reception and transmission over the same socket – because separate tasks can’t do that. Restarting an outduct by restarting a CLO task is easy. Contriving to get a multi-threaded CLA task to restart a specific thread is a different proposition altogether; it requires some sort of control channel that none of these programs currently have. Developing that control channel will take some time.
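To illustrate what such a control channel might look like, here is a minimal Python sketch, not ION code. It assumes a single task that owns a sender thread and accepts "start", "stop" and "quit" commands over an in-process queue. The class name DuctTask, its methods and the command names are all hypothetical and chosen only for this example; the real CLAs are written in C and would need an equivalent mechanism built around their shared socket.

```python
import queue
import threading


class DuctTask:
    """Hypothetical stand-in for a multi-threaded CLA task (not ION code)."""

    def __init__(self):
        self.control = queue.Queue()   # control channel: (command, done) pairs
        self.outbox = queue.Queue()    # items waiting to be transmitted
        self.sent = []                 # items the sender thread has handled
        self._sender = None
        self._stop_flag = None
        # One long-lived controller thread owns the sender's lifecycle.
        self._controller = threading.Thread(target=self._control_loop, daemon=True)
        self._controller.start()

    def _sender_loop(self, stop_flag):
        # Drain the outbox until asked to stop; poll so the flag is noticed.
        while not stop_flag.is_set():
            try:
                item = self.outbox.get(timeout=0.05)
            except queue.Empty:
                continue
            self.sent.append(item)

    def _start_sender(self):
        if self._sender is not None and self._sender.is_alive():
            return  # already running
        self._stop_flag = threading.Event()
        self._sender = threading.Thread(
            target=self._sender_loop, args=(self._stop_flag,), daemon=True
        )
        self._sender.start()

    def _stop_sender(self):
        if self._sender is None:
            return  # already stopped
        self._stop_flag.set()
        self._sender.join()
        self._sender = None

    def _control_loop(self):
        # Apply commands one at a time, so start/stop can never interleave.
        while True:
            command, done = self.control.get()
            if command == "start":
                self._start_sender()
            elif command in ("stop", "quit"):
                self._stop_sender()
            done.set()
            if command == "quit":
                return

    def request(self, command):
        """Send a command over the control channel and wait until applied."""
        done = threading.Event()
        self.control.put((command, done))
        done.wait()

    def sender_running(self):
        return self._sender is not None and self._sender.is_alive()
```

With this shape, an administrative command only has to call request("stop") and later request("start"); the rest of the task (for example, a receiver thread using the same socket) is left untouched. Anything queued while the sender is stopped is picked up once it restarts.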
We have a clunky temporary workaround (stop and then restart the TCP protocol in bpadmin), and a proper fix will not be easy, so this is slipping to 3.7.1.
An update: Bill says he has not seen the problem recur on any 3.7.0 machines. Nonetheless, some tweaks to the llcv library will go into ION 4.0.0 to further guard against the thread hangups that cause this reconnection failure.
Move to GitHub for tracking. Closing here.