At the end of CUDTUnited::removeSocket() in api.cpp, if a multiplexer's reference count goes to zero, the channel is closed, the queues and timer are deleted and finally the channel is deleted.
m->second.m_iRefCount --;
if (0 == m->second.m_iRefCount)
{
m->second.m_pChannel->close(); // what if fd is reused before queue deletions?
delete m->second.m_pSndQueue;
delete m->second.m_pRcvQueue;
delete m->second.m_pTimer;
delete m->second.m_pChannel;
m_mMultiplexer.erase(m);
}
I have seen cases where datagrams are being "stolen" from newly created (UDP) sockets elsewhere in the process and I suspect it could be due to the ordering here.
By performing the close() before deleting the queues, doesn't this allow for the possibility that between the close() and the queue deletion, a new socket using the old file descriptor could be created in another thread and one or both of the queues could improperly use that new file descriptor? I did not see any synchronization which would prevent this problem. Would moving the channel close() to happen after the queues have been deleted introduce other problems?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Sorry for the too-late response. I was not able to pay attention to this forum for a while.
This code block is protected by m_ControlLock (from checkBrokenSockets -> removeSockets). so a new socket should not be able to reuse this multiplexer.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
We have also observed this problem which can lead to a complete deadlock of the whole UDT library if the stolen file descriptor is a blocking socket without a receive timeout set.
Thank you, bkaskel, for reporting this issue here, I already spent a few days hunting this issue before I understood enough of which is happening here to find this discussion.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
At the end of CUDTUnited::removeSocket() in api.cpp, if a multiplexer's reference count goes to zero, the channel is closed, the queues and timer are deleted and finally the channel is deleted.
I have seen cases where datagrams are being "stolen" from newly created (UDP) sockets elsewhere in the process and I suspect it could be due to the ordering here.
By performing the close() before deleting the queues, doesn't this allow for the possibility that between the close() and the queue deletion, a new socket using the old file descriptor could be created in another thread and one or both of the queues could improperly use that new file descriptor? I did not see any synchronization which would prevent this problem. Would moving the channel close() to happen after the queues have been deleted introduce other problems?
Sorry for the too-late response. I was not able to pay attention to this forum for a while.
This code block is protected by m_ControlLock (from checkBrokenSockets -> removeSockets). so a new socket should not be able to reuse this multiplexer.
We have also observed this problem which can lead to a complete deadlock of the whole UDT library if the stolen file descriptor is a blocking socket without a receive timeout set.
This is severe bug which needs to be fixed. Also filed here: https://github.com/barchart/barchart-udt/issues/83
Thank you, bkaskel, for reporting this issue here, I already spent a few days hunting this issue before I understood enough of which is happening here to find this discussion.
appreciate this fix. I really struggle on it many days.