udt / Discussion / Open Discussion: Channel close() and queue shutdowns, possible race condition?

Channel close() and queue shutdowns, possible race condition?

Forum: Open Discussion

Creator: bkaskel

Created: 2013-08-22

Updated: 2016-05-27

bkaskel - 2013-08-22

At the end of CUDTUnited::removeSocket() in api.cpp, if a multiplexer's reference count goes to zero, the channel is closed, the queues and timer are deleted and finally the channel is deleted.

m->second.m_iRefCount --; if (0 == m->second.m_iRefCount) { m->second.m_pChannel->close(); // what if fd is reused before queue deletions? delete m->second.m_pSndQueue; delete m->second.m_pRcvQueue; delete m->second.m_pTimer; delete m->second.m_pChannel; m_mMultiplexer.erase(m); }

I have seen cases where datagrams are being "stolen" from newly created (UDP) sockets elsewhere in the process and I suspect it could be due to the ordering here.

By performing the close() before deleting the queues, doesn't this allow for the possibility that between the close() and the queue deletion, a new socket using the old file descriptor could be created in another thread and one or both of the queues could improperly use that new file descriptor? I did not see any synchronization which would prevent this problem. Would moving the channel close() to happen after the queues have been deleted introduce other problems?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Yunhong Gu - 2013-10-17

Sorry for the too-late response. I was not able to pay attention to this forum for a while.

This code block is protected by m_ControlLock (from checkBrokenSockets -> removeSockets). so a new socket should not be able to reuse this multiplexer.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Johannes Rudolph - 2016-05-27

We have also observed this problem which can lead to a complete deadlock of the whole UDT library if the stolen file descriptor is a blocking socket without a receive timeout set.

This is severe bug which needs to be fixed. Also filed here: https://github.com/barchart/barchart-udt/issues/83

Thank you, bkaskel, for reporting this issue here, I already spent a few days hunting this issue before I understood enough of which is happening here to find this discussion.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

tom zhou - 2016-05-27

appreciate this fix. I really struggle on it many days.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Log in to post a comment.