Channel close() and queue shutdowns, possible race condition?

bkaskel
2013-08-22
2016-05-27
  • bkaskel

    bkaskel - 2013-08-22

    At the end of CUDTUnited::removeSocket() in api.cpp, if a multiplexer's reference count goes to zero, the channel is closed, the queues and timer are deleted and finally the channel is deleted.

       m->second.m_iRefCount --;
       if (0 == m->second.m_iRefCount)
       {
          m->second.m_pChannel->close(); // what if fd is reused before queue deletions?
          delete m->second.m_pSndQueue;
          delete m->second.m_pRcvQueue;
          delete m->second.m_pTimer;
          delete m->second.m_pChannel;
          m_mMultiplexer.erase(m);
       }
    

    I have seen cases where datagrams are being "stolen" from newly created (UDP) sockets elsewhere in the process and I suspect it could be due to the ordering here.

    By performing the close() before deleting the queues, doesn't this allow for the possibility that between the close() and the queue deletion, a new socket using the old file descriptor could be created in another thread and one or both of the queues could improperly use that new file descriptor? I did not see any synchronization which would prevent this problem. Would moving the channel close() to happen after the queues have been deleted introduce other problems?
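The reordering asked about above can be sketched as follows. This is a minimal illustration, not UDT code: `MockChannel` and `MockQueue` are hypothetical stand-ins for the channel and the send/receive queues, and the sketch assumes that each queue's destructor can stop and join its worker thread without the channel having been closed first. Whether that holds for the real queues would need to be checked against the UDT sources.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for the channel that owns the UDP descriptor.
struct MockChannel {
    std::atomic<bool> open{true};
    std::atomic<int> usesAfterClose{0};

    // Called by queue workers; counts any use after close().
    void use() { if (!open.load()) ++usesAfterClose; }
    void close() { open.store(false); }
};

// Hypothetical stand-in for a send or receive queue with a worker thread.
class MockQueue {
public:
    explicit MockQueue(MockChannel* ch)
        : m_channel(ch), m_worker([this] { run(); }) {}

    // The destructor stops the worker and waits for it to exit.
    ~MockQueue() {
        m_stop.store(true);
        m_worker.join();
    }

private:
    void run() {
        while (!m_stop.load()) {
            m_channel->use();
            std::this_thread::yield();
        }
    }

    MockChannel* m_channel;
    std::atomic<bool> m_stop{false};
    std::thread m_worker;  // declared last so the other members exist first
};

// Tears down in the proposed order: queues first, channel close last.
// Returns how many times a worker touched the channel after close().
int teardownQueuesFirst() {
    MockChannel channel;
    std::unique_ptr<MockQueue> snd(new MockQueue(&channel));
    std::unique_ptr<MockQueue> rcv(new MockQueue(&channel));
    std::this_thread::sleep_for(std::chrono::milliseconds(5));

    snd.reset();      // joins the send worker
    rcv.reset();      // joins the receive worker
    channel.close();  // no worker is left to use the descriptor
    return channel.usesAfterClose.load();
}
```

Because both workers are joined before `close()` runs, `teardownQueuesFirst()` returns 0. With the original order (close first, then delete the queues), a worker can still touch the channel in the window between the two steps.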

     
  • Yunhong Gu

    Yunhong Gu - 2013-10-17

    Sorry for the late response. I was not able to pay attention to this forum for a while.

    This code block is protected by m_ControlLock (taken in checkBrokenSockets -> removeSocket), so a new socket should not be able to reuse this multiplexer.
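For context on the descriptor reuse raised in the original question: descriptor numbers are handed out by the operating system to any thread that calls `socket()`, whether or not that thread holds a library-internal lock. The sketch below assumes a POSIX system where the lowest free descriptor is allocated first (the normal behaviour on Linux) and that no other thread opens descriptors while it runs. The function name `closedFdIsReused` is made up for this example.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Opens a UDP socket, closes it, opens another one, and reports whether
// the second socket received the same descriptor number as the first.
bool closedFdIsReused() {
    int first = socket(AF_INET, SOCK_DGRAM, 0);
    if (first < 0) return false;
    close(first);  // the number is free again from this point on

    int second = socket(AF_INET, SOCK_DGRAM, 0);
    if (second < 0) return false;

    bool reused = (second == first);
    close(second);
    return reused;
}
```

If another thread performs the second `socket()` call between the channel's `close()` and the queue deletions, any queue thread still holding the old number would be operating on that thread's socket.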

     
  • Johannes Rudolph

    We have also observed this problem, which can lead to a complete deadlock of the whole UDT library if the stolen file descriptor belongs to a blocking socket without a receive timeout set.

    This is a severe bug which needs to be fixed. Also filed here: https://github.com/barchart/barchart-udt/issues/83

    Thank you, bkaskel, for reporting this issue here. I had already spent a few days hunting it before I understood enough of what is happening to find this discussion.
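The deadlock scenario above depends on the affected socket blocking indefinitely in a receive call. As a hedged illustration only (it bounds the blocking time and does not fix the underlying race), a receive timeout can be set on an application's own UDP sockets with the standard `SO_RCVTIMEO` option. The helper names `setReceiveTimeout` and `recvTimesOut` are made up for this sketch and are not part of UDT.

```cpp
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Sets a receive timeout of `millis` milliseconds on `fd`.
// Returns true if setsockopt() succeeded.
bool setReceiveTimeout(int fd, long millis) {
    timeval tv;
    tv.tv_sec = millis / 1000;
    tv.tv_usec = (millis % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Returns true if recv() on an idle UDP socket gives up with
// EAGAIN/EWOULDBLOCK instead of blocking indefinitely.
bool recvTimesOut(long millis) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return false;

    bool ok = setReceiveTimeout(fd, millis);
    char buf[16];
    ssize_t n = ok ? recv(fd, buf, sizeof(buf), 0) : 0;
    bool timedOut = ok && n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);

    close(fd);
    return timedOut;
}
```

With a timeout in place, a thread that ends up reading from the wrong socket returns to its caller after the timeout expires rather than hanging forever.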

     
  • tom zhou

    tom zhou - 2016-05-27

    I appreciate this fix. I struggled with this issue for many days.

     
