#21 udt can hang

open
nobody
None
5
2015-03-04
2011-05-04
Aaron Reed
No

In our project I've seen situations where background threads hang inside UDT. Multiple threads are stuck inside CGuard::CGuard trying to acquire m_ControlLock, which is the same lock for all of them. Looking further, m_ControlLock is held by the garbage collector thread while CSndQueue::~CSndQueue is running. However, CSndQueue::~CSndQueue is stuck waiting for its worker thread to end. The worker thread isn't ending because it is stuck in self->m_pTimer->sleepto(ts), where ts is sometimes 2000 hours beyond currtime. I've even seen it further out than that.
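To illustrate the hang described above, here is a minimal sketch of one defensive pattern: sleeping toward a far-away target in bounded slices so the worker re-checks a closing flag regularly. This is not UDT code and not UDT's fix; `nextWakeup`, `boundedSleep`, `maxSlice` and the `closing` flag are hypothetical names, and the sleep itself is passed in as a callable so the sketch stays independent of any real timer.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of UDT: returns the next wake-up time for a
// worker that wants to sleep until `target` but must wake at least every
// `maxSlice` ticks. All three values use the same time unit.
inline std::uint64_t nextWakeup(std::uint64_t now, std::uint64_t target,
                                std::uint64_t maxSlice)
{
    assert(maxSlice > 0);                 // a zero slice would never make progress
    if (target <= now)
        return now;                       // the deadline has already passed
    return now + std::min(target - now, maxSlice);
}

// Sleeps toward `target` in bounded slices and stops as soon as `closing` is
// set. `sleepTo` is a caller-supplied callable that blocks until the given
// time and returns the current time afterwards. Returns true if the target
// was reached, false if the loop stopped because `closing` was set.
template <typename SleepTo>
bool boundedSleep(std::uint64_t now, std::uint64_t target, std::uint64_t maxSlice,
                  const std::atomic<bool>& closing, SleepTo sleepTo)
{
    while (now < target) {
        if (closing.load())
            return false;
        now = sleepTo(nextWakeup(now, target, maxSlice));
    }
    return true;
}
```

With this shape, a target 2000 hours away costs at most one `maxSlice` of delay before a destructor that sets `closing` can join the worker. The trade-off is extra wake-ups while idle.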

I don't have much experience at all inside UDT so I have no idea how to fix this. I don't see how anything we are doing with UDT would cause this huge sleep delay; that all seems to be calculated inside UDT unless I'm mistaken.

Any ideas would be appreciated.

Discussion

  • Aaron Reed

    Aaron Reed - 2011-05-05

    I looked at this some more. The large value is coming from the m_llInterval member of a CUDT object. This member is set from a large value in the CUDTCC::m_dPktSndPeriod member. I set a breakpoint to trigger if that value ever gets bigger than 30 minutes, and it broke during CUDTCC::onLoss, when it bumps the current m_dPktSndPeriod to ceil(m_dPktSndPeriod * 1.125). I wonder if it is getting bumped a lot. The problem occurs on fairly slow machines that are pegged doing a lot of video rendering.
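As a rough way to judge how often the period would have to be bumped, the sketch below counts successive ceil(period * 1.125) steps until a limit is exceeded. It is a simplification and not UDT code: `bumpsToExceed` is a hypothetical helper, the period is assumed to be in microseconds, and every loss event is assumed to apply exactly one bump.

```cpp
#include <cassert>
#include <cmath>

// Counts how many successive ceil(period * 1.125) bumps are needed before the
// period exceeds `limitUs`. Both arguments are assumed to be microseconds.
inline int bumpsToExceed(double periodUs, double limitUs)
{
    assert(periodUs > 0.0);               // a non-positive period would never grow
    int bumps = 0;
    while (periodUs <= limitUs) {
        periodUs = std::ceil(periodUs * 1.125);
        ++bumps;
    }
    return bumps;
}
```

Calling `bumpsToExceed(1.0, 30.0 * 60.0 * 1e6)` gives the number of bumps needed to grow from 1 microsecond to more than 30 minutes under these assumptions; because each bump multiplies the period by only 1.125, that takes more than a hundred consecutive bumps.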

     
  • Yunhong Gu

    Yunhong Gu - 2011-05-06

    Thanks, I found a bug that can cause this problem. The cause is not the big m_dPktSndPeriod value, which may happen if there are lots of packet losses. The real problem is that when the connection is closed (and hence the queue is removed), the timer's sleepto() call is supposed to be interrupted and return immediately. That is where the bug is: the timer is not interrupted.
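For readers unfamiliar with the idea, here is a minimal sketch of a sleep that can be interrupted from another thread, written with the C++11 standard library. It is not UDT's CTimer and not the actual fix; `InterruptibleTimer`, `sleepUntil` and `interrupt` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

class InterruptibleTimer
{
public:
    using Clock = std::chrono::steady_clock;

    // Blocks until `deadline` or until interrupt() has been called, whichever
    // comes first. Returns true if the sleep ended because of an interrupt.
    bool sleepUntil(Clock::time_point deadline)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        // The predicate guards against spurious wake-ups and also makes an
        // interrupt that arrived before the sleep take effect immediately.
        const bool interrupted =
            m_cond.wait_until(lock, deadline, [this] { return m_interrupted; });
        assert(lock.owns_lock());         // wait_until re-acquires the mutex
        return interrupted;
    }

    // Wakes every current sleeper and makes all later sleeps return at once.
    void interrupt()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_interrupted = true;
        }
        m_cond.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_interrupted = false;
};
```

In this sketch a destructor would call `interrupt()` before joining its worker thread, so the join cannot be held up by a sleep target that is hours away. The interrupt flag is deliberately sticky, which suits shutdown but would need a reset if the timer were reused.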

    The CVS has not been updated yet. I will do so in the next few days together with another update.

     
  • Vitaliy Yunikov

    Vitaliy Yunikov - 2012-04-23

    Hello,
    Has the issue been fixed? If yes, do you know in which UDT version?

    I think I am facing the same issue with UDT 4.10.
    I have client and server applications which both use UDT. The server sends data and the client receives it. When the server goes down, UDT::recv returns an error, after which my application closes the corresponding UDT socket. The client then opens a new socket and downloads data from another server.
    Once the client starts to download data from the other server, UDT::recv often hangs.

    My stack looks as follows:
    WSARecvFrom
    udt.dll!CChannel::recvfrom(sockaddr * addr=0x0215db80, CPacket & packet={...}) Line 317 + 0x2c bytes C++
    udt.dll!CRcvQueue::worker(void * param=0x042b1b68) Line 1014 + 0x13 bytes C++

    WSARecvFrom
    udt.dll!CChannel::recvfrom(sockaddr * addr=0x0215d780, CPacket & packet={...}) Line 317 + 0x2c bytes C++
    udt.dll!CRcvQueue::worker(void * param=0x02163b18) Line 1014 + 0x13 bytes C++

    WaitForSingleObject
    udt.dll!CRcvQueue::~CRcvQueue() Line 914 + 0x11 bytes C++
    udt.dll!CRcvQueue::`scalar deleting destructor'() + 0x2b bytes C++
    udt.dll!CUDTUnited::removeSocket(const int u=296592219) Line 1307 + 0x33 bytes C++
    udt.dll!CUDTUnited::checkBrokenSockets() Line 1248 + 0x16 bytes C++
    udt.dll!CUDTUnited::garbageCollect(void * p=0x02187318) Line 1478 C++

    WaitForSingleObject
    udt.dll!CGuard::CGuard(void * & lock=0x000001b0) Line 312 + 0x12 bytes C++
    udt.dll!CUDTUnited::lookup(const int u=296592218) Line 468 + 0xf bytes C++
    udt.dll!CUDT::recv(int u=296592218, char * buf=0x0459ec5c, int len=4096, int __formal=0) Line 1817 + 0xe bytes C++
    udt.dll!UDT::recv(int u=296592218, char * buf=0x0459ec5c, int len=4096, int flags=0) Line 2230 + 0x15 bytes C++

    One more thing that could be important: the client is a Python script that uses a C++ DLL, which implements the download using UDT. The server is a C++ application.

    Thanks in advance.

     
  • Anonymous

    Anonymous - 2012-12-31

    Hi,

    Has this bug been fixed already? Any ideas how to fix it?

    I think I am dealing with the same bug.
    In rare situations, when I try to create a new UDT socket after another socket was closed, CUDTUnited::newSocket gets stuck on the m_ControlLock lock. The lock is apparently held by the garbage collector, which is trying to release the socket that was closed (CSndQueue::~CSndQueue of the closed socket is also running).

     

    Last edit: Anonymous 2014-04-01
  • Anonymous

    Anonymous - 2015-03-04

    Hi,

    What is the status of this bug?
    Is it fixed in the latest version?

     