#19 Race condition in genaNotifyThread

closed-fixed
None
6
2010-09-28
2008-10-15
No

When the HandleLock was converted to a read/write lock, genaNotifyThread was also changed to release the lock while it calls genaNotify.

genaNotify touches the network, so it is preferable that no lock be held while it runs.

However, because of how read/write locks behave, the write lock that genaNotifyThread takes after genaNotify returns can, in some cases, be starved while genaNotifyThread is invoked over and over.

What I see is that one event is sent, and that thread then waits on the write lock. While it waits, another event is dispatched on a different thread; that thread hits the "in->eventKey != sub->ToSendEventKey" case and reschedules itself.

The "in->eventKey != sub->ToSendEventKey" condition is not cleared until the first thread obtains the write lock. Until that happens, the second thread is free to call genaNotifyThread again with the next event. The next event's key is in->eventKey + 1, so it also hits "in->eventKey != sub->ToSendEventKey".

This continues until the threads happen to be scheduled so that none of them is holding the read lock. The thread waiting for the write lock, which has been starved all this time, then finally runs and clears the condition.

This race can be triggered by eventing many state variables one after another.

It appears the easy fix would be to hold the write lock for the entire duration of genaNotifyThread.

A more involved fix would be not to schedule the job for the next event until ToSendEventKey has been incremented.
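One way to sketch the more involved fix, assuming events that arrive before their turn are parked per subscription instead of being handed to the thread pool, and the code that increments ToSendEventKey then releases the next parked event. `event_ready`, `event_sent`, `schedule_job`, `MAX_PENDING` and the `parked` array are hypothetical; both functions assume the caller already holds HandleLock, and `schedule_job` only records keys in place of a real thread-pool submit.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 16  /* hypothetical cap on events waiting per subscription */

typedef struct {
    int ToSendEventKey;          /* key of the next event allowed to go out */
    bool parked[MAX_PENDING];    /* parked[k % MAX_PENDING]: event k waits */
} subscription;

/* Stand-in for the thread pool: records scheduled event keys in order. */
static int scheduled[64];
static int n_scheduled;

static void schedule_job(int eventKey)
{
    scheduled[n_scheduled++] = eventKey;
}

/* A new event exists for this subscription. Caller holds HandleLock. */
static void event_ready(subscription *sub, int eventKey)
{
    /* Keys must fall inside the window so that slots do not collide. */
    assert(eventKey >= sub->ToSendEventKey &&
           eventKey < sub->ToSendEventKey + MAX_PENDING);
    if (eventKey == sub->ToSendEventKey)
        schedule_job(eventKey);                     /* its turn: run now */
    else
        sub->parked[eventKey % MAX_PENDING] = true; /* wait, never spin */
}

/* The job for ToSendEventKey finished genaNotify(). Caller holds HandleLock. */
static void event_sent(subscription *sub)
{
    int slot;

    sub->ToSendEventKey++;
    slot = sub->ToSendEventKey % MAX_PENDING;
    if (sub->parked[slot]) {
        /* Only now, after the increment, does the next event get a job. */
        sub->parked[slot] = false;
        schedule_job(sub->ToSendEventKey);
    }
}
```

Because a job is only created for the event whose key equals ToSendEventKey, no job ever hits the mismatch case, so there is nothing to reschedule and nothing competing with the writer.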

-Alexander Faucher

Discussion

  • Alexander Faucher

    It appears that simply dropping the strict ordering check "in->eventKey != sub->ToSendEventKey" clears the problem. However, this might break the UPnP specification, and the code does not explain why the check is there.

  • Marcelo Roberto Jimenez

    Hi Alexander,

    Do you have a patch? That would make things easier to analyze.

    Regards,
    Marcelo.

  • Marcelo Roberto Jimenez

    • priority: 5 --> 6
    • assigned_to: nobody --> mroberto
     
  • Alexander Faucher

    Sorry, I don't; I went with the quick and easy solution. The proper fix is to keep a queue of events for each subscription and create a job that services that subscriber's events in order.
     
  • Marcelo Roberto Jimenez

    Hi Alexander,

    I know I am asking a lot, but could you post a small program here that reproduces the bug, and describe your setup? If I can reproduce the problem here, I can try to deal with this issue.

    Regards,
    Marcelo.

  • Marcelo Roberto Jimenez

    • status: open --> closed-fixed
     
  • Marcelo Roberto Jimenez

    Fabrice Fontaine has committed a fix for this bug; please check.
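Alexander's suggested proper fix, a queue of events per subscription with a single job servicing that subscriber's events in order, might look roughly like the sketch below. It is not the fix that was committed, whose details are not given in this thread. All names are hypothetical, and a per-subscription `pthread_mutex_t` stands in for HandleLock. Because at most one drain job runs per subscription, ordering comes from the queue itself, so the lock can be released around the network send and no ToSendEventKey check or rescheduling is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical event record; a real one would carry the property-set body. */
typedef struct event {
    int key;
    struct event *next;
} event;

typedef struct {
    pthread_mutex_t lock;   /* stands in for HandleLock */
    event *head, *tail;     /* unsent events, oldest first */
    bool worker_active;     /* true while a drain job owns this queue */
} subscription;

/* Append an event. Returns 1 if the caller must start a drain job,
 * 0 if one is already running, -1 on allocation failure. */
static int enqueue_event(subscription *sub, int key)
{
    event *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->key = key;
    e->next = NULL;

    pthread_mutex_lock(&sub->lock);
    if (sub->tail)
        sub->tail->next = e;
    else
        sub->head = e;
    sub->tail = e;
    int start = !sub->worker_active;
    sub->worker_active = true;
    pthread_mutex_unlock(&sub->lock);
    return start;
}

/* The single drain job: pop under the lock, send with the lock released,
 * repeat until empty. Keys go to sent[] (up to max) in place of a real
 * genaNotify() call. Returns the number of events drained. */
static int drain_queue(subscription *sub, int *sent, int max)
{
    int n = 0;
    for (;;) {
        pthread_mutex_lock(&sub->lock);
        assert(sub->worker_active);       /* only the owning job drains */
        event *e = sub->head;
        if (e == NULL) {
            sub->worker_active = false;   /* next enqueue starts a new job */
            pthread_mutex_unlock(&sub->lock);
            return n;
        }
        sub->head = e->next;
        if (sub->head == NULL)
            sub->tail = NULL;
        pthread_mutex_unlock(&sub->lock);

        if (n < max)
            sent[n] = e->key;   /* network I/O would happen here, unlocked */
        n++;
        free(e);
    }
}
```

The emptiness check and the `worker_active` update happen under the same lock as the enqueue, so an event added just as the job finishes either is drained by that job or starts a new one; it is never stranded.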

     
