From: Tracker i. u. n. <pup...@li...> - 2011-01-17 11:24:00
Bugs item #3158591, was opened at 2011-01-14 15:41
Message generated for change (Comment added) made by inactiveneurons
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=841026&aid=3158591&group_id=166957

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: threadutil
Group: None
Status: Open
Resolution: None
Priority: 9
Private: No
Submitted By: Chuck Thomason (cyt4)
Assigned to: Marcelo Roberto Jimenez (mroberto)
Summary: Race condition can hang miniserver thread

Initial Comment:
Hello,

I have found a race condition in the thread pool handling of libupnp-1.6.6 that periodically leaves the miniserver thread blocked forever. In my setup, the miniserver thread pool is configured with 1 job per thread, 2 threads minimum, and 50 threads maximum.

Just before the lockup occurs, the miniserver thread pool contains two threads: a worker thread left over from a previous HTTP request job (let's call that thread "old_worker") and the miniserver thread itself.

A new HTTP request comes in. Accordingly, the miniserver enters schedule_request_job() and then ThreadPoolAdd(). In ThreadPoolAdd(), the job is added to the medium-priority queue and AddWorker() is called. In AddWorker(), jobs = 1 and threads = 1, so CreateWorker() is called. On entry to CreateWorker(), tp->totalThreads is 2, so currentThreads is 3. The function creates a new thread and then blocks on tp->start_and_shutdown, expecting the newly created thread to increment tp->totalThreads and then signal the condition variable to wake the miniserver thread and let it proceed.

The newly created thread starts in the WorkerThread() function. It increments tp->totalThreads to 3, broadcasts on the start_and_shutdown condition, and starts running its job.
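[Editor's sketch] The creation handshake described so far can be outlined in C as follows. This is a hypothetical reconstruction, not the actual libupnp source: the names (thread_pool, create_worker, worker_thread) simply mirror the report, and the pool mutex is assumed to be held on entry to create_worker(), as in the description of CreateWorker() above.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical pool state mirroring the fields named in the report. */
struct thread_pool {
	pthread_mutex_t mutex;
	pthread_cond_t start_and_shutdown;
	int totalThreads;
};

static void *worker_thread(void *arg)
{
	struct thread_pool *tp = arg;

	pthread_mutex_lock(&tp->mutex);
	tp->totalThreads++; /* announce ourselves to the creator */
	pthread_cond_broadcast(&tp->start_and_shutdown);
	pthread_mutex_unlock(&tp->mutex);
	/* ... the real worker would pick up and run jobs here ... */
	return NULL;
}

/* Assumes tp->mutex is already held by the caller. Returns 0 on success. */
static int create_worker(struct thread_pool *tp)
{
	int currentThreads = tp->totalThreads + 1; /* snapshot: expect one more */
	pthread_t t;

	if (pthread_create(&t, NULL, worker_thread, tp) != 0)
		return -1;
	/* Lost-wakeup hazard described in the report: if totalThreads reaches
	 * currentThreads but is decremented again by a dying worker before
	 * this thread re-checks the predicate, the loop never exits. */
	while (tp->totalThreads < currentThreads)
		pthread_cond_wait(&tp->start_and_shutdown, &tp->mutex);
	pthread_detach(t);
	return 0;
}
```

The hazard sits entirely in the wait predicate: it compares the live counter against a snapshot, so any concurrent decrement between the worker's broadcast and the creator's wakeup can make the predicate false again, permanently.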
However, before the miniserver thread wakes up, "old_worker" times out. It sees that there are no jobs in any queue and that the total number of threads (3) exceeds the minimum (2). As a result, it decrements tp->totalThreads to 2 and dies.

Now the miniserver thread finally wakes up. It checks tp->totalThreads, sees that its value is 2, and blocks on tp->start_and_shutdown again. It has "missed" seeing tp->totalThreads get incremented to 3 and will never be unblocked again.

When this issue occurs on a server device, the miniserver port remains open but unresponsive, since the miniserver thread is stuck. SSDP alive messages keep getting sent out, as they are handled by a separate thread.

Reproducing the issue is difficult due to the timing coincidence involved, but in my environment I am presently seeing it at least once a day. I figured out the sequence described above by adding my own debug logs. The relevant code has not changed substantially in libupnp-1.6.10, though I am planning to test against 1.6.10 as well in the near future.

Do you have any input for an elegant fix for this issue?

Thanks,
Chuck Thomason

----------------------------------------------------------------------

Comment By: Chandra (inactiveneurons)
Date: 2011-01-17 03:24

Message:
I submitted the patch; is there a problem with permissions?

----------------------------------------------------------------------

Comment By: Chandra (inactiveneurons)
Date: 2011-01-17 03:22

Message:
Attached is a stab at a patch. I've tested it and verified that it doesn't break anything. It should fix the issue, but I'm not 100% sure, since it's hard to reproduce.

----------------------------------------------------------------------

Comment By: Marcelo Roberto Jimenez (mroberto)
Date: 2011-01-16 17:03

Message:
Chuck, from your description, it seems to be a matter of controlling the right resources with a mutex.
I will probably not have enough free time during this week to analyze the issue carefully, so if you, Chandra, or anyone else comes up with a patch before I do, I'll be very glad :) I consider this one a show stopper for 1.6.11.

Regards,
Marcelo.

----------------------------------------------------------------------

Comment By: Chandra (inactiveneurons)
Date: 2011-01-15 10:44

Message:
Great find! I think we've come across this issue too, but never quite managed to track it down. Also, someone cleaned up the ThreadPool API, because the last time I took a peek it was a total mess! =)

Anyway, my guess for the fix is that 'currentThreads', which I read as a 'requiredThreads' value, should be part of the ThreadPool structure and should be decremented whenever totalThreads is decremented. I'm sure I'm missing something, though; these things usually tend to be tricky.

Regards,
Chandra

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=841026&aid=3158591&group_id=166957
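[Editor's sketch] To make Chandra's suggestion in the 2011-01-15 comment concrete, here is a single-threaded sketch of the counter bookkeeping only. The names are hypothetical and this is not the attached patch: the idea shown is that if a 'requiredThreads' counter lives in the pool and is decremented together with totalThreads whenever a worker exits, the creator's wait predicate cannot be left permanently unsatisfiable by a worker dying between the broadcast and the wakeup.

```c
#include <stdbool.h>

/* Hypothetical pool counters; in real code these would be guarded by
 * the pool mutex, which is omitted here to keep the sketch single-threaded. */
struct pool_state {
	int totalThreads;
	int requiredThreads;
};

/* Predicate the creator would wait on (under the pool mutex). */
static bool creator_may_proceed(const struct pool_state *p)
{
	return p->totalThreads >= p->requiredThreads;
}

/* A newly started worker announcing itself. */
static void worker_started(struct pool_state *p)
{
	p->totalThreads++;
}

/* An idle worker exiting: with the suggested fix, both counters move
 * together, so the creator's target shrinks along with the pool. */
static void worker_died(struct pool_state *p)
{
	p->totalThreads--;
	p->requiredThreads--;
}
```

Replaying the reported sequence (pool of 2, snapshot of 3, new worker starts, old_worker dies) now ends with totalThreads = requiredThreads = 2, so the creator's predicate holds and it proceeds instead of hanging.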