JmDNS / Bugs / #82 waitForCancelled and waitForAnnounced can cause bad behavior

Jason LeBrun - 2011-01-11

Much happier with this one.

evented-jmdns-v2.patch

Jason LeBrun - 2011-01-11

A few more tweaks so that cleanup happens in the correct place.

evented_jmdns_v3.patch

Pierre Frisch - 2011-01-11

Have you tried running the unit test?

Pierre

Pierre Frisch - 2011-01-11

The easiest way is to run: mvn test

Jason LeBrun - 2011-01-11

No, I haven't. I'll do that now.

Pierre Frisch - 2011-01-14

OK, I have an implementation that appears to work and is quite a bit more efficient. Could you try it?

Thanks

Pierre

Pierre Frisch - 2011-01-14

status: open-accepted --> pending-accepted

Nobody/Anonymous - 2011-01-14

Well, this is better than the spinwait, but it will still leak threads if connectivity goes away during one of the semaphore waits.

My goal in the implementation I made was to never block execution waiting for a condition to be met. That way, when the client discards the JmDNS object, it will always be cleaned up and collected.

Jason LeBrun - 2011-01-14

status: pending-accepted --> open-accepted

Jason LeBrun - 2011-01-14

Sorry, that last comment was me. Did not realize that I wasn't logged in.

Pierre Frisch - 2011-01-14

Why would it leak? There is a timeout, and if the timeout expires the thread is released. This is why I am using Semaphores and not wait/notify. I may have missed something; could you be more explicit?

There are times when we need to block the thread. In particular, when we are registering services we need to wait for JmDNS to be announced before we can probe for the service and announce it. The previous implementation of JmDNS blocked until it was announced. I added the wait so that we can start listening while JmDNS starts, but registering and publishing then need to wait.

Pierre

Jason LeBrun - 2011-01-14

Most of the internal calls to waitForAnnounced/waitForCanceled specify 0, which is effectively an infinite timeout. So these threads will block forever if they can't finish.

The thread does not need to be blocked to successfully wait for a particular condition before doing something. If a callback-based method is used, then you simply call the callbacks when the particular announced state is reached.

In my solution, for example, if you called registerService and JmDNS was not announced, the desired service registration was queued. When the announced state was finally reached, the queue was processed and the service registrations were announced. I used a single callback per DNSStatefulObject that decided what to do next based on the current state of the object once a "settled" state was reached. This allowed it to process pending operations in the right order, for example, ensuring that an attempt was made to unregister services before closing everything.

Furthermore, the client doesn't need to block, since there are serviceAnnounced/serviceResolved/serviceRemoved callback listeners that can be registered to tell the client that the service is now registered and that it can proceed with doing what it needs to do after service registration.
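
The queueing approach described above could look roughly like the following minimal sketch. The class and method names (DeferredRegistrar, onAnnounced, and so on) are invented for illustration and are not taken from the patch or from JmDNS; services are represented by plain strings to keep the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration only; none of these names come from JmDNS.
class DeferredRegistrar {
    private boolean announced = false;
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> registered = new ArrayList<>();

    // Never blocks: registers at once if already announced, otherwise queues the request.
    synchronized void registerService(String name) {
        if (announced) {
            registered.add(name);
        } else {
            pending.add(name);
        }
    }

    // Callback invoked once the announced state is reached; drains the queue in order.
    synchronized void onAnnounced() {
        announced = true;
        String name;
        while ((name = pending.poll()) != null) {
            registered.add(name);
        }
    }

    synchronized List<String> registeredServices() {
        return new ArrayList<>(registered);
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

Because registerService only appends to a queue when the object is not yet announced, no caller thread is ever parked waiting for a state change.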

Pierre Frisch - 2011-01-14

My mistake, I thought I had added a timeout on all of them. Fixed.

The problem is that you are redefining the API, and that is why the unit tests are failing. If we make all the calls asynchronous, this will break a lot of existing code. I am not saying this is a bad solution, but it will change things quite radically. I still like the idea, but this is a BIG change to how JmDNS works, with consequences for the client code.

Pierre

Jason LeBrun - 2011-01-14

I agree completely about the API change. When I ran your unit tests and they all failed, I quickly learned why, and realized that it wouldn't be shippable since it could potentially break existing clients. I was looking at remedying that, but then got distracted and hadn't finished before you checked in your other patch.

My planned remedy was to rename my versions of the now non-blocking API calls, and have the original method names still exhibit blocking behavior. Then the original blocking calls could just call the new calls (registerServiceNB or something) and then after calling it, block until the desired condition occurs.
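
A rough sketch of that remedy, under stated assumptions: registerServiceNB is the placeholder name from the paragraph above, while the callback parameter, the timeout argument, and the BlockingFacade class are hypothetical additions for this example. The non-blocking variant here just runs the callback on a worker thread as a stand-in for real registration work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the JmDNS API.
class BlockingFacade {
    // Non-blocking variant: starts the work and invokes onDone when it completes.
    void registerServiceNB(String name, Runnable onDone) {
        Thread worker = new Thread(onDone, "register-" + name);
        worker.setDaemon(true);
        worker.start();
    }

    // Original-style blocking call, layered on top of the non-blocking one.
    // Returns true if registration completed within the timeout.
    boolean registerService(String name, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        registerServiceNB(name, done::countDown);
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

With this layering, existing clients keep the blocking method name and behavior, while new clients can call the non-blocking variant directly.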

Jason LeBrun - 2011-01-19

Even with the timeout, I still end up with threads blocked forever at the acquire() step if connectivity is lost.

Pierre Frisch - 2011-01-19

Do you have a thread dump? This is not supposed to happen.

Pierre

Jason LeBrun - 2011-01-19

It happens if I call close() from another thread while the first thread is in the middle of waiting for a registerService() call to finish. The registerService() call blocks forever:

at java.util.concurrent.Semaphore.acquire(Semaphore.java:284)

Should the acquire() be a tryAcquire(timeout)?

Pierre Frisch - 2011-01-19

The timeout is simply not working because the timer is started after the semaphore acquisition and therefore never executed…

We should always write unit tests…

Pierre

Pierre Frisch - 2011-01-19

status: open-accepted --> pending-fixed

Jason LeBrun - 2011-01-19

status: pending-fixed --> open-fixed

Jason LeBrun - 2011-01-19

Why are you using a whole separate timer, instead of just using the tryAcquire(timeout) call of Semaphores?
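
For reference, a timed wait built directly on java.util.concurrent.Semaphore might look like the sketch below. The TimedWait class and waitFor method are hypothetical names for this example and do not reflect the actual JmDNS code.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of JmDNS.
class TimedWait {
    // Waits up to timeoutMillis for a permit.
    // Returns false if the timeout elapses or the thread is interrupted.
    static boolean waitFor(Semaphore signal, long timeoutMillis) {
        try {
            return signal.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Note that tryAcquire with a timeout of zero or less does not wait at all, so a caller that passes 0 meaning "wait forever" would need separate handling.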

Pierre Frisch - 2011-01-19

status: open-fixed --> pending-fixed

Pierre Frisch - 2011-01-19

You are absolutely right. Done.

Nobody/Anonymous - 2011-01-20

It's working well. I'm attaching a patch to fix one more minor thing: you can get into a state where the timers aren't cleaned up if you close() while the service is in the middle of recovering. I fixed this by creating new closing() and closed() states, which will be helpful for other things too, I think.

Jason LeBrun - 2011-01-20

Add closing and closed states so that we can always clean up properly when close is issued.

handle_close.patch
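
The idea behind explicit closing and closed states can be sketched as follows. This is not the content of the patch: the Lifecycle class, its state names, and the boolean standing in for real timer tasks are all assumptions made for illustration.

```java
// Hypothetical illustration only; not the contents of handle_close.patch.
class Lifecycle {
    enum State { ANNOUNCED, RECOVERING, CLOSING, CLOSED }

    private State state = State.ANNOUNCED;
    private boolean timersRunning = true; // stand-in for real timer tasks

    // Recovery may only start while the object is not closing or closed.
    synchronized boolean beginRecovery() {
        if (state == State.CLOSING || state == State.CLOSED) {
            return false;
        }
        state = State.RECOVERING;
        return true;
    }

    // Recovery can finish after close() was called; it must not restart timers then.
    synchronized void finishRecovery() {
        if (state != State.RECOVERING) {
            return;
        }
        timersRunning = true;
        state = State.ANNOUNCED;
    }

    // Close always wins: timers are cancelled regardless of the current state.
    synchronized void close() {
        if (state == State.CLOSING || state == State.CLOSED) {
            return;
        }
        state = State.CLOSING;
        timersRunning = false;
        state = State.CLOSED;
    }

    synchronized State state() { return state; }

    synchronized boolean timersRunning() { return timersRunning; }
}
```

Because every transition checks the current state first, a close() issued in the middle of recovery leaves the object closed with its timers stopped, even if the recovery code finishes afterwards.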
