Re: [Zookeeper-user] [Bug?] Notification not guaranteed


Thanks Ben. As always, your answer is pretty reasonable and I agree with
most of the points.

However I strongly disagree with:
2) a permanent watch is a watch that spans across multiple changes
within the same connection. I'm not requesting anything else.
4) How do subscriptions make it different from permanent watches? Maybe
there is a feature I'm not aware of. Besides, a permanent watch should
be cleared only when the session expires or the client disconnects. That
would be the semantics, and a client can detect it easily.
5) There are two types of users, those who don't want to deal with
difficult problems (I guess it's a majority), and those who want to get
as much as possible from the system at minimum cost, accepting the risk
that brings. I think a flexible API should allow both, rather than
enforcing correctness on the user side just for the sake of correctness.
I'd just like to have a choice.

Just to wrap up this discussion, please suggest the most efficient
solution (without persistent watches, you can use subscriptions) that
minimizes network traffic between client and server for the following
scenario:
1) A client is interested in some activity of another application and
it's critical to get *ALL* changes as long as the connection to ZK
doesn't break
2) The client operates at the boundary of the network bandwidth limit
Thanks,
Bart


-----Original Message-----
From: Benjamin Reed [mailto:br...@ya...] 
Sent: Friday, April 25, 2008 4:07 PM
To: Bartlomiej Niechwiej
Cc: Benjamin Reed; Jacob Levy; Ted Dunning; zoo...@li...
Subject: Re: [Zookeeper-user] [Bug?] Notification not guaranteed

exists() is a special case where the watch event does indeed have the
data. But it only buys you what you need in the absence of failures. If
you need to reconnect to another server, you still miss events.

The reasons for not having permanent watches are practical:
1) Except for exists() you can get the same information more efficiently
with current watches and version tracking. Even in your example if the
alive node is really thrashing so fast that the tracking server cannot
request fast enough to keep up with all the events, do you really need
the missed events? The events you are able to keep up with are enough to
indicate problems. You could actually get fine grained counts by making
alive a directory and using the SEQUENCE flag. That way you just compare
the current sequence of the file with the previous sequence you saw.
2) Even for exists() it doesn't work across server connections, since
watches can be missed.
3) Because of 2) an application cannot reliably count on permanent
watches.
4) Applications would need to be responsible for proactively cleaning up
permanent watches. (Which means they probably wouldn't.)
5) Most importantly the ZooKeeper API is designed to encourage correct
usage. We don't include the data that is changed in the watch event
specifically because our initial users tried to take advantage of that
data and always ended up with errors in their code. Permanent watches
can also induce such errors.
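
The version tracking mentioned in point 1 can be sketched roughly as
follows. This is a minimal illustration in Python, not ZooKeeper client
code: VersionTracker and missed_changes are hypothetical names, and the
version numbers are assumed to come from whatever read the client
already issues when it re-arms its one-shot watch.

```python
class VersionTracker:
    """Counts changes that happened between two reads of one znode."""

    def __init__(self, initial_version: int) -> None:
        # Version returned by the first read, which also set the watch.
        self.last_version = initial_version

    def missed_changes(self, new_version: int, events_delivered: int = 1) -> int:
        """Return how many changes the one-shot watch did not report."""
        total = new_version - self.last_version
        if total < 0:
            # Assumption of this sketch: a lower version means the node was
            # deleted and recreated, so the delta is not a change count.
            raise ValueError("version went backwards; node was probably recreated")
        self.last_version = new_version
        # One change was already announced by the watch event itself.
        return max(total - events_delivered, 0)


# Example from this thread: the first read sees version 2, one watch
# event fires, and the re-read sees version 5.
tracker = VersionTracker(initial_version=2)
missed = tracker.missed_changes(new_version=5)
print(missed)  # 2
```

The count is only as good as the version delta: it says how many sets
happened, not what the intermediate values were.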

So, really it's the practical issues that are behind our aversion to
permanent watches. ZooKeeper needs to provide clean well understood
semantics. Our current watches do this and subscribe would too, but
something in the middle is likely to induce errors and
misunderstandings.

ben

On Friday 25 April 2008 15:29:18 Bartlomiej Niechwiej wrote:
> Ben, I think I gave a clear example that I was interested in
> notifications about the change, not about the data being changed. In
> other words, the presence or absence is my data, a boolean.  In the
> scenario I described, zookeeper cannot provide a reliable way of
> giving me what I need. That's it.
>
> Why is it so hard to provide permanent server-side watches? What is
> the problem with that? Is it that we don't want this functionality
> because it doesn't make sense or is it just a theoretical discussion?
>
> You suggest a subscription mechanism, which is far more expensive
> than what I propose, and for simple cases like the one described, you
> would end up spending too many ZK resources.
>
> B.
>
> -----Original Message-----
> From: Benjamin Reed [mailto:br...@ya...]
> Sent: Friday, April 25, 2008 2:29 PM
> To: Jacob Levy; Ted Dunning; Bartlomiej Niechwiej; Benjamin Reed
> Cc: zoo...@li...
> Subject: Re: [Zookeeper-user] [Bug?] Notification not guaranteed
>
> Here is the issue: are you watching for changes or just the
> notification that something changed? Watches are really about
> notification of changes. Imagine the following execution:
>
> time 0: set /a to value0 (now version 1)
> time 1: set /a to value1 (now version 2)
> time 2: set /a to value2 (now version 3)
> time 3: set /a to value3 (now version 4)
> time 4: set /a to value4 (now version 5)
>
> If we had a permanent watch a client watching /a would get:
>
> time 0: getData(/a, permanent)
> time 1: getData returns value1 version 2
> time 2: /a changed
> time 3: /a changed
> time 4: /a changed
>
> With our current watches you could see something like:
>
> time 0: getData(/a, true)
> time 1: getData returns value1 version 2
> time 2: /a changed
> time 3: getData(/a, true)
> time 4: getData returns value4 version 5
>
> Now note, at the client you can change the above into a permanent
> watch by generating the missed events locally, calculating the number
> of missed changes by subtracting the version numbers:
>
> time 0: getData(/a, true)
> time 1: getData returns value1 version 2
> time 2: /a changed
> time 3: getData(/a, true)
> time 4: getData returns value4 version 5
> (by looking at the version numbers we see that we missed 2 events, so
> generate now:)
> time 4: /a changed (locally generated)
> time 4: /a changed (locally generated)
>
> There is a slight additional latency for the version 4 change, but in
> some sense we have compressed the traffic (the collapsing that Ted
> mentioned).
>
> Now, in the end this is all silly. If you are really watching /a in
> this way, you are probably more interested in the actual data,
> something that the watch doesn't give you. In that case you usually
> want the latest value (this is what ZooKeeper makes easy right now)
> or all the intermediate values. Watch events don't have values, so
> the permanent watches don't help with intermediate values. Subscribe
> events would push values. Subscribe actually gives you something you
> cannot get today.
>
> This is a repeat of what is said on
> http://zookeeper.wiki.sourceforge.net/SubscribeMethod
> Does this help clarify the wiki any better?
>
> ben
>
>
> ----- Original Message ----
> From: Jacob Levy <jy...@ya...>
> To: Ted Dunning <tdu...@ve...>; Bartlomiej Niechwiej
> <ba...@ya...>; Benjamin Reed <br...@ya...>
> Cc: zoo...@li...
> Sent: Friday, April 25, 2008 1:12:57 PM
> Subject: Re: [Zookeeper-user] [Bug?] Notification not guaranteed
>
> In case no one answered your question yet:
>
> A permanent watch (subscription) will guarantee that the client sees
> EVERY change in the thing being watched after the time the permanent
> watch is established. A one time watch that is reasserted every time
> you read is different, since it can miss events between the time that
> the watch fired and is reasserted.
>
> --Jacob
>
>
> -----Original Message-----
> From: zoo...@li...
> [mailto:zoo...@li...] On Behalf Of Ted
> Dunning
> Sent: Friday, April 25, 2008 11:31 AM
> To: Bartlomiej Niechwiej; Benjamin Reed
> Cc: zoo...@li...
> Subject: Re: [Zookeeper-user] [Bug?] Notification not guaranteed
>
>
> Bartlomiej,
>
> How is a watch that is always reasserted on every read different from
> a permanent watch?  The client side implementation has the virtue
> that it collapses multiple changes if the client goes away or gets
> busy.
>
> Is it just a client API issue?