Re: [Bigdata-developers] Why zookeeper?


"Flexibility" seems to be in the eye of the beholder.  I would
characterize the existing mechanism as more adaptive, but less
flexible. That is, you get "flexing", but you lose fine-grained
control. I think that adaptivity and fine-grained control are mutually
exclusive goals.

The finer-grained control has a number of advantages:

* It is deterministic. Services always run on the same host and with
the same configuration every time. Easier to diagnose faults. Easier
to test. Every time the system starts up, it starts up the same way,
and deviations are errors.

* There is no need for locking. That is, I think the locking is needed
as a result of the non-deterministic behavior. (No need for a service
manager to wait to find out if another host's service manager grabbed
a lock to start a service before attempting to start one itself.)

* It allows for matching the distribution of services to the
heterogeneous hardware obtained to run them. The operators know which
machines were purchased to run which services and why, and should be
able to specify that the services run on specific machines.

* If the operator added hardware to a cluster for a specific need, the
operator should be able to specify that the hardware be used to
address the need.

* It allows for more specific control of individual services. (e.g.,
how would you separate the service directory from the mass storage
directory? How do you configure N data services per host to run on N
independent drives instead of a RAID? How would that perform?)
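
To make the points above concrete, here is a rough sketch of what I mean
by a static placement plan. Everything in it is made up for illustration:
the host names, service names, directories, and the helpers services_for()
and deviations() are assumptions, not anything in the bigdata code base.
Each host simply looks up its own entry, so no lock is needed to decide who
starts what, and anything that differs from the plan shows up as an error.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class ServiceSpec:
    """One service instance pinned to a host (all fields are hypothetical)."""
    name: str         # logical service name
    service_dir: str  # where the service keeps its own state
    data_dir: str     # where the bulk data lives, possibly another drive


# Operator-written plan: which services run on which host.
# Two data services on host-b each get their own drive.
PLACEMENT: Dict[str, List[ServiceSpec]] = {
    "host-a": [
        ServiceSpec("metadata", "/var/svc/metadata", "/disk0/metadata"),
    ],
    "host-b": [
        ServiceSpec("data-0", "/var/svc/data-0", "/disk0/data"),
        ServiceSpec("data-1", "/var/svc/data-1", "/disk1/data"),
    ],
}


def services_for(host: str) -> List[ServiceSpec]:
    """Return exactly the services assigned to this host, in plan order."""
    return list(PLACEMENT.get(host, []))


def deviations(host: str, running: Set[str]) -> Dict[str, Set[str]]:
    """Compare what is running on a host against the plan."""
    expected = {spec.name for spec in services_for(host)}
    return {
        "missing": expected - running,      # planned but not running
        "unexpected": running - expected,   # running but not planned
    }
```

A host's startup script would call services_for() with its own name and
start only what comes back; a monitoring job could call deviations() and
treat any non-empty result as a fault.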

On the flip side, I am not clear on what the benefits of the
adaptivity or "flexing" really are in this context. Flexing seems more
related to "cloud" environments where hardware is instantly available
but managing persistent data is very difficult. Could you elaborate on
the benefits you perceive from adaptivity?

Removing or separating out the adaptive behavior (and zookeeper, or
limiting zookeeper's use to HA leader election) removes moving parts,
increases visibility and understanding of the code, and improves
maintainability significantly. We would like to see bigdata become
modular, to the point where the service manager (and its use of
zookeeper) can be implemented in its own optional module.

Is it possible to start each of the services individually from the
command line or a script, without any need for zookeeper (assuming its
use really is limited to the services manager service)? If so, then
this isn't a second mechanism at all.

In either case, supporting a second service starting arrangement seems
like a small price relative to the simplicity gained.

Fred
