[Bigdata-developers] Why zookeeper?


Why does bigdata use zookeeper? Isn't the bigdata system more complex
with zookeeper than it would be without it? Or is zookeeper going away as
part of the "share nothing" effort?

I have the impression that zookeeper is used for three things:

Zookeeper appears to be used to communicate information between
bigdata services. (Like spies passing messages in dead drops?)
Wouldn't it be simpler and more straightforward for services to ask
each other directly for the information they need?

Zookeeper appears to be used to persist data about each service. Why
wouldn't each service persist its own state, privately? What is the
benefit of allowing any service to see another service's persisted
state?

Zookeeper also seems to have something to do with locks and
synchronization. What needs to be synchronized? For example, ticket #111
shows that zookeeper is involved in preventing a DataService from
starting if a TransactionService is not running. But the DataService
should be robust enough to operate to some extent even if the
TransactionService fails or becomes unreachable. And if the
DataService is robust enough to wait for the TransactionService to
come online, why impose this artificial synchronization?
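To make the alternative concrete, here is a minimal sketch. It is not bigdata's code: the class is a hypothetical stand-in for a DataService, and `connect`, `read`, `begin_transaction` and `try_connect` are illustrative names only. The service starts immediately, serves work that needs no transaction, fails fast on transactional requests, and keeps retrying the TransactionService with capped exponential backoff instead of refusing to start.

```python
import time
from typing import Callable, Optional


class TransactionServiceUnavailable(Exception):
    """Raised when a request needs the TransactionService but it is unreachable."""


class DataService:
    """Hypothetical DataService that does not block startup on a TransactionService.

    `connect` is an assumed callable that returns a client object with a
    `begin()` method, or raises ConnectionError if the service is unreachable.
    """

    def __init__(self, connect: Callable[[], object],
                 sleep: Callable[[float], None] = time.sleep,
                 base_delay: float = 0.5, max_delay: float = 30.0):
        self._connect = connect
        self._sleep = sleep          # injectable so tests need not really sleep
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._tx_client: Optional[object] = None

    def read(self, key: str) -> str:
        # Work that needs no transaction proceeds whether or not the
        # TransactionService is up (placeholder result for illustration).
        return f"value-for-{key}"

    def begin_transaction(self):
        # Transactional work fails fast rather than the whole service
        # refusing to start.
        if self._tx_client is None:
            raise TransactionServiceUnavailable("TransactionService not reachable yet")
        return self._tx_client.begin()

    def try_connect(self, max_attempts: int) -> bool:
        """Retry with capped exponential backoff; return True once connected."""
        delay = self._base_delay
        for attempt in range(max_attempts):
            try:
                self._tx_client = self._connect()
                return True
            except ConnectionError:
                if attempt + 1 < max_attempts:
                    self._sleep(delay)
                    delay = min(delay * 2, self._max_delay)
        return False
```

In a real service `try_connect` would run on a background thread; it is kept synchronous here so the behaviour is easy to follow.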

Could zookeeper be evolved out of the system, simplifying the system?

Thank you,

Fred
