From: Fred O. <fko...@gm...> - 2010-07-20 03:29:58
Why does bigdata use zookeeper? Isn't the bigdata system more complex with zookeeper than it would be without? Or is zookeeper going away as part of the "share nothing" effort?

I have the impression that zookeeper is used for three things:

1. Zookeeper appears to be used to communicate information between bigdata services. (Like spies passing messages in dead drops?) Wouldn't it be simpler and more straightforward for services to ask each other for the information they need?

2. Zookeeper appears to be used to persist data about each service. Why wouldn't each service persist its own state, privately? What is the benefit of allowing any service to see another service's persisted state?

3. Zookeeper has something to do with locks and synchronization? What needs to be synchronized? For example, ticket #111 shows that zookeeper is involved in preventing a DataService from starting if a TransactionService is not running. But the DataService should be robust enough to operate to some extent even when the TransactionService fails or becomes unreachable. And if the DataService is robust enough to wait for the TransactionService to come online, why impose this artificial synchronization?

Could zookeeper be evolved out of the system, simplifying the system?

Thank you,
Fred