From: Fred O. <fko...@gm...> - 2010-07-20 17:20:11
On Tue, Jul 20, 2010 at 12:38 PM, Bryan Thompson <br...@sy...> wrote:
> Fred,
>
> We should have a separate discussion concerning how bigdata allocates and starts services. I am a bit crushed for time right now, but maybe we could take that up next week?

I understand you'll be unavailable. Please pick up the conversation when you can.

> We only use global synchronous locks at the moment in service startup logic, HA, locking out different masters for the same distributed job. However, I think that specific applications of bigdata may well want to use global synchronous locks to make operations such as life cycle management of a triple or quads store atomic.
>
> Concerning your example, service #5 is either running or it is not, but we also need to know whether or not it has been created. Bigdata services have huge amounts of persistent state. We can not simply substitute another service as a replacement for #5 if it should fail. Instead we have to recruit a new service, synchronize it with the state of the quorum to which #5 belonged, and then bring the new service on atomically when it is caught up with the quorum. This "hot spare" allocation process could take hours to bring a newly recruited node into full synchronization. We speed that up by working backwards in history so the data service quickly has a view of the current commit point and then builds up its history over time.

I specifically meant to disregard HA for this conversation, as bigdata has a long history with zookeeper outside of HA. HA is a much longer conversation, and I don't want to confuse tomorrow's issues with today's code.

With that said, I don't understand what "whether or not it has been created" means. DataService#5 could be called "created" when the operator wrote (DataService, host H, persistence directory D) in a config file on host H and created directory D with #5 in it. After that, the service is either running, or not running.
Please explain where global synchronous locks are needed in service startup logic.

Fred

>
> Bryan
>
>> -----Original Message-----
>> From: Fred Oliver [mailto:fko...@gm...]
>> Sent: Tuesday, July 20, 2010 10:59 AM
>> To: Bryan Thompson
>> Cc: Bigdata Developers
>> Subject: Re: [Bigdata-developers] Why zookeeper?
>>
>> On Tue, Jul 20, 2010 at 9:24 AM, Bryan Thompson <br...@sy...> wrote:
>>
>> > In this regard it is more flexible than a Jini system with support for creating global synchronous locks.
>>
>> I believe that global synchronous locks are more harmful than helpful, in general. That's why I would like to know how global synchronous locks help bigdata (not that zookeeper is a bad way to do it if necessary).
>>
>> > Services store their configuration state locally for restart in the service directory. However, we also need to know which persistent services exist, even if they are not running at the moment. That information is captured in zookeeper. For example, if the target #of logical data services (LDS) is 10, then we do not want to start a new data service if one goes down because the persistent state of the data service can be terabytes of files on local disks and is part of the semantics of its service "identity".
>>
>> I'm not clear about the problem you are describing. Say we have a DataService #5 configured on host H with a persistence directory D containing its UUID, journals, indices, etc. It is either running and registered with the lookup service (with the UUID) or not. If the service starter/manager on host H needs this service to be running and it is not registered, start it. (There is a strictly local problem of starting a duplicate java process because of race conditions, but the service itself should detect and prevent that.) I don't see how global synchronization is involved here.
>>
>> Can you give another example of the need for global synchronization (excluding HA) or point out what I am missing?
>>
>> Fred
>>