From: Ning L. <nin...@gm...> - 2008-05-02 20:12:49
On Thu, May 1, 2008 at 12:39 PM, Yonik Seeley <yo...@ap...> wrote:
> Google's distributed file system maximized link bandwidth by having a
> chain... one node would send to another node, which would send to
> another node, etc. The trick was that it was streamed (a node would
> start forwarding an update as soon as it started receiving it).
> That's probably too complex for now, so parallel looks like the right
> choice.

Yes, let's go with a simpler version first. But we should definitely
keep the chain design in mind.

> > - What happens if the coordinator node cannot "complete"
> > the write request because it cannot find W nodes that
> > handle the write request successfully?
>
> Good question. It seems like we should be able to operate in degraded
> mode, so shouldn't a write succeed if at least one node got it?
>
> Or perhaps a W-way write is a feature, but not the default? Seems
> like the desired replication factor shouldn't be strongly coupled to
> the number of nodes that we need to write to for success.

The number of nodes W that must complete a write and the replication
factor N are two separate parameters in the system configuration, with
1 <= W <= N (usually 1 < W < N). We need a W-way write to provide
fault tolerance for the write. Maybe we can return a flag indicating
that fewer than W nodes completed the write and let the application
decide whether it wants to redo it? (There is a rough sketch of this
in the P.S. below.)

> > - Last but not least, a possible performance impact:
> > a node can receive the same write request from several
> > different nodes around the same time.
>
> It seems like this should be rare. If it is rare, we shouldn't do any
> extra work to handle it.

I'm not sure it is rare. I will run a test when we experiment with a
more realistic workload.

Cheers,
Ning
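
P.S. To make the "fewer than W" flag concrete, here is a rough
coordinator-side sketch in Java. Everything in it is hypothetical
(ReplicaClient, WriteResult, etc. are made-up names, nothing from our
codebase): the coordinator sends the update to all N replicas in
parallel, counts acknowledgments, and reports whether the configured W
was met, so the application can decide whether to redo the write.

  import java.util.List;
  import java.util.concurrent.CompletableFuture;
  import java.util.concurrent.atomic.AtomicInteger;

  public class CoordinatorWrite {

      // Stand-in for whatever client ends up talking to one replica.
      interface ReplicaClient {
          CompletableFuture<Boolean> write(byte[] update);
      }

      // Result flag: how many of the N replicas completed the write,
      // and whether that met the configured quorum W.
      public record WriteResult(int acks, int w, int n) {
          public boolean metQuorum() { return acks >= w; }
      }

      // replicas: the N nodes for this key (n = replicas.size())
      // w:        write quorum, 1 <= w <= n (usually 1 < w < n)
      public static WriteResult write(List<ReplicaClient> replicas,
                                      int w, byte[] update) {
          int n = replicas.size();
          if (w < 1 || w > n)
              throw new IllegalArgumentException("need 1 <= W <= N");

          AtomicInteger acks = new AtomicInteger();
          CompletableFuture<?>[] pending = replicas.stream()
              .map(r -> r.write(update)
                         // a failed replica simply contributes no ack
                         .exceptionally(e -> false)
                         .thenAccept(ok -> {
                             if (ok) acks.incrementAndGet();
                         }))
              .toArray(CompletableFuture[]::new);

          // Simple version: wait for all N. A refinement would be to
          // return as soon as W acks arrive and let the rest finish
          // in the background.
          CompletableFuture.allOf(pending).join();
          return new WriteResult(acks.get(), w, n);
      }
  }

The caller checks result.metQuorum() and either retries or accepts the
degraded write; decoupling W from N this way also gives us Yonik's
"feature, but not the default" option for free (just configure W = 1).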