|
From: Yonik S. <yo...@ap...> - 2008-05-01 16:39:50
|
On Tue, Apr 29, 2008 at 11:21 AM, Ning Li <nin...@gm...> wrote: > Previously, we decided that: > > - A write request is sent to multiple nodes, all of whose > ranges must include the document. > > - When a node receives a write request of a document with > revision R, it checks that revision R is newer than the > current revision stored/indexed on the node. Then the node > stores and indexes the new revision in memory and put a > log entry in the log. > > - A W-way write means at least W number of nodes have to > successfully serve the write request for the write to be > considered "complete". 1 < W <= N. W > 1 is to provide > fault tolerance to write operations. > > > Here are the details: > > - The node that receives a write request is the coordinator > of the write request. It makes sure the write request is > also handled on W-1 other nodes. > > - Should the coordinator node send the write request to > W-1 other nodes in sequence or in parallel? Preferably > in parallel? Google's distributed file system maximized link bandwidth by having a chain... one node would send to another node, which would send to another node, etc. The trick was that it was streamed (a node would start forwarding an update as soon as it started receiving it). That's probably too complex for now, so parallel looks like the right choice. > - When the write request fails on a node, should the > coordinator node retry that node or immediately try > some other node? > > - What happens if the coordinator node cannot "complete" > the write request because it cannot find W nodes that > handle the write request successfully? Good question. It seems like we should be able to operate in degraded mode, so shouldn't a write succeed if at least one node got it? Or perhaps a W-way write is a feature, but not the default? Seems like the desired replication factor shouldn't be strongly coupled to the number of nodes that we need to write to for success. > On the one hand, > we cannot undo the write request from the nodes that > succeeded because we cannot increase user's revision. > On the other hand, the nodes that succeeded may > propagate the write request to more nodes so the write > request may eventually be "completed". So users should > be aware that even if a write request is "incomplete", > it may or may not be "completed" eventually. > > - When a node receives a write request whose revision > is older than that on the node, the node returns success > without doing anything. Should a user be notified that > there is a newer revision encountered? > > - Last but not the least, a possible performance impact: > a node can receive the same write request from several > different nodes around the same time. It seems like this should be rare. If it is rare, we shouldn't do any extra work to handle it. -Yonik > When it checks > its database, the write request is considered new, then > it parses and process the request, several times, and > in the end finds out that it's the same request. What is > a good strategy to minimize this situation? > > What do you think? > > Cheers, > Ning |