From: Ning L. <nin...@gm...> - 2008-05-02 20:12:49
On Thu, May 1, 2008 at 12:39 PM, Yonik Seeley <yo...@ap...> wrote:
> Google's distributed file system maximized link bandwidth by having a
> chain... one node would send to another node, which would send to
> another node, etc. The trick was that it was streamed (a node would
> start forwarding an update as soon as it started receiving it).
> That's probably too complex for now, so parallel looks like the right
> choice.

Yes, let's go with a simpler version first. But we should definitely
keep the chain design in mind.

> > - What happens if the coordinator node cannot "complete"
> > the write request because it cannot find W nodes that
> > handle the write request successfully?
>
> Good question. It seems like we should be able to operate in degraded
> mode, so shouldn't a write succeed if at least one node got it?
>
> Or perhaps a W-way write is a feature, but not the default? Seems
> like the desired replication factor shouldn't be strongly coupled to
> the number of nodes that we need to write to for success.

The number of nodes W that must complete a write and the replication
factor N are two separate parameters in the system configuration, with
1 <= W <= N (usually 1 < W < N). We need a W-way write to provide
fault tolerance for the write. Maybe we can return a flag indicating
that fewer than W nodes completed the write and let the application
decide whether it wants to redo it? (There is a rough sketch of this
in the P.S. below.)

> > - Last but not least, a possible performance impact:
> > a node can receive the same write request from several
> > different nodes around the same time.
>
> It seems like this should be rare. If it is rare, we shouldn't do any
> extra work to handle it.

I'm not sure it is rare. I will run a test when we experiment with a
more realistic workload.

Cheers,
Ning
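
P.S. To make the "fewer than W" flag concrete, here is a rough
coordinator-side sketch in Java. Everything in it is hypothetical
(ReplicaClient, WriteResult, etc. are made-up names, nothing from our
codebase): the coordinator sends the update to all N replicas in
parallel, counts acknowledgments, and reports whether the configured W
was met, so the application can decide whether to redo the write.

  import java.util.List;
  import java.util.concurrent.CompletableFuture;
  import java.util.concurrent.atomic.AtomicInteger;

  public class CoordinatorWrite {

      // Stand-in for whatever client ends up talking to one replica.
      interface ReplicaClient {
          CompletableFuture<Boolean> write(byte[] update);
      }

      // Result flag: how many of the N replicas completed the write,
      // and whether that met the configured quorum W.
      public record WriteResult(int acks, int w, int n) {
          public boolean metQuorum() { return acks >= w; }
      }

      // replicas: the N nodes for this key (n = replicas.size())
      // w:        write quorum, 1 <= w <= n (usually 1 < w < n)
      public static WriteResult write(List<ReplicaClient> replicas,
                                      int w, byte[] update) {
          int n = replicas.size();
          if (w < 1 || w > n)
              throw new IllegalArgumentException("need 1 <= W <= N");

          AtomicInteger acks = new AtomicInteger();
          CompletableFuture<?>[] pending = replicas.stream()
              .map(r -> r.write(update)
                         // a failed replica simply contributes no ack
                         .exceptionally(e -> false)
                         .thenAccept(ok -> {
                             if (ok) acks.incrementAndGet();
                         }))
              .toArray(CompletableFuture[]::new);

          // Simple version: wait for all N. A refinement would be to
          // return as soon as W acks arrive and let the rest finish
          // in the background.
          CompletableFuture.allOf(pending).join();
          return new WriteResult(acks.get(), w, n);
      }
  }

The caller checks result.metQuorum() and either retries or accepts the
degraded write; decoupling W from N this way also gives us Yonik's
"feature, but not the default" option for free (just configure W = 1).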