From: Ning L. <nin...@gm...> - 2008-03-19 20:03:54
|
On Wed, Mar 19, 2008 at 2:06 PM, Doug Cutting <cu...@ap...> wrote:
> I had a slightly different idea. I thought that the id string would be
> the external id provided by the application that we return with hits,
> e.g., a uri, a filename, etc. We'd also have a numeric 'position' value
> that places the document on the ring. The position would, by default,
> be the hash of the id, but an application might override that. It would
> be a bug for an application to ever provide different positions for the
> same id.

Originally, I was thinking of simply using the application-specified external id as its 'position' value on the ring. We'd have one value instead of two. No need to check if different positions are ever provided for the same id. The ring distribution won't be uniform in this case. But we have to deal with this case anyway. So the main downside I see is the performance cost with strings - computation, memory... That's why I'm fine with a separate 'position' value.

> I'd imagined that positions would be longs, but Yonik has argued that
> they might as well be ints, and I can't think why they couldn't, if
> we're going to keep the string id too. That makes the default
> implementation in Java much easier, since it can be hashCode().

I'm not insisting on longs. But here is what I reasoned. :) I imagined a good number of the applications which would use Bailey would be similar to an email system - the application would provide the 'position' values so that a search on a fraction of all the documents spans a relatively small number of nodes. Let's use Yonik's suggestion to assign such 'position' values:

> Of course, fixing my bug it would be (username.hashCode() << 29) |
> (id.hashCode() >>> 3)

One user may have one document. Another may have a lot. Is 29 bits for username enough? Maybe. But is 3 bits for the documents of a user enough? That means a user's documents cannot span more than 8 nodes. Maybe I over-thought the problem. :)

Ning
|
From: Doug C. <cu...@ap...> - 2008-03-19 20:00:55
|
Ning,

This is looking great! A few questions:

Should search results indicate the ranges actually searched? That way, if the client's Mapper is out of date, the client can detect and repair this. To repair, given a set of search results, we need to check for gaps, then figure out what to re-query to fill those gaps. Should this be done through the Ring API?

Will host ids ever be other than an ipaddress+port? Should we represent them that way instead? Hosts are not on the ring, so we don't need a numeric value here.

Will node ids ever be other than a large (128 bit?) numeric value? They need to be unique, but should be host-independent. E.g., it should be possible to copy a node's data to a new host and have that new host start serving that data.

Doug
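The gap check described in the message above could look roughly like the following. This is only a sketch under assumptions that are not in the thread: ranges are half-open `[start, end)` pairs of `long` ring positions, wrap-around at the end of the ring is ignored, and the class name `RangeGaps` is hypothetical rather than part of the Bailey Ring API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not part of Bailey: finds the parts of a wanted range that no result covered. */
final class RangeGaps {

    /**
     * Each range is a two-element array {start, end}, half-open: [start, end).
     * Wrap-around at the end of the ring is ignored in this sketch.
     */
    static List<long[]> gaps(long wantedStart, long wantedEnd, List<long[]> searched) {
        List<long[]> sorted = new ArrayList<>(searched);
        sorted.sort(Comparator.comparingLong((long[] r) -> r[0]));

        List<long[]> missing = new ArrayList<>();
        long cursor = wantedStart;                 // everything below cursor is known to be covered
        for (long[] r : sorted) {
            if (cursor >= wantedEnd) {
                break;                             // the wanted range is fully covered
            }
            if (r[1] <= cursor) {
                continue;                          // this range lies entirely behind the cursor
            }
            if (r[0] > cursor) {
                // Nothing covered [cursor, r[0]); clip the gap to the wanted range.
                missing.add(new long[] {cursor, Math.min(r[0], wantedEnd)});
            }
            cursor = Math.max(cursor, r[1]);
        }
        if (cursor < wantedEnd) {
            missing.add(new long[] {cursor, wantedEnd});
        }
        return missing;
    }
}
```

The returned gaps are exactly the ranges a client would need to re-query after refreshing its view of the ring.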
|
From: Doug C. <cu...@ap...> - 2008-03-19 19:06:05
|
Ning Li wrote:
> On Mon, Mar 17, 2008 at 7:28 PM, Yonik Seeley <yo...@ap...> wrote:
> > Or did you mean represent the hash as a String?
>
> Yes. :)

I had a slightly different idea. I thought that the id string would be the external id provided by the application that we return with hits, e.g., a uri, a filename, etc. We'd also have a numeric 'position' value that places the document on the ring. The position would, by default, be the hash of the id, but an application might override that. It would be a bug for an application to ever provide different positions for the same id.

I'd imagined that positions would be longs, but Yonik has argued that they might as well be ints, and I can't think why they couldn't, if we're going to keep the string id too. That makes the default implementation in Java much easier, since it can be hashCode().

Doug
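As an illustration of the id/position split proposed above, here is a minimal sketch. The class `PositionRegistry` and its methods are hypothetical names, not Bailey code; the sketch assumes `int` positions, uses `String.hashCode()` as the default, and treats a second, different position for the same id as an error.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not Bailey code: assigns ring positions and detects inconsistent ones. */
final class PositionRegistry {
    private final Map<String, Integer> seen = new HashMap<>();

    /** Default position: the hash of the external id. */
    static int defaultPosition(String id) {
        return id.hashCode();
    }

    /**
     * Records the position for an id. Pass null as override to use the default hash.
     * Throws if the same id was registered earlier with a different position.
     */
    int register(String id, Integer override) {
        int position = (override != null) ? override : defaultPosition(id);
        Integer previous = seen.putIfAbsent(id, position);
        if (previous != null && previous != position) {
            throw new IllegalStateException(
                "id " + id + " was given positions " + previous + " and " + position);
        }
        return position;
    }
}
```

A real system would probably not keep every id in memory like this; the map only serves to make the "same id, same position" rule explicit.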
|
From: Yonik S. <yo...@ap...> - 2008-03-18 23:43:08
|
On Tue, Mar 18, 2008 at 7:32 PM, Ning Li <nin...@gm...> wrote:
> On Tue, Mar 18, 2008 at 2:57 PM, Yonik Seeley <yo...@ap...> wrote:
> > If one wants each user in an email system to span a maximum of 1/8th
> > of the ring then the
> > hash could be hash = (username.hashCode() << 29) | (id.hashCode() >> 3)
>
> This could work.

Of course, fixing my bug it would be
(username.hashCode() << 29) | (id.hashCode() >>> 3)
:-)

> > Or if the number of emails per user is small, hash = username.hashCode()
>
> We cannot really assume that, right? :)

Well, any assumptions like that are up to the specific application if they want to try their own partitioning. I think most just use the default hash.

> Let's use long for now.

I'm still not sure I see the value over an int hash, but I guess it's not a big deal as long as we don't have to index it or use the FieldCache for it. That leaves calculating it on the fly on the node when needed, or storing it in a quickly accessible manner (payload or upcoming column store).

-Yonik
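The difference between the two shifts can be seen in a small sketch. The class and method names below are hypothetical; only the two expressions come from the thread. In Java, `>>` is an arithmetic shift that copies the sign bit, so a negative id hash fills the top bits with ones and overwrites the username bits, whereas `>>>` shifts in zeros. Note also that `<< 29` keeps only the low three bits of the username hash, which become the top three bits of the position.

```java
/** Hypothetical sketch comparing the two expressions from the thread. */
final class UserPrefixHash {

    /** Original expression: sign extension of a negative id hash clobbers the top three bits. */
    static int buggy(int userHash, int idHash) {
        return (userHash << 29) | (idHash >> 3);
    }

    /** Corrected expression: the unsigned shift leaves the top three bits to the username. */
    static int fixed(int userHash, int idHash) {
        return (userHash << 29) | (idHash >>> 3);
    }

    /** Convenience overload that hashes the strings with String.hashCode(). */
    static int fixed(String username, String id) {
        return fixed(username.hashCode(), id.hashCode());
    }

    /** Which eighth of the ring (0..7) a position falls into: its top three bits. */
    static int eighth(int position) {
        return position >>> 29;
    }

    public static void main(String[] args) {
        int userHash = 2;   // low three bits are 010
        int idHash = -1;    // any negative hash triggers the problem
        System.out.println(eighth(buggy(userHash, idHash)));  // 7, regardless of the user
        System.out.println(eighth(fixed(userHash, idHash)));  // 2, taken from the user hash
    }
}
```

With the corrected expression, all documents of one user share the same top three bits and therefore land in the same eighth of the ring.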
|
From: Ning L. <nin...@gm...> - 2008-03-18 23:32:53
|
On Tue, Mar 18, 2008 at 2:57 PM, Yonik Seeley <yo...@ap...> wrote:
> If one wants each user in an email system to span a maximum of 1/8th
> of the ring then the
> hash could be hash = (username.hashCode() << 29) | (id.hashCode() >> 3)

This could work.

> Or if the number of emails per user is small, hash = username.hashCode()

We cannot really assume that, right? :)

Let's use long for now.

Ning
|
From: Ning L. <nin...@gm...> - 2008-03-18 23:29:32
|
On Tue, Mar 18, 2008 at 6:20 PM, <ni...@us...> wrote:
> Change the Node protocols to the Host protocols.

NodeInfo contains NodeID and Range. I have this distinction between NodeInfo and Pair<NodeID, Range> in that NodeInfo implies the range is the partition that the node serves, while Pair<NodeID, Range> simply pairs a node and a range (e.g. the range could be a subset of the range which the node serves). Now I think this distinction is not really necessary and I should just get rid of Pair<NodeID, Range>?

Ning
|
From: <ni...@us...> - 2008-03-18 23:21:08
|
Revision: 13
http://bailey.svn.sourceforge.net/bailey/?rev=13&view=rev
Author: ning_li
Date: 2008-03-18 16:21:03 -0700 (Tue, 18 Mar 2008)
Log Message:
-----------
Change the Node protocols to the Host protocols - include the change to TestSimpleDb.
Modified Paths:
--------------
trunk/src/test/org/apache/bailey/TestSimpleDb.java
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
|
|
From: <ni...@us...> - 2008-03-18 23:20:03
|
Revision: 12
http://bailey.svn.sourceforge.net/bailey/?rev=12&view=rev
Author: ning_li
Date: 2008-03-18 16:20:03 -0700 (Tue, 18 Mar 2008)
Log Message:
-----------
Change the Node protocols to the Host protocols.
Modified Paths:
--------------
trunk/src/java/org/apache/bailey/ddb/Client.java
trunk/src/java/org/apache/bailey/ddb/ClientToMasterProtocol.java
trunk/src/java/org/apache/bailey/ddb/Master.java
trunk/src/java/org/apache/bailey/ddb/NodeID.java
trunk/src/java/org/apache/bailey/ddb/NodeInfo.java
trunk/src/java/org/apache/bailey/ddb/NodeStatus.java
trunk/src/java/org/apache/bailey/ddb/Range.java
trunk/src/java/org/apache/bailey/ddb/Ring.java
trunk/src/java/org/apache/bailey/ddb/simple/SimpleClient.java
trunk/src/java/org/apache/bailey/ddb/simple/SimpleMaster.java
Added Paths:
-----------
trunk/src/java/org/apache/bailey/ddb/ClientToHostProtocol.java
trunk/src/java/org/apache/bailey/ddb/Host.java
trunk/src/java/org/apache/bailey/ddb/HostCommand.java
trunk/src/java/org/apache/bailey/ddb/HostID.java
trunk/src/java/org/apache/bailey/ddb/HostStatus.java
trunk/src/java/org/apache/bailey/ddb/HostToHostProtocol.java
trunk/src/java/org/apache/bailey/ddb/HostToMasterProtocol.java
trunk/src/java/org/apache/bailey/ddb/Mapper.java
trunk/src/java/org/apache/bailey/ddb/NodeHostMap.java
trunk/src/java/org/apache/bailey/ddb/simple/SimpleHost.java
trunk/src/java/org/apache/bailey/util/SetValuedMap.java
Removed Paths:
-------------
trunk/src/java/org/apache/bailey/ddb/ClientToNodeProtocol.java
trunk/src/java/org/apache/bailey/ddb/Node.java
trunk/src/java/org/apache/bailey/ddb/NodeCommand.java
trunk/src/java/org/apache/bailey/ddb/NodeToMasterProtocol.java
trunk/src/java/org/apache/bailey/ddb/NodeToNodeProtocol.java
trunk/src/java/org/apache/bailey/ddb/simple/SimpleNode.java
|
|
From: Yonik S. <yo...@ap...> - 2008-03-18 19:57:03
|
On Tue, Mar 18, 2008 at 11:01 AM, Ning Li <nin...@gm...> wrote:
> On Mon, Mar 17, 2008 at 7:28 PM, Yonik Seeley <yo...@ap...> wrote:
> > Or did you mean represent the hash as a String?
>
> Yes. :)

Ah, OK... so like a variable length int. The ring management and passing of ranges with Strings would be easy enough, but depending on what needs to be done on a node, it may be less efficient. For example, to find a range of hashes (say to split off a piece for rebalancing) we would probably use either the FieldCache, or a column stored field (upcoming) in Lucene, both of which would be more efficient as an int.

> Converting a string to a number while maintaining the string order,
> on the other hand, is difficult.

If one wants each user in an email system to span a maximum of 1/8th of the ring then the hash could be
hash = (username.hashCode() << 29) | (id.hashCode() >> 3)
Or if the number of emails per user is small,
hash = username.hashCode()

-Yonik
|
From: Ning L. <nin...@gm...> - 2008-03-18 15:01:11
|
On Mon, Mar 17, 2008 at 7:28 PM, Yonik Seeley <yo...@ap...> wrote:
> Or did you mean represent the hash as a String?

Yes. :)

Converting a number to a string while maintaining the number order is easy when we know the min and the max numbers. Converting a string to a number while maintaining the string order, on the other hand, is difficult.

Ning
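One way the "easy" direction could be done is sketched below, assuming the known bounds are simply `Long.MIN_VALUE` and `Long.MAX_VALUE`. The class name `OrderedKey` is hypothetical. Flipping the sign bit maps signed order onto unsigned order, and fixed-width, zero-padded lowercase hex then compares as strings in the same order as the numbers.

```java
/** Hypothetical sketch: encodes a long as a fixed-width string that sorts in numeric order. */
final class OrderedKey {

    /**
     * XOR with Long.MIN_VALUE flips the sign bit, so the smallest long becomes all zero bits
     * and the largest becomes all one bits. "%016x" prints the 64 bits as unsigned hex,
     * zero-padded to 16 characters.
     */
    static String encode(long value) {
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(encode(Long.MIN_VALUE)); // 0000000000000000
        System.out.println(encode(-1L));            // 7fffffffffffffff
        System.out.println(encode(0L));             // 8000000000000000
        System.out.println(encode(Long.MAX_VALUE)); // ffffffffffffffff
    }
}
```

The reverse direction, mapping arbitrary strings to numbers while keeping string order, has no such fixed-width trick, which is the difficulty the message points out.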
|
From: Yonik S. <yo...@ap...> - 2008-03-18 00:28:25
|
On Mon, Mar 17, 2008 at 7:39 PM, Ning Li <nin...@gm...> wrote:
> Why can't we use the string for a point on the ring? Each document
> can have a unique point and all the documents have an order.

Using an int/long with a good hash makes document distribution uniform on the ring, and thus makes it much easier for the master to assign ranges to nodes.

For example, for a two node system the master could assign NodeA hashes 0x00000000-0x7FFFFFFF and NodeB hashes 0x80000000-0xFFFFFFFF, knowing nothing about the document ids.

How would this work if one uses the String Ids? Or did you mean represent the hash as a String?

-Yonik
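The two-node example generalizes to any number of equal slices. The following is a sketch, assuming a 32-bit hash interpreted as an unsigned value and a positive node count; `UniformRing` and `nodeFor` are hypothetical names, not Bailey's Ring API.

```java
/** Hypothetical sketch: maps a 32-bit hash to one of N equal-sized slices of the ring. */
final class UniformRing {

    /**
     * Returns the index (0 .. nodes-1) of the slice containing the hash.
     * The hash is treated as unsigned, so 0x80000000 lies in the upper half of the ring.
     */
    static int nodeFor(int hash, int nodes) {
        long unsigned = Integer.toUnsignedLong(hash);   // 0 .. 2^32 - 1
        return (int) ((unsigned * nodes) >>> 32);       // floor(unsigned * nodes / 2^32)
    }

    public static void main(String[] args) {
        System.out.println(nodeFor(0x00000000, 2)); // 0 (NodeA in the example above)
        System.out.println(nodeFor(0x7FFFFFFF, 2)); // 0
        System.out.println(nodeFor(0x80000000, 2)); // 1 (NodeB)
        System.out.println(nodeFor(0xFFFFFFFF, 2)); // 1
    }
}
```

The master needs no knowledge of the document ids for this assignment, which is the point of the message; it only works well if the hash spreads documents evenly.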
|
From: Ning L. <nin...@gm...> - 2008-03-17 23:39:22
|
On Mon, Mar 17, 2008 at 4:12 PM, Doug Cutting <cu...@ap...> wrote:
> Ning Li wrote:
> > None of the host related protocols are there yet. We should decide
> > a few things before we add those?
> > 1 Should nodes run as threads within a host process or as separate
> > processes on a host? Do we still need a host process in the latter case?
>
> I think we'll at least initially implement these as threads. I don't
> see many advantages of making these separate processes, and it will
> complicate lots of things.

They have their pros and cons. But yes, let's start with nodes as threads.

> > 2 In the former case, do we have HostToMasterProtocol and
> > ClientToHostProtocol, or do we have NodeToMasterProtocol and
> > ClientToNodeProtocol, or both sets?
>
> With threads, we don't need both Host and Node protocols, so, unless we
> think it will make implementation cleaner, we should skip this. Will
> each node have its own thread, or might a single thread per host send
> heartbeats for all nodes, with another thread fetching and applying
> updates, etc. (Since Lucene itself now multithreads updates, we might
> not want all nodes updating in parallel.)
>
> Given that, I think I'd opt for just HostToMasterProtocol and
> ClientToHostProtocol, with nodes in parameters. Does that sound
> reasonable to you?

This sounds reasonable. We'd also have a logger thread (or threads) for all the nodes. I still think we should allow multiple update threads? I'll change Node protocols to Host protocols.

> > I changed range to be <long, long> for now. However, a document
> > id is of type String, so we have to convert from String to long. It is
> > hard if we want to maintain the order of the ids after we convert
> > them to longs.
>
> Right. We need to decide on the bits per point on the ring. Is 64
> enough? We also need a Document method to get its ring coordinate.
> This will by default be a hash of the ID string, but we should store it
> in a separate field, so that folks can someday specify it explicitly.

Why can't we use the string for a point on the ring? Each document can have a unique point and all the documents have an order.

Ning
|
From: Yonik S. <yo...@ap...> - 2008-03-17 22:01:32
|
On Mon, Mar 17, 2008 at 5:12 PM, Doug Cutting <cu...@ap...> wrote:
> Ning Li wrote:
> > I changed range to be <long, long> for now. However, a document
> > id is of type String, so we have to convert from String to long. It is
> > hard if we want to maintain the order of the ids after we convert
> > them to longs.
>
> Right. We need to decide on the bits per point on the ring. Is 64
> enough?

Wouldn't 32 bits be more than enough if it's just used for partitioning (as opposed to using it for a unique id)?

-Yonik
|
From: Doug C. <cu...@ap...> - 2008-03-17 21:13:00
|
Ning Li wrote:
> None of the host related protocols are there yet. We should decide
> a few things before we add those?
> 1 Should nodes run as threads within a host process or as separate
> processes on a host? Do we still need a host process in the latter case?

I think we'll at least initially implement these as threads. I don't see many advantages of making these separate processes, and it will complicate lots of things.

> 2 In the former case, do we have HostToMasterProtocol and
> ClientToHostProtocol, or do we have NodeToMasterProtocol and
> ClientToNodeProtocol, or both sets?

With threads, we don't need both Host and Node protocols, so, unless we think it will make implementation cleaner, we should skip this. Will each node have its own thread, or might a single thread per host send heartbeats for all nodes, with another thread fetching and applying updates, etc. (Since Lucene itself now multithreads updates, we might not want all nodes updating in parallel.)

Given that, I think I'd opt for just HostToMasterProtocol and ClientToHostProtocol, with nodes in parameters. Does that sound reasonable to you?

> I changed range to be <long, long> for now. However, a document
> id is of type String, so we have to convert from String to long. It is
> hard if we want to maintain the order of the ids after we convert
> them to longs.

Right. We need to decide on the bits per point on the ring. Is 64 enough? We also need a Document method to get its ring coordinate. This will by default be a hash of the ID string, but we should store it in a separate field, so that folks can someday specify it explicitly.

Doug
|
From: <cu...@us...> - 2008-03-17 20:47:10
|
Revision: 11
http://bailey.svn.sourceforge.net/bailey/?rev=11&view=rev
Author: cutting
Date: 2008-03-17 13:47:09 -0700 (Mon, 17 Mar 2008)
Log Message:
-----------
Add NodeStatus class, represent load as a float.
Modified Paths:
--------------
trunk/src/java/org/apache/bailey/ddb/Master.java
trunk/src/java/org/apache/bailey/ddb/NodeToMasterProtocol.java
Added Paths:
-----------
trunk/src/java/org/apache/bailey/ddb/NodeStatus.java
|
|
From: Ning L. <nin...@gm...> - 2008-03-17 14:55:32
|
On Fri, Mar 14, 2008 at 5:04 PM, Doug Cutting <cu...@ap...> wrote:
> Heartbeats should probably include the range/version currently
> searchable. It should also report "load", perhaps its average response
> time.

I added a couple of things, but more will be needed as we move along.

> When creating a new node, a host should ask the master what its id
> should be, and the master should allocate new nodes to areas of the ring
> that have a heavy load.

Yes, the master will make that decision.

> The master should also be able to give directives to hosts, indicating
> that a node should be dropped (since it is in a cool area of the ring).
> Then the host should ask for a new id to replace this. Hosts will be
> configured to run a particular number of nodes.
> Should we have a HostToMaster protocol in addition to a NodeToMaster
> protocol, or should these be the same?

None of the host related protocols are there yet. We should decide a few things before we add those?
1 Should nodes run as threads within a host process or as separate processes on a host? Do we still need a host process in the latter case?
2 In the former case, do we have HostToMasterProtocol and ClientToHostProtocol, or do we have NodeToMasterProtocol and ClientToNodeProtocol, or both sets?
3 In the latter case, we should have NodeToMasterProtocol and ClientToNodeProtocol. And HostToMasterProtocol if we have a host process.

> Ranges might be <long,long> rather than <String,String>?

I changed range to be <long, long> for now. However, a document id is of type String, so we have to convert from String to long. It is hard if we want to maintain the order of the ids after we convert them to longs.

Cheers,
Ning
|
From: <ni...@us...> - 2008-03-17 14:19:25
|
Revision: 10
http://bailey.svn.sourceforge.net/bailey/?rev=10&view=rev
Author: ning_li
Date: 2008-03-17 07:19:28 -0700 (Mon, 17 Mar 2008)
Log Message:
-----------
Rename the protocols. Add a synchronized implementation for the protocols.
Modified Paths:
--------------
trunk/src/java/org/apache/bailey/ddb/Client.java
trunk/src/java/org/apache/bailey/ddb/Master.java
trunk/src/java/org/apache/bailey/ddb/Node.java
trunk/src/java/org/apache/bailey/ddb/Range.java
trunk/src/java/org/apache/bailey/ddb/Ring.java
trunk/src/java/org/apache/bailey/util/Pair.java
trunk/src/test/org/apache/bailey/Generator.java
Added Paths:
-----------
trunk/src/java/org/apache/bailey/ddb/ClientToMasterProtocol.java
trunk/src/java/org/apache/bailey/ddb/ClientToNodeProtocol.java
trunk/src/java/org/apache/bailey/ddb/NodeCommand.java
trunk/src/java/org/apache/bailey/ddb/NodeID.java
trunk/src/java/org/apache/bailey/ddb/NodeInfo.java
trunk/src/java/org/apache/bailey/ddb/NodeToMasterProtocol.java
trunk/src/java/org/apache/bailey/ddb/NodeToNodeProtocol.java
trunk/src/java/org/apache/bailey/ddb/simple/
trunk/src/java/org/apache/bailey/ddb/simple/SimpleClient.java
trunk/src/java/org/apache/bailey/ddb/simple/SimpleMaster.java
trunk/src/java/org/apache/bailey/ddb/simple/SimpleNode.java
trunk/src/test/org/apache/bailey/TestSimpleDb.java
Removed Paths:
-------------
trunk/src/java/org/apache/bailey/ddb/ClientMasterProtocol.java
trunk/src/java/org/apache/bailey/ddb/ClientNodeProtocol.java
trunk/src/java/org/apache/bailey/ddb/NodeMasterProtocol.java
trunk/src/java/org/apache/bailey/ddb/NodeNodeProtocol.java
|
|
From: Doug C. <cu...@ap...> - 2008-03-14 22:04:36
|
ni...@us... wrote:
> Skeleton for the "distributed" system using threads per node. Does it look all right?

A great start! Thanks!

A few thoughts:

Heartbeats should probably include the range/version currently searchable. It should also report "load", perhaps its average response time.

When creating a new node, a host should ask the master what its id should be, and the master should allocate new nodes to areas of the ring that have a heavy load.

The master should also be able to give directives to hosts, indicating that a node should be dropped (since it is in a cool area of the ring). Then the host should ask for a new id to replace this. Hosts will be configured to run a particular number of nodes.

Should we have a HostToMaster protocol in addition to a NodeToMaster protocol, or should these be the same?

Ranges might be <long,long> rather than <String,String>?

I have to run out now, but will look at this more on Monday. Thanks again for getting this part started!

Doug
|
From: <ni...@us...> - 2008-03-14 21:46:27
|
Revision: 9
http://bailey.svn.sourceforge.net/bailey/?rev=9&view=rev
Author: ning_li
Date: 2008-03-14 14:46:30 -0700 (Fri, 14 Mar 2008)
Log Message:
-----------
Skeleton for the "distributed" system using threads per node. Does it look all right?
Added Paths:
-----------
trunk/src/java/org/apache/bailey/ddb/
trunk/src/java/org/apache/bailey/ddb/Client.java
trunk/src/java/org/apache/bailey/ddb/ClientMasterProtocol.java
trunk/src/java/org/apache/bailey/ddb/ClientNodeProtocol.java
trunk/src/java/org/apache/bailey/ddb/Logger.java
trunk/src/java/org/apache/bailey/ddb/Master.java
trunk/src/java/org/apache/bailey/ddb/Node.java
trunk/src/java/org/apache/bailey/ddb/NodeMasterProtocol.java
trunk/src/java/org/apache/bailey/ddb/NodeNodeProtocol.java
trunk/src/java/org/apache/bailey/ddb/Range.java
trunk/src/java/org/apache/bailey/ddb/Ring.java
trunk/src/java/org/apache/bailey/util/
trunk/src/java/org/apache/bailey/util/Pair.java
|
|
From: <cu...@us...> - 2008-03-14 20:23:08
|
Revision: 8
http://bailey.svn.sourceforge.net/bailey/?rev=8&view=rev
Author: cutting
Date: 2008-03-14 13:23:14 -0700 (Fri, 14 Mar 2008)
Log Message:
-----------
Simplify search API, implement it for heap db & add a test case.
Modified Paths:
--------------
trunk/src/java/org/apache/bailey/Document.java
trunk/src/java/org/apache/bailey/Query.java
trunk/src/java/org/apache/bailey/Results.java
trunk/src/java/org/apache/bailey/heap/HeapDatabase.java
trunk/src/test/org/apache/bailey/Generator.java
trunk/src/test/org/apache/bailey/TestHeapDb.java
|
|
From: <cu...@us...> - 2008-03-12 22:59:00
|
Revision: 7
http://bailey.svn.sourceforge.net/bailey/?rev=7&view=rev
Author: cutting
Date: 2008-03-12 15:59:02 -0700 (Wed, 12 Mar 2008)
Log Message:
-----------
Add first trivial test case.
Modified Paths:
--------------
trunk/build.xml
trunk/src/java/org/apache/bailey/Database.java
trunk/src/java/org/apache/bailey/heap/HeapDatabase.java
Added Paths:
-----------
trunk/LICENSE.txt
trunk/lib/junit-4.4.LICENSE.txt
trunk/lib/junit-4.4.jar
trunk/src/test/org/apache/bailey/TestHeapDb.java
|
|
From: <cu...@us...> - 2008-03-12 22:30:11
|
Revision: 6
http://bailey.svn.sourceforge.net/bailey/?rev=6&view=rev
Author: cutting
Date: 2008-03-12 15:30:13 -0700 (Wed, 12 Mar 2008)
Log Message:
-----------
Add script that builds classpath, etc.
Added Paths:
-----------
trunk/bin/
trunk/bin/bailey
|
|
From: <cu...@us...> - 2008-03-12 22:09:03
|
Revision: 5
http://bailey.svn.sourceforge.net/bailey/?rev=5&view=rev
Author: cutting
Date: 2008-03-12 15:09:08 -0700 (Wed, 12 Mar 2008)
Log Message:
-----------
Add a random query generator.
Modified Paths:
--------------
trunk/src/java/org/apache/bailey/Document.java
trunk/src/java/org/apache/bailey/Field.java
trunk/src/java/org/apache/bailey/Query.java
trunk/src/test/org/apache/bailey/Generator.java
|
|
From: <cu...@us...> - 2008-03-12 21:15:36
|
Revision: 4
http://bailey.svn.sourceforge.net/bailey/?rev=4&view=rev
Author: cutting
Date: 2008-03-12 14:15:42 -0700 (Wed, 12 Mar 2008)
Log Message:
-----------
Add a random document generator.
Modified Paths:
--------------
trunk/src/java/org/apache/bailey/Document.java
Added Paths:
-----------
trunk/lib/slf4j-api-1.5.0.jar
trunk/lib/slf4j-simple-1.5.0.jar
trunk/lib/slf4j.LICENSE.txt
trunk/src/test/org/
trunk/src/test/org/apache/
trunk/src/test/org/apache/bailey/
trunk/src/test/org/apache/bailey/Generator.java
|
|
From: Ning L. <nin...@gm...> - 2008-03-12 18:31:07
|
On Tue, Mar 11, 2008 at 9:12 AM, Yonik Seeley <yo...@ap...> wrote:
> More about what rebalancing means too... when rebalancing, can you
> leave all the nodes in place (the replication configuration) and just
> change what keys map to a node?

We have two ways to achieve load balancing. I only explicitly described one of them in the design - partitioning. Partitioning partitions a ring into ranges, a.k.a. nodes if we ignore replication for a moment. The other way is the mapping from nodes to hosts. A re-balance can be achieved by changing the partitioning (adding/removing a node) and/or by changing the mapping from nodes to hosts.

On Wed, Mar 12, 2008 at 9:22 AM, Yonik Seeley <yo...@ap...> wrote:
> On Tue, Mar 11, 2008 at 5:49 PM, Doug Cutting <cu...@ap...> wrote:
> > > An example application can be an online email system.
> > > The keys of a user's emails are prefixed by the user name,
> > > so a user's emails are located together on the ring. When
> > > a user searches his/her emails, the query is only sent to
> > > servers which cover that range, instead of the entire ring.
> >
> > We could just define doc ids as 128-bit numbers rather than strings.
> > Then user-provided hash values wouldn't be a special case. A
> > constructor could convert string ids to 128-bit ids, and also store the
> > original string in a field named "id".
>
> Allowing the user to specify a hash value doesn't seem so different
> from allowing them to specify a numeric id... it's just 32 bits vs 128
> bits. Ning's use case doesn't seem to require collision-free hashes.

No matter what we decide to use, the emphasis is that the documents are not uniformly distributed on the ring. Therefore, when we (re)partition, the goal is not that the partitioned ranges are about the same size, but that the sets of documents on the partitioned ranges are about the same size.

Ning
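The partitioning goal stated above, equal document counts rather than equal range widths, might be sketched as follows. This assumes the positions of all documents are available as an array of `long` values and that there are at least as many documents as parts; `BalancedSplit` is a hypothetical name, and a real master would more likely work from per-node statistics than from a full list of positions.

```java
import java.util.Arrays;

/** Hypothetical sketch: picks split points so each part holds about the same number of documents. */
final class BalancedSplit {

    /**
     * Returns parts-1 split positions. Part i covers positions in [splits[i-1], splits[i]),
     * with the first part starting at the beginning of the ring and the last running to its end.
     * Assumes positions.length >= parts. Many identical positions can still leave parts uneven.
     */
    static long[] splitPoints(long[] positions, int parts) {
        long[] sorted = positions.clone();
        Arrays.sort(sorted);
        long[] splits = new long[parts - 1];
        for (int i = 1; i < parts; i++) {
            // Index of the first document that belongs to part i.
            int index = (int) ((long) i * sorted.length / parts);
            splits[i - 1] = sorted[index];
        }
        return splits;
    }

    public static void main(String[] args) {
        // Seven documents are clustered at the low end, one sits far away.
        long[] positions = {1, 2, 3, 4, 5, 6, 7, 1_000_000};
        // Prints [5]: four documents fall below 5 and four at or above it, whereas
        // cutting the occupied range in half by width would split them 7 to 1.
        System.out.println(Arrays.toString(splitPoints(positions, 2)));
    }
}
```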