darwinet / Discussion / General Discussion: MIV diffs

JX - 2013-03-22

This is my third attempt. Luckily after the browser blew away my first try I wrote the second one in a text document outside the browser.

The documents state that all MIV diffs must be reversible. Why?

This is easy to achieve for numerical values, but how do you make a string diff reversible? Binary files are generally considered un-diff-able. The only way to generate a reversible diff on binary data is to include both the previous and current versions. This is going to make the diffs huge and the version history huge.

I see no reason for reversible diffs, particularly if we are maintaining a full version history as per the document.

Secondly, why maintain a full version history? Obviously for some projects and some data types this is necessary for traceability, revision control, backup, etc. etc. However for the majority of projects and files it is simply not necessary. I believe we should flatten the version tree as often as possible and have come up with a very basic algorithm to do so:

Versions have two types: edit versions and merge versions. Merge versions are generated automatically where possible.

Versions have an identifier consisting of an ID generated from the node and user IDs that generated the version and a counter that is unique to the file and the node. In other words, two different nodes may use the same counter value for two different versions that they each generated. The counter value of 0 is unique for the creation version of a file. Two files created at the same time with the same name are considered as two different files.

Versions have a full identifier. For edit versions this consists of their identifier and that of the preceding version. For a merge version it consists of the version identifiers of the common ancestor, all versions included in the merge path, and the final merged version.

Nodes exchange a list of all versions that they possess. When a node generates a list containing a merge version, it does not include one of the versions that were merged in the list.

E.g. A domain with four nodes (A, B, C, D) has a file created by node A which will thus be version A:0.
The file is shared with nodes B and C that independantly edit it to create versions B:1, C:1, and C:2.
Node D subsequently connects to node B and gets versions A:0 and B:1.

At this point each nodes version list looks like:

A=A:0
B=A:0,A:0/B:1
C=A:0,A:0/C:1,C:1/C:2
D=A:0,A:0/B:1

Next node D gets versions C:1 and C:2 from node C (it already has version A:0, and to avoid conflicts doesn't pass version B:1 to node C). It generates version D:1 as a merge version and subsequently edits the file to generate version D:2.

At the same time node C generates version C:3 and sends all its versions to node A.

The version lists now look like:

A=A:0, A:0/C:1, C:1/C:2, C:2/C:3
B=A:0, A:0/B:1
C=A:0, A:0/C:1, C:1/C:2, C:2/C:3
D=A:0, A:0/B:1|C:2/D:1, D:1/D:2

If all nodes are subsequently on line together, when the node lists are exchanged we need some logic to manage who merges what:

*) If a node receives a conflict version it performs a merge but does not pass the other version in the conflict back. Instead it passes the merge version back (if the other node is still in communication).

*) If two nodes independantly merge the same combination of versions then the merge made by a node that created one of the merged versions takes precedence and other merged versions are discarded. Because of the rule above, only one node should create a merge version on a version it created. I hope.

*) If a node has done a merge it passes the merged version to the nodes that generated the source versions for the merge. Those nodes discard their versions in favour of the merged version. It also passes the merged version to all other nodes who perform a similar replace.

*) If a node knows that all other nodes possess a superceded node, they discard that node.

So now we get node D passing D:1 and D:2 to everyone. Node C and A both want to do a merge which will result in conflicting merge versions, and the version tree looks like:

A:0---->B:1---->D:1->D:2
\ /
->C:1->C:2->C:3

The version lists for each node look like:

A=A:0, A:0/B:1|C:2/D:1, D:1/D:2, C:2/C:3
B=A:0, A:0/B:1|C:2/D:1, D:1/D:2
C=A:0, A:0/B:1|C:2/D:1, D:1/D:2, C:2/C:3
D=A:0, A:0/B:1|C:2/D:1, D:1/D:2

At this point every node knows that every other node has all versions up to D:2 so they discard all older versions. Node A and node C perform the merge of D:2 and C:3 and the version lists look like:

A=D:1/D:2, C:2/C:3, A0:/D:2|C:3/A:1
B=D:1/D:2
C=D:1/D:2, C:2/C:3, A0:/D:2|C:3/C:4
D=D:1/D:2

The merge versions are shared with all other nodes, duplicate merges are discarded based on the precedence defined in the rules above,a nd common obsolete versions are discarded. The version lists become "A0:/D:2|C:3/C:4" for all nodes and they are all synched up.

This strategy should work quite well for text based documents, but will struggle for atomic data (integers) or binary files, due to the fact that it isn't possible to merge the data.

This strategy needs thorough testing. I suspect that, due to editing data offline, it will be possible to get the system into extraordinarily complex situations where the rules just don't cope. Even without flattening the version tree as often as possible, we are going to have problems with branches being generated and the data splintering into hundreds of versions that never get fully synchronised again.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Kjell-Olov Högdahl - 2013-03-25

Hi Jontom!

Great that you have read about Delta MIV's. I will read you comments thoroughly I just don't have time right now.

A quick reply though: Deltas need to be reversible to save the MIV evolution history and make backtracking and "undo" possible. If a delta is not reversible it destroys the "past" and the evolution aspect is lost. There are environments where only "now" is of interest and branching should never occur. Online gaming is one of them. You just wan't to synchronize the current state of the game. But for file sharing, project managements, file version control systems, social interaction and what have you the evolution history is a very powerful feature. So my goal is to make this work and then just disable aspects of it when they are not needed!

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

JX - 2013-04-03

I've been thinking about this and I don't see how you can implement MIV diffs the way you intend. So, how about a challenge. Implement some code for a string MIV value and a MIV diff for a string value, and I'll write some test code to see if I can get it to break.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Kjell-Olov Högdahl - 2013-04-08

I'm on :)

The importance is not that my design works - but that there is a design that works.

See it this way - You need a way to share a common view of data in a P2P domain. It may be the state of an online poker game. Or it may be the state of a shared folder of files.

Either way the nodes need to be able to interact, sharing the data and also changing it!

So how do you do that? My proposal is that you place the data you share in a MIV and use the proposed delta MIV mechanism to exchange the current MIV state to all Peers of the Domain.

Now we may argue on how to do this. I hope at least you will see the core need of this mechanism. And that if we solve it, the enormous power such a mechanism possesses?

Some MIV notes:

It must work independent on Peer being online or offline. I.e. synchronization must happen asynchronously.

It must work also when the shared data is to large to store a copy at each peer.

It must work also when the domain is so large that each Peer just can't connect to them all.

Now

requires the Delta mechanism. Receivers must know how to apply the change correctly with other changes. They need to be synchronized at each peer.

Requires a roaming mechanism. Peers "pass on" changes to shared data using some algorithm that ensures good coverage and distribution speed.

Requires a RAID mechanism. Keeping track of where data is should it be required at a peer that is currently not having it in local storage.

Yes, it is complex! But I have been thinking on this for the last four years at least so I see it possible. And I see the potential in making it work. So bear with me. I may not have come up with the final design yet. But if I can get you to see the potential here we will get there!
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

JX - 2013-04-08

Absolutely I see the benefit of MIV deltas. I see the immense power and potential. And I think they are a great idea.

I don't think it is possible to make them reversible as you insist must be the case. Or at least not in the way you describe. As an example, let us suppose we have 3 nodes, A, B, and C, and an integer value I.

Node A creates I with a value of 4.
Node B modifies I to a value of 5 (diff=+1).
Node C modifies I simultaneously to 6 (diff=+2).
Node A receives the two diffs and ends up with a value of 7.

After the merge, both 5 and 6 are valid values depending on whether you choose Node B's data or Node C's as the correct modification. 7 is, in this situation, not a correct value.

The only situation in which your diffs definition works is if the value is a counter and node B has added 1 item to whatever is being counted and node C has added 2.

I think the key here is that when the data is unmergeable, the only operation that makes sense is a set.

Suppose for example we have some data which is the name of the last person to modify some other data. The first person to modify the data is "Adam". Then "Bob" modifies the data. How do you define a reversible arithmetic diff for string data anyway?

Next "Charlie" modifies the data. When "Dave" receives the original value plus two diffs, both made on the same root data, how does the merge work? Even if you define the diff as "previous value/new value" you can only apply one diff to the data and throw the other away. If "Dave" receives Bob's update before Charlie's, while "Adam" receives them in the opposite order, then the two will end up with different final values.

Even with mergeable data you have a problem. Suppose you have an integer value with a previous value of 2 and a new value of 4. Was that a x2 or a +2? Suppose another node modifies 2 to 6 at the same time. Is that a x3 or a +4? When you merge the two diffs, is the final value 8 or 12?

MIV Diffs are fundamental to Darwinet as far as I can tell, so I think you need to finalise a workable design very very soon. We can build the basic peer to peer network with domains and users and so on. We can pass fixed data between the nodes. We can get stuff working. But anything involving shared modifiable data is dead in the water until you get the design right.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

MIV diffs

Forums

Help

MIV diffs

MIV diffs

Forums

Help

MIV diffs document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

MIV diffs