Re: [Groupkit-users] Groupkit research
From: Colin M. <co...@ch...> - 2003-04-21 06:22:29
On Wed, 2003-04-09 at 19:35, TongMei wrote:
> Hi all,
>
> Sorry I have been underground for the last month. Had a lot on and have
> been totally occupied reading through two p2p books and a few other
> books related to open source, databases, distribution etc.

Yep, that's cool. I'm similarly intermittent. Someone might commit the sf
changes to fix the showstopper bug in the CVS version, though, otherwise
people might get the wrong idea re groupkit.

> Answering some of the questions raised by Colin McCormack:
>
> Both encryption and compression will eventually become important
> concepts to Groupkit, both for very different reasons and I hope that
> together we can implement these simple but powerful layers. The point
> raised about the native implementation of sockets (as opposed to using
> comm) may also have sway here

The more I look at comm, the less enamoured of it I become - it has some
spooky almost-OO approaches to creating new channels.

> and I have looked at the source code for
> simple compression - from the tcl zlib implementation. The packages Trf
> and Tls would provide a very strong solution for encryption but we would
> then be dependent on those packages.

Only if the user wanted to have encryption - I like the idea of using
existing modules wherever possible. Mind you, the certificate overhead of
SSL is great - the operational overhead of creating the certificates, I
mean. Additionally it creates a dependence on the SSLeay package, which
AFAIK is not available for Windows (if that matters.)

I wrote a Q&D RSA package over my libgmp multi-precision int interface,
which when coupled with Trf would give strong encryption and
authentication. Of course, it only moves the dependency problem to a
lower level, since libgmp isn't easily available for Windows either (if
it matters.)

> None-the-less I am still in the
> process of looking at a great deal of other things and have yet to come
> to terms with the socket code in gk.

The socket code is funky in a nice way - quite a powerful mechanism.

> Metakit databases and persistence are the things I am currently looking
> into and would be glad of any help and ideas.

I wrote and use a metakit automagic persistence class for itcl, and
(coincidentally) use it to store trees (in the tcllib sense of
::struct::tree). I found that there's a mismatch between how
::struct::tree stores, and the preferred storage model for metakit.

I guess, since environments (which are tree-structured) are the
fundamental structure in groupkit, the question of optimal storage for
persistence needs addressing.

I guess one'd serialise a gk environment by trying to store it in a
single table, with the path expression as the key. It might be efficient,
in this case, to store the Mk record number explicitly in the tree, since
I can't imagine searching for full path expressions would be very
efficient.

Additionally, in the vein of looking for code to utilise, and given that
trees are special cases of digraphs - is it worth looking at Jacob Levy's
e4graph for the fundamental storage mechanism? It's C++, it provides
persistence: http://sourceforge.net/projects/e4graph/

It's been a couple of weeks since I looked at the environment
implementation, but I recall wondering a couple of things:
(a) does the functionality of environment warrant a C implementation
    (I'm not sure, it well might),
(b) does the C implementation of environment warrant a Tcl_Obj wrapper?
    (I'm not sure of the real advantages, but am happy to run one up)

> I had implemented a shoddy
> environment with Mk connectivity but I was not happy with the model and
> could already see terrible issues arising with regards to uniqueness,
> freshness and collision.

Yes. Mk may not be a good storage model, since it loses the virtues of
tree structure inherent in environment - for example one could imagine
fine-grained branch-level locking for update, rather than table-level.

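To make the single-table idea above a bit more concrete, here is a rough
sketch. It is written in Python rather than Tcl purely for illustration,
and a plain list stands in for a Metakit view; the tree layout and the
names flatten, store and rebuild are all hypothetical, not GroupKit or
Metakit code. Each node becomes one row keyed by its path expression, and
a separate path-to-row-number index plays the role of the record number
kept alongside the tree, so a lookup need not scan for the full path.

```python
# Hypothetical tree layout: {name: (value, children)}.
# Assumes node names never contain "/".

def flatten(tree, prefix=""):
    """Yield (path, value) rows in pre-order, parents before children."""
    for name, (value, children) in tree.items():
        path = f"{prefix}/{name}"
        yield path, value
        yield from flatten(children, path)

def store(tree):
    """Return (rows, index): rows stands in for the single table,
    index maps each path expression to its row number."""
    rows = list(flatten(tree))
    index = {path: n for n, (path, _) in enumerate(rows)}
    return rows, index

def rebuild(rows):
    """Rebuild the nested tree; relies on parents preceding children."""
    tree = {}
    for path, value in rows:
        parts = path.strip("/").split("/")
        level = tree
        for part in parts[:-1]:
            level = level[part][1]      # descend into the children dict
        level[parts[-1]] = (value, {})
    return tree

env = {
    "users": (None, {"colin": ("online", {}), "chad": ("away", {})}),
    "title": ("demo", {}),
}
rows, index = store(env)
row_number = index["/users/chad"]       # direct jump, no path search
print(rows[row_number])
print(rebuild(rows) == env)
```
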
> So I stopped and decided to read the material
> mentioned above.

I'm good mates with Matthew Toseland, who's doing some work on one of the
P2P things - I'm sure I could ask him to consult if this would help.

> I now have a clearer idea of how this may be
> implemented but it really does all boil down to one thing: What is our
> motivation for db connectivity in Gk?
> Is it that we just want the
> ability to load data from previous sessions for persistence, or do we
> want full blown database support in this environment. The answer to
> that question will have significant impact on how we proceed.

It troubles me, philosophically, that everybody (myself included) seems
to fall into the trap of thinking that a database has to be
table-structured. I came originally from a network db background, well
before RDB sank its fangs into the field and made every db look like a
glorified spreadsheet.

If we had simple persistence on a tree structure (ie: environment,) then
we have a lot of what a database gives. If we added a per-node colouring
(per-node data storage, like arrays in tcl - if we don't already have it,
I'm too lazy to look right now) and if we could do things like
efficiently traverse a node's children, and a branch, looking for matches
by node content, we could treat each node as a record, do very complex
searches, and have an interesting distributed multi-access database in
gk's native format.

Next Q: what if environments were generalised to graphs? Would that work,
conceptually? It'd be handy for modelling the comms networks over which
the gk runs, for one :)

I'm sorry to raise more Qs than I answer, but they're mainly
thinking-about questions.

> Network issues have always been a problem when it comes to p2p systems.
> However, in my research I have come across the various solutions
> implemented by the likes of Gnutella and there are some elegant ones
> and some ugly ones.
> I would like addressing to be done by IP
> address/port number and not have to rely on host names. Also a
> push/pull protocol implemented in the registrar could get around a great
> deal of firewall issues.
> A repeater type registrar that sits on the firewall would only be a
> solution for a few as many IT depts. would simply refuse to run such a
> thing. I strongly feel we need to look elsewhere for a solution to the
> massive problem surrounding firewalls.
>
>
> All the best,
>
> Chad Thatcher

--
Colin McCormack <co...@ch...>