Thread: Re: [ivdfs-devel] Greetings
Status: Pre-Alpha
Brought to you by: nkukard
From: Ian C. B. <ia...@bl...> - 2006-09-19 20:11:58
There appear to be no mail archives yet, has conversation yet begun?

Incredibly Versatile? Perhaps we should nail down what "Versatile"
features we are looking for.

CornFS was an attempt at making a simple copy-on-write mirroring
filesystem that could be easily recovered. The primary goal was archival
use, with an eye toward rapid searchability, layering on top of any
resilient networked filesystem.

Is this targeted to be a POSIX filesystem (complete with permissions
and/or ACLs) or just a hierarchical archival filesystem?

While talking with others about their pet projects, the primary
attributes of focus seem to be:

Distributed metadata.
- Distributing the filesystem namespace across N number of networked nodes.
Distributed data.
- Distributing the file data across N number of networked nodes (either
  full file copies or block striping).
Distributed locking.
- Changes to the metadata and/or data need to be synchronized to keep
  the filesystem coherent.
Redundancy.
- Having N copies available on the network, so that content is always
  available.
Single instance store.
- Use an MD5/SHA1 hash to uniquely identify and universally store a file
  (or blocks of a file) for later recall.

Others have mentioned P2P networking instead of a private cluster model.
While I'm interested in the above, I'd really like to see P2P get
factored into this.

With a little public key cryptography, it would be possible to join
something theoretically like Freenet with something more like BitTorrent:
- Have "confederations" of metadata namespaces that link to one another
  using UUIDs or unique cryptographic hashes.
- Have a single instance store of files or blocks using MD5/SHA1 hashes.
- Have "webs of trust" for both metadata and file data, so that only your
  friends can access your filesystems or file data, yet still allow
  objects to be replicated without knowing what is inside a file (a la
  Freenet). With webs of trust, one unified network would be possible,
  with storage availability spanning _all_ nodes. "Dark" nets would not
  be necessary, as they would be part of the model.
- Swarm file downloads from both "seed" and other "downloading" nodes
  (a la BitTorrent).

If anonymity is a goal, possibly work something functionally equivalent
to The Onion Router into the mix.

I'm envisioning something that allows content to become a "permanent
network copy", which will only go away if nobody is interested in it.

Also, while FUSE is neat, FiST holds much promise. Something like RAIF is
one of the alternatives to CornFS, and I've thought about rewriting
CornFS as a layered FiST filesystem.

http://www.filesystems.org

If the goal is to go toward P2P, implementing a loopback WebDAV interface
might be the most universal way to present a filesystem (though it is
non-POSIX compliant).

I look forward to your thoughts.

- Ian C. Blenke <ia...@bl...>
  http://ian.blenke.com/projects/cornfs/
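The single instance store described above is small enough to sketch: key
each file by the hash of its contents, so identical files are stored once
and recalled by a universal name. This is a hypothetical illustration
(the class and layout are made up for this sketch, not IVDFS or CornFS
code), using SHA-1 as one of the hashes mentioned:

```python
import hashlib
import os

class ContentStore:
    """Minimal single-instance store: a file's bytes are kept once on
    disk, named by the SHA-1 of the contents, so duplicate files
    collapse into one stored copy."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha1(data).hexdigest()
        path = os.path.join(self.root, digest)
        if not os.path.exists(path):    # already stored? nothing to do
            with open(path, "wb") as f:
                f.write(data)
        return digest                   # the universal name for recall

    def get(self, digest: str) -> bytes:
        with open(os.path.join(self.root, digest), "rb") as f:
            return f.read()
```

Storing the same bytes twice returns the same digest and leaves a single
on-disk copy, which is the whole point of a single instance store.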
From: Nigel K. <nk...@lb...> - 2006-09-20 09:42:41
> There appear to be no mail archives yet, has conversation yet begun?

Just begun now, I'm still getting everybody to join the list though.

> Incredibly Versatile?
>
> Perhaps we should nail down what "Versatile" features we are looking for.

Agreed.

> CornFS was an attempt at making a simple copy-on-write mirroring
> filesystem that could be easily recovered. The primary goal was archival
> use, with an eye toward rapid searchability, layering on top of any
> resilient networked filesystem.
>
> Is this targeted to be a POSIX filesystem (complete with permissions
> and/or ACLs) or just a hierarchical archival filesystem?

Well, all the features of the underlying filesystem will be available
through FUSE... think passthrough. I want this to work as simply as
possible, so one can even stop FUSE and still access the files. File
operations like rename, unlink, create and on-close writes are then
journalled and synced between nodes.

> While talking with others about their pet projects, the primary
> attributes of focus seem to be:
>
> Distributed metadata.
> - Distributing the filesystem namespace across N number of networked nodes.
> Distributed data.

Agreed.

> - Distributing the file data across N number of networked nodes (either
> full file copies or block striping).

Or even other forms of RAID-like behaviour? I think FiST (RAIF) has some
other interesting RAID-like options; we can always peek at the code and
pull chunks into our own implementation.

> Distributed locking.
> - Changes to the metadata and/or data need to be synchronized to keep
> the filesystem coherent.

Agreed.

> Redundancy.
> - Having N copies available on the network, so that content is always
> available.

Yea... I was thinking about a GFS-like approach where all nodes can ping
each other and hold a vote, when one of them cannot be reached, to
outcast it from the group.
Upon being outcast, the outcasted node could switch to read-only mode or
just return IO errors, or even carry on and re-sync later... *shrug*, all
are pretty easy to implement.

> Single instance store.
> - Use an MD5/SHA1 hash to uniquely identify and universally store a file
> (or blocks of a file) for later recall.

The less we touch the files on the underlying filesystem the better;
hashes can be stored in the FUSE's own .whatever directory if needed.

I think our first goal should be plain and simple replication... nothing
special. If a file changes, copy the entire thing. If created, create
and copy data. If deleted, delete it. No locking. Let's get a very
simple working FS together, then implement the features above, most
important first.

> Others have mentioned P2P networking instead of a private cluster model.
>
> While I'm interested in the above, I'd really like to see P2P get
> factored into this.
>
> With a little public key cryptography, it would be possible to join
> something theoretically like Freenet with something more like bittorrent:
> - Have "confederations" of metadata namespaces that link to one another
> using UUID or unique cryptographic hashes.
> - Have a single instance store of files or blocks using MD5/SHA1 hashes.
> - Have "webs of trust" for both metadata and file data so that only your
> friends can access your filesystems or file data, yet still allow
> objects to be replicated without knowing what is inside a file (a la
> Freenet). With webs of trust, one unified network would be possible,
> with storage availability spanning _all_ nodes. "Dark" nets would not
> be necessary, as they would be part of the model.
> - Swarm file downloads from both "seed" and other "downloading" nodes
> (a la BitTorrent).

I'm thinking P2P is more for another dedicated project, or maybe a
branch of this one?
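The plain replication Nigel proposes above, whole-file copy on create or
change and removal on delete, with no locking, amounts to replaying a
journal of operations against each replica. A hypothetical sketch
(function names are invented, and two local directories stand in for two
nodes):

```python
import os
import shutil

def apply_op(op, relpath, src_root, dst_root):
    """Replay one journalled operation against a replica.
    Whole-file semantics only: no diffs, no locks, last writer wins."""
    src = os.path.join(src_root, relpath)
    dst = os.path.join(dst_root, relpath)
    if op in ("create", "change"):
        # Copy the entire file, as described: nothing special.
        os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
        shutil.copy2(src, dst)
    elif op == "delete":
        if os.path.exists(dst):
            os.remove(dst)
    else:
        raise ValueError("unknown op: %s" % op)
```

Rename, unlink, create and on-close writes from the FUSE layer would each
emit one such journal entry to be synced to the other nodes.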
My major goal is to provide 100% redundant file storage across
datacenters and provide a true globally distributed filesystem.

> I'm envisioning something that allows content to become a "permanent
> network copy", that will only go away if nobody is interested in it.
>
> Also, while FUSE is neat, FiST holds much promise. Something like RAIF
> is one of the alternatives to CornFS, and I've thought about rewriting
> CornFS to be a layered FiST filesystem.
>
> http://www.filesystems.org
>
> If the goal is to go toward P2P, implementing a loopback WebDAV
> interface might be the most universal way to present a filesystem
> (though it is non-POSIX compliant).

Not sure if FiST is included in the vanilla kernel? The less patching
needed the better; one tends to lose a lot of interest in a project the
second you need to patch something. Especially if the server being
patched is under a maintenance contract with a vendor: patching the
kernel in that scenario is almost always guaranteed to void your
contract.

> I look forward to your thoughts.

The main requirement I have for this project is to replicate virtual
hosting data between datacenters, clients' websites and email (maildir).
I need to switch between centers should a natural disaster or major
outage occur.

-Nigel
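The GFS-like outcast vote Nigel describes earlier in the thread, where
the nodes that can still reach each other vote an unreachable peer out of
the group, reduces to a majority count over ping results. A toy sketch,
with the network ping stubbed out as a callable (nothing here is real
IVDFS code):

```python
def outcast_vote(nodes, can_reach):
    """Each node 'votes' on whether it can reach each peer.
    A peer unreachable by a strict majority of the other nodes is
    outcast. can_reach(a, b) stands in for a real ping from a to b."""
    outcasts = set()
    for target in nodes:
        voters = [n for n in nodes if n != target]
        missing = sum(1 for v in voters if not can_reach(v, target))
        if missing > len(voters) // 2:   # strict majority says "gone"
            outcasts.add(target)
    return outcasts
```

An outcast node could then drop to read-only, return IO errors, or keep
journalling locally and re-sync when it rejoins, as suggested above.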
From: Sean E. <ed...@no...> - 2006-09-20 20:36:08
Just a thought regarding the client/server as opposed to the P2P
architecture: the way I see it, a P2P system would be beneficial in the
sense of collaborative editing. On the other hand, you could end up with
a client/server architecture as well, through the use of dedicated
servers which just mirror the network filesystem as a whole. It would
allow, say, 2 workstations and a backup server to be constantly synced
together, whereas with a client-server structure you wouldn't have that
advantage.

Really the only difference would be that a workstation running IVDFS
would be able to accept updates and update its own local copy, as well as
sending updates to the "server(s)", and the "server(s)" would be able to
function as workstations as well, writing to their own local systems and
updating the other machines on the grid. And, more importantly, the
filesystem /could/ still function in the proposed client/server layout.

~Sean
From: Nigel K. <nk...@lb...> - 2006-09-20 20:52:02
Yo,

Stupid subject line.

> Just a thought regarding the client/server as opposed to the P2P
> architecture, the way I see it is that doing a P2P system would be
> beneficial in the sense of collaborative editing. On the other hand, you
> could end up with a client/server architecture as well, through the use
> of dedicated servers, which just mirror the network filesystem as a
> whole. It would allow, say, 2 workstations and a backup server to be
> constantly synced together,

I think the subject is getting a bit confused.

P2P is peer-to-peer, defined as each peer being equal in the duties it
performs. This is what we want. It eliminates the single point of failure
of a master-slave architecture.

> whereas with a client-server structure you wouldn't have that
> advantage. Really the only difference would be that a workstation
> running IVDFS would be able to accept updates and update its own local
> copy, as well as sending updates to the "server(s)", and the "server(s)"
> would be able to function as workstations as well, and write to its own
> local system and update the other machines on the grid. And, more
> importantly, the filesystem /could/ still function as the proposed
> client/server layout.

This is the behavior we're after.

The point I tried to make is that the goal for ivdfs is not anonymous
file sharing, for instance bittorrent, gnutella... etc. Or storing of
anonymous data using hashes... etc. It could be designed into ivdfs, I
have nothing against that, it's just not one of the primary goals for the
project.

The primary goal is replication and RAID-like behavior for high
availability file storage across different networks, servers or whatever
in the simplest manner possible, and to preserve the underlying
filesystem in such a way that, using plain replication, it's possible to
stop ivdfs and still access the files on the local filesystem.

So to address your comment above, one could mount a server fs on his/her
workstation and have it set up in such a way that only files in one's own
directory are replicated and the rest pulled directly off the server, or
any other way in which you see fit :o)

-Nigel
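Nigel's closing example, replicating only files under one's own directory
and pulling the rest from the server, amounts to a per-path policy table.
A hypothetical sketch (the path patterns and the two mode names are made
up purely for illustration):

```python
import fnmatch

# Hypothetical per-mount policy, first matching pattern wins:
# "replicate" = keep a local synced copy; "remote" = fetch on demand.
POLICY = [
    ("home/ian/*", "replicate"),
    ("*",          "remote"),
]

def mode_for(relpath, policy=POLICY):
    """Return how a path should be handled, first match wins."""
    for pattern, mode in policy:
        if fnmatch.fnmatch(relpath, pattern):
            return mode
    return "remote"
```

The FUSE layer would consult such a table on each open to decide whether
to serve the local replica or go to the network.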
From: Sean E. <ed...@no...> - 2006-09-20 21:28:32
Ok, that sounds good. I must have misunderstood.

On Wednesday 20 September 2006 16:49, Nigel Kukard wrote:
> Yo,
>
> Stupid subject line.
>
> > Just a thought regarding the client/server as opposed to the P2P
> > architecture, the way I see it is that doing a P2P system would be
> > beneficial in the sense of collaborative editing. On the other hand,
> > you could end up with a client/server architecture as well, through
> > the use of dedicated servers, which just mirror the network filesystem
> > as a whole. It would allow, say, 2 workstations and a backup server to
> > be constantly synced together,
>
> I think the subject is getting a bit confused.
>
> P2P is peer-to-peer. Defined as each peer being equal in the duties it
> performs. This is what we want. It eliminates the single point of
> failure with a master-slave architecture.
>
> > whereas with a client-server structure you wouldn't have that
> > advantage. Really the only difference would be that a workstation
> > running IVDFS would be able to accept updates and update its own local
> > copy, as well as sending updates to the "server(s)", and the
> > "server(s)" would be able to function as workstations as well, and
> > write to its own local system and update the other machines on the
> > grid. And, more importantly, the filesystem /could/ still function as
> > the proposed client/server layout.
>
> This is the behavior we're after.
>
> The point I tried to make is the goal for ivdfs is not anonymous file
> sharing, for instance bittorrent, gnutella... etc. Or storing of
> anonymous data using hashes... etc. It could be designed into ivdfs, I
> have nothing against that, it's just not one of the primary goals for
> the project.
>
> The primary goal is replication and raid-like behavior for high
> availability file storage across different networks, servers or whatever
> in the simplest manner possible, and to preserve the underlying
> filesystem in such a way that using plain replication it's possible to
> stop ivdfs and still access the files on the local filesystem.
>
> So to address your comment above, one could mount a server fs on his/her
> workstation and have it setup in such a way that only files in one's own
> directory be replicated and the rest pulled directly off the server, or
> any other way in which you see fit :o)
>
> -Nigel