Re: [Relfs-devel] Some thought's.... - News about relfs
From: Vincenzo C. <vin...@ya...> - 2005-01-06 08:47:22
On Tuesday 04 January 2005 22:41, Peter Schrammel wrote:
> Hi
>
> I'm new to this list but I want to give my 2c for this gorgeous
> project!

Welcome! This list has been "a little" quiet since relfs was born, and I have had less spare time to code (but I worked on relfs a lot anyway) because my job became full-time. I am going to start my Ph.D. now, so I guess I will have more free time. In the meantime, there is news:

=== 1. OCaml port of RelFS ===

I faced hard problems with C++, related to memory management and to passing dynamically allocated memory between threads. I realized that I am not a C++ expert, and also that C++ is not the ideal language for quickly writing prototypes that implement new ideas. In a word, I found that it would take more time for me to become good at coding in C++ than for any C++ coder to become good at coding in a simpler language like OCaml.

I made up my mind and did the port to OCaml. The hardest part was getting a multithreaded fuse binding for this language (multithreaded in the sense that it handles multiple fs callbacks at the same time): there was one, but it was not designed for multithreading and was not up-to-date. I wrote another OCaml fuse binding, available at http://www.sourceforge.net/projects/ocamlfuse

Even though it is up-to-date and designed for multithreading, it is not yet stable in multithreaded mode: after some millions of requests it crashes for still unknown reasons, while in single-threaded mode it works well. This is not a serious problem at this time (requests usually interleave well with each other), but a production system can't afford to block while waiting for the cdrom tray to close, so I will have to make multithreading work better in the future. I have several alternatives and am sure that I will find my way, but I can keep on working on relfs in the meantime. Also, I will have to improve speed (for now filesystem data is copied twice and this makes it slow - about 10Mb per second on my centrino laptop).

I made a port of the relfs core, and I'm satisfied: the source is now about 300 lines of code, which is exactly its value :) Moreover, apart from functional programming and its very good type system, OCaml has superb multithreading primitives and libraries, and I hope this will pay off during development.

That said, I did not commit any changes to CVS, since I had no time to write installation instructions (postgresql-ocaml is needed); however, I think I am going to make a branch next week to allow everybody to see it.

=== 2. RelFS design, and goals for the first release ===

Here we come to your e-mail :) In the last months I also redesigned the DB schema, which looks a little like what you propose; I will put it in CVS next week.

>
> First I want to structure my thoughts about FS, Files and
> Databases....
> 1. Filesystems hold Objects with some properties:
> 2. Files hold Objects with some properties:
> 3. Databases hold Objects with some properties.

Ok for 1. and 3., while for 2. I am still unsure how to generalize it. I do not mean properties, which could be seen as extended attributes (fuse supports those), but a membership relation: does it make sense to represent a mbox file as a directory of text messages? Of course, but what should each message look like? The text of the message without headers (because they can be shown as extended attributes of the file) or the plain text of the message? And how does one deal with message attachments? Each message should be a directory, but that could be inconvenient from the user's point of view.

>
> ----
>
> 1. It would be nice to have the Objects of the FS/File in a DB.

This will be the first feature to be implemented, using index plugins.

> 2. It would be nice to have Objects of a DB queryable in the FS/File

This will not be completely done in the first release, because it requires a complex storage architecture: for each file or directory, it must be decided whether its contents are provided by a raw file on the underlying filesystem, by a plugin which performs a db query, by a plugin which parses and reassembles a file on the fly, and so on. This "storage plugins" architecture is completely orthogonal to an "index plugins" architecture:

- Index plugins can't fail, they can be run in batch queues and there can be more than one index plugin for any file.
- Storage plugins can fail, are used interactively and if more than one plugin has to provide contents for a file, they should be stacked on top of each other - e.g. a plugin which turns email messages into directories, stacked on top of a plugin which provides email messages read from an ftp site or a pop3 server.

Even if storage plugins will not be in the first release, I absolutely want directories representing queries on the db, like "my punk/rock mp3s". Those will be implemented ad-hoc and then replaced by a storage plugin.

> 3. It would be nice to modify the DB in the FS/File

There are various implementations of this, and we can surely get one (if you refer to seeing db objects, like procedures, as files in the filesystem).

> 4. It would be nice to modify the FS/File in the DB
>

This will also be the purpose of index plugins, but not in the first release, where we won't do this bidirectional communication (I think it can be done by extending the SQL server and using triggers).

> The first two aren't that hard I think but you should keep one thing
> in mind: a file is not its filename, it's an abstract concept. I
> would give files a UUID. A UUID can be given a name like
> "/archive/coolsong.mp3" or another name
> "/archive/genre/rock/coolsong.mp3" (some call this a link).
>

Yes, I now have a table which holds pairs of object ids and names, which are not _paths_ but only the result of the "basename" function. There is also a "membership" relationship between objects, which gives rise to paths. This way there is no logical difference between a file contained in a zip archive and a file contained in a directory. However, I will have to find a convenient syntax to allow a zip file to coexist as a file and as a directory in the same parent directory, something like "file.zip" and "file.zip#" to represent the directory, but where "#" is a character which is not allowed in a Unix filename. Unfortunately it seems that the only disallowed character is "/", which would not work in many applications.

What are UUIDs? Is this a standard of some sort? Are UUIDs a function of file data? I am using just integers right now.

> The second problem is that you have to access the properties with the
> filename. Be realistic nobody will use special tools to query your FS.
>

I know :)

> A good approach would be what the guys from reiserFSv4 did
> (http://www.namespace.com):
> every file has its properties accessed as if it was a directory and the
> property a filename in the directory:
> :cat "/archive/coolsong.mp3/genre"
> :rock
>

I don't like this because a shell or a userspace program could perform a "dirname" operation on the file name to get its path, find that the path is not a directory and complain to the user. I would prefer a different character than "/", e.g. /archive/coolsong.mp3#genre, but I am still unsure.

> Question are there any properties in the filesystem that
> are attached to the filename? Yes! A comment on the filename
> "/etc/shadow" is not a comment to its content.

This is _exactly_ the motivating example for assuming that there will be properties attached to paths as well as properties attached to files (objects) - it seems that we are going to use relfs for the same reasons :)

> So usually a filename indicates a file as above. But you can do
> something like:
> echo "This file is a Security risk" > /etc/shadow/filename/comment
> you could even do comments on comments ;-)
>
>
> What about directories? It's the same but it means some loss to the
> filenamespace ( a special filename e.g. .rfs indicates the
> properties)

This is another problem with using "/" as the final separator.

>
> :echo "my brother's files! don't remove" > /archive/.rfs/comment
>
> So most of the 4 quest are solved:
>
> 1. putting the Objects of the FS/Files into a DB (I think you call it
> proxy) I thought of FAM (File Alteration Monitor) and udev: Just
> tell other programs, that something has changed and they'll do the
> rest. So your FS would just send a message e.g. on the DBUS that a
> file has been altered/ created. Daemons (even user-daemons) could
> listen on the dbus and do some caching of the information (like
> extracting mp3 tags, do checksums...). Don't force a specific DB
> schema on them...they'll hack around it.

In fact I am going to leave the db schema unspecified, just like the filesystem hierarchy in a linux distribution. I plan to use more than one communication protocol for indexing applications, e.g. dbus but also xmlrpc, or just dynamic loading and linking of shared libraries, or a command line (a la CGI) interface for simple shell scripts.

> Here a simple approach for a DB model:
>
> FS Objects:
> UUID char(33)
>

The fact that you write "char(33)" makes me think that there is a written specification. There is this problem with the identity of objects: it is going to be lost if you copy the file "outside" the filesystem, so you lose any information attached to this uuid, unless it's like an md5sum, but then it's going to change when the file is modified, and so we would need cascading updates of primary keys everywhere. There is also the problem of open-endedness: how do I join two different filesystems on different machines if object ids can collide?

> FS Attribnames:
> ID integer e.g. 9088
> value integer e.g. 78 (specific to the plugin)
> plugin char(16) e.g. mp3tag
> name char(16) e.g. artist

Hmmm, even if I realize that it could be useful to reify attribute names, I would prefer to keep a rich relational structure, like an "mp3" table with author, size etc. as columns, where e.g. author is a foreign key into another table. I know there are many problems, but it would allow us to exploit the full power of a relational database in user applications.

> Sorry my C++ is not the best but with this approach I could even use
> haskell as filters FS->DB ;-).
>

You see, I switched from C++ to OCaml because mine was not the best either :) However, language independence for index plugins is an important requirement for relfs - I would like to be able to use even shell scripts that take the modified file as an argument and output an sql script.

V.
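
Editorial sketch (not part of the original message): the model described above, with integer object ids, names that are basenames rather than paths, and a "membership" relation that gives rise to paths, might look roughly like the following. This is only an illustration under stated assumptions: the table and column names, the helper functions, the root id and the use of Python with SQLite are all hypothetical (the message mentions PostgreSQL and leaves the schema unspecified).

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not the actual relfs schema (which the message leaves unspecified).
SCHEMA = """
CREATE TABLE object (id INTEGER PRIMARY KEY);
CREATE TABLE name (
    object_id INTEGER NOT NULL REFERENCES object(id),
    basename  TEXT NOT NULL          -- a basename, never a full path
);
CREATE TABLE membership (
    parent_id INTEGER NOT NULL REFERENCES object(id),
    child_id  INTEGER NOT NULL REFERENCES object(id)
);
"""

ROOT = 1  # assumed id of the root object


def make_db():
    """Create an in-memory database containing only the root object."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO object (id) VALUES (?)", (ROOT,))
    return db


def add_object(db, parent_id, basename):
    """Create a new object, name it, and make it a member of parent_id."""
    oid = db.execute("INSERT INTO object DEFAULT VALUES").lastrowid
    db.execute("INSERT INTO name VALUES (?, ?)", (oid, basename))
    db.execute("INSERT INTO membership VALUES (?, ?)", (parent_id, oid))
    return oid


def link(db, parent_id, child_id):
    """Make an existing object a member of one more parent (a 'link')."""
    db.execute("INSERT INTO membership VALUES (?, ?)", (parent_id, child_id))


def resolve(db, path):
    """Walk the membership relation one basename at a time.

    Returns the object id the path leads to, or None if no such path exists.
    """
    current = ROOT
    for part in (p for p in path.split("/") if p):
        row = db.execute(
            "SELECT m.child_id FROM membership m "
            "JOIN name n ON n.object_id = m.child_id "
            "WHERE m.parent_id = ? AND n.basename = ?",
            (current, part),
        ).fetchone()
        if row is None:
            return None
        current = row[0]
    return current
```

With such a layout, the quoted example of one file reachable as both "/archive/coolsong.mp3" and "/archive/genre/rock/coolsong.mp3" is just one object with two membership rows, and a member of "file.zip" is resolved exactly like a member of a directory. The sketch does not address the open questions raised in the message (the "file.zip" versus "file.zip#" syntax, or id collisions between filesystems).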