Thread: [Relfs-devel] Some thought's....
|
From: Peter S. <pet...@gm...> - 2005-01-04 21:39:47
|
Hi

I'm new to this list but I want to give my 2c for this gorgeous project!

First I want to structure my thoughts about FS, Files and Databases....

Filesystems hold Objects (Files) with some properties:
  filename
  mtime
  atime
  user
  group
  mode
  ...

Files hold Objects with some properties:
  author
  audio-stream
  video-stream
  ...

Databases hold Objects with some properties.

----

1. It would be nice to have the Objects of the FS/File in a DB.
2. It would be nice to have Objects of a DB queryable in the FS/File
3. It would be nice to modify the DB in the FS/File
4. It would be nice to modify the FS/File in the DB

The first two aren't that hard, I think, but you should keep one thing in
mind: a file is not its filename, it's an abstract concept. I would give
files a UUID. A UUID can be given a name like "/archive/coolsong.mp3" or
another name "/archive/genre/rock/coolsong.mp3" (some call this a link).
The problem is that in an FS you always start with the filename, so you'd
have to specify whether you mean the file or the filename.

The second problem is that you have to access the properties through the
filename. Be realistic: nobody will use special tools to query your FS.

A good approach would be what the guys from reiserFSv4 did
(http://www.namespace.com):
every file has its properties accessed as if it were a directory and the
property a filename in that directory:
:cat "/archive/coolsong.mp3/genre"
:rock

But again: we should not attach this property to the filename but to
the file. Question: are there any properties in the filesystem that are
attached to the filename? Yes! A comment on the filename
"/etc/shadow" is not a comment on its content.
So usually a filename indicates a file as above. But you can do
something like:
echo "This file is a Security risk" > /etc/shadow/filename/comment
You could even do comments on comments ;-)

What about directories? It's the same, but it means some loss to the
filename space (a special filename, e.g. .rfs, indicates the properties):
:echo "my brother's files! don't remove" > /archive/.rfs/comment

So most of the 4 questions are solved:

1. putting the Objects of the FS/Files into a DB (I think you call it
proxy)
I thought of FAM (File Alteration Monitor) and udev: just tell other
programs that something has changed and they'll do the rest. So your FS
would just send a message, e.g. on the DBUS, that a file has been
altered/created. Daemons (even user daemons) could listen on the dbus and
do some caching of the information (like extracting mp3 tags, doing
checksums...). Don't force a specific DB schema on them... they'll hack
around it.

2. querying the DB in the FS
see above....

3. modify the DB in the FS/File
easy, see above...

4. modify the FS/File in the DB
hey, that's tricky ...

-----
-----

Here is a simple approach for a DB model:

FS Objects:
  UUID    char(33)

FS Attribnames:
  ID      integer    e.g. 9088
  value   integer    e.g. 78 (specific to the plugin)
  plugin  char(16)   e.g. mp3tag
  name    char(16)   e.g. artist

  ID  value  plugin  name
   1  1      system  filename
   2  2      system  uid
   3  3      system  gid
   4  4      system  size_in_bytes
   5  5      system  mode
   6  1      chsum   md5
   7  2      chsum   sha1
   8  1      mp3tag  artist
   9  2      mp3tag  genre
  10  3      mp3tag  song

FS Attribs:
  UUID       char(33)
  ATTRIB_ID  int4
  value      text

For speed's sake you could add some caching tables for the system plugin.

-----------

Another approach would be something from graph theory:

Objects:
  UUID   char(33)
  value  text

Links:
  UUID_from  char(33)
  UUID_to    char(33)

Creating a file means creating some objects:
1. the file itself, with the content as its value
2. an object for the name
3. a link from the object to the name
4. a link from the name to an object that has the value "filename"
...
You see, this wouldn't be that fast, but it is very clear and powerful.

I'm very interested in the first point of this approach: getting informed
when files are changed and doing some proxying.
Sorry, my C++ is not the best, but with this approach I could even use
haskell as filters FS->DB ;-).

Peter
|
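The attribute-name model in Peter's mail above can be tried out roughly as follows. This is only a sketch under stated assumptions: it uses an in-memory SQLite database instead of PostgreSQL, and the table names (fs_objects, fs_attribnames, fs_attribs) and helper functions are made up for illustration, not part of relfs.

```python
import sqlite3
import uuid


def make_db():
    """Create the three tables from the mail (names are hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE fs_objects (uuid TEXT PRIMARY KEY);
        CREATE TABLE fs_attribnames (
            id     INTEGER PRIMARY KEY,
            value  INTEGER NOT NULL,   -- plugin-specific number
            plugin TEXT NOT NULL,
            name   TEXT NOT NULL,
            UNIQUE (plugin, name)
        );
        CREATE TABLE fs_attribs (
            uuid      TEXT NOT NULL REFERENCES fs_objects(uuid),
            attrib_id INTEGER NOT NULL REFERENCES fs_attribnames(id),
            value     TEXT
        );
    """)
    # A few of the example rows from the mail.
    db.executemany("INSERT INTO fs_attribnames VALUES (?, ?, ?, ?)", [
        (1, 1, "system", "filename"),
        (8, 1, "mp3tag", "artist"),
        (9, 2, "mp3tag", "genre"),
    ])
    return db


def add_object(db, attribs):
    """Create an object and attach {(plugin, name): value} attributes."""
    obj = str(uuid.uuid4())
    db.execute("INSERT INTO fs_objects VALUES (?)", (obj,))
    for (plugin, name), value in attribs.items():
        (attrib_id,) = db.execute(
            "SELECT id FROM fs_attribnames WHERE plugin = ? AND name = ?",
            (plugin, name)).fetchone()
        db.execute("INSERT INTO fs_attribs VALUES (?, ?, ?)",
                   (obj, attrib_id, value))
    return obj


def find(db, plugin, name, value):
    """Return the UUIDs of all objects whose attribute has the given value."""
    rows = db.execute("""
        SELECT a.uuid FROM fs_attribs a
        JOIN fs_attribnames n ON n.id = a.attrib_id
        WHERE n.plugin = ? AND n.name = ? AND a.value = ?
        ORDER BY a.uuid""", (plugin, name, value))
    return [row[0] for row in rows]
```

Every attribute lookup needs a join through the name table, which is the flexibility-versus-speed trade-off the mail hints at with its caching tables.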
|
From: Wolfgang I. <wol...@gm...> - 2005-01-05 14:41:50
|
Peter Schrammel wrote:
> A good approach would be what the guys from reiserFSv4 did
> (http://www.namespace.com):
> every file has its properties accessed as if it were a directory and the
> property a filename in the directory:
> :cat "/archive/coolsong.mp3/genre"
> :rock
>
> but again: we should not attach this property to the filename but to
> the file. Question: are there any properties in the filesystem that are
> attached to the filename? Yes! A comment on the filename
> "/etc/shadow" is not a comment on its content.
> So usually a filename indicates a file as above. But you can do
> something like:
> echo "This file is a Security risk" > /etc/shadow/filename/comment
> you could even do comments on comments ;-)
>

Hi,

I'm also new to the project, and I'd like to drop a few comments on
querying the filesystem/database.

The primary advantage of a relational database is that its contents
don't have a single hierarchy (e.g. a directory tree). If you want to put
something in a traditional filesystem, you split your content into
different categories, e.g. photos:

First category: year of capture.
2001
2002
2003
2004
...

Then again, you want your photos sorted by some other category too, for
example photos taken indoors and outdoors. This forces you to create a
ton of directories:
2001/indoor
2001/outdoor
2002/indoor
2002/outdoor
..
or you can also try this approach:
indoor/2001
indoor/2002
indoor/2003
outdoor/2001

In either case it will suck, because if you add another category and you
have to search for pictures of a certain category, you'll have _many_
directories to go through.

When I subscribed to the list, I had the following "query language" in
mind: I name the different categorisation schemes, e.g. there's a
property 'year' that can be a value from 2001 to 2005, and 'location' can
be, say, 'indoor', 'outdoor' and 'underwater'.
If I now want to 'query' my relational file system for underwater photos
from 2002, I put:

cd /relfs/fotos/year=2002/category=underwater
ls

or

cd /relfs/fotos/category=underwater/year<2002/year>2000
ls
^---- I'm not bound to a fixed directory hierarchy (at least not
starting from the relfs mountpoint ;)

I've been subscribed to this list for a few weeks, and I haven't heard
very much from it.. How many people are we actually? How's relfs
development doing?

Greetings
Wolfgang
|
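Wolfgang's path syntax can be turned into a database query fairly mechanically. The following is a minimal sketch, assuming that every path component below the collection directory has the form name, operator, value; the function name path_to_query and the table name "objects" are invented for the example and are not part of relfs.

```python
import re

# One path component: a property name, a comparison operator, a value.
SEGMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)(<=|>=|=|<|>)(.+)$")


def path_to_query(path, table="objects"):
    """Translate 'year=2002/category=underwater' into SQL plus parameters.

    The order of the components does not matter for the result set,
    which is the point of the proposal: no fixed hierarchy.
    """
    conditions, params = [], []
    for segment in path.strip("/").split("/"):
        if not segment:
            continue
        match = SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"not a query component: {segment!r}")
        name, op, value = match.groups()
        # The name is restricted by the regex; the value is passed as a
        # bound parameter instead of being pasted into the SQL text.
        conditions.append(f'"{name}" {op} ?')
        params.append(value)
    where = " AND ".join(conditions) or "1=1"
    return f"SELECT * FROM {table} WHERE {where}", params
```

For the second example in the mail, path_to_query("category=underwater/year<2002/year>2000") yields three conditions joined by AND, with the values "underwater", "2002" and "2000" as parameters. Values arrive as strings here; a real implementation would need to know the column types.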
|
From: Peter S. <pet...@gm...> - 2005-01-05 22:04:09
|
I see your point. But I think we have 3 things here:

1. create an index of meta-data without storing the object itself (no
   storage level implementation)
   This could be done by a file alteration monitor filesystem (I
   outlined this in my last mail)
2. do queries based on the meta-data, using a fs as an interface to the
   db (still no storage level implementation)
3. save the object in the db (storage level implementation)
   a) read only
   b) read/write

1: simple, I think...

2: As far as I know postgres can do evaluated queries. So e.g. it would
be cool if you could do something like:

echo "select filename+ooid+ending from objectlist where year='1993' and mime-type like 'image%'" > /year1993_photos/.relfs/readdir
dir /thisdir
my_my_on_my_boat.jpg-897987-987879-879879.jpg
birtday.jpg-90809809-098089-098098.jpg

The query could be stored in the DB (your girlfriend would appreciate
that).

Of course you could do:
ls "/.global_relfs/select uuid from objectlist where year='1993' and mime-type like 'image%'"

The .global_relfs gives you full access to all storage objects, but has
no state (it doesn't remember your queries). Listing /.global_relfs
wouldn't give you back anything, but opening a file
/.global_relfs/<uuid> gets you to the storage object.
I wouldn't query for filenames here because they aren't unique (unless
you store them with full pathnames, but that would be a horror ....) or
you replace the '/' from the original files by e.g. '-'.

So you could store/alter the query for a given directory at runtime with
system tools. You could also use ad-hoc queries.
The hardest thing is the database model. It has to be very flexible,
which usually means graphs. But graph queries have bad performance on
relational DBs (though for me this is not the point: it's better to have
the data I want in 1s than wrong data in 0.01s 1000 times).

3. (That's tricky...)

Good night
Peter
|
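The "store a query in a directory" idea from point 2 above can be sketched in a few lines. This is an illustration only: the class QueryDirs and its methods are hypothetical, SQLite stands in for postgres, and the column is called mime_type because a hyphenated name would need quoting. Entry names are built as filename-objectid so that two files with the same name stay distinguishable, as in the listing in the mail.

```python
import sqlite3


class QueryDirs:
    """Maps a directory name to a stored SQL query and lists its results."""

    def __init__(self, db):
        self.db = db
        self.queries = {}  # directory -> SQL text

    def write_readdir(self, directory, sql):
        # Corresponds to: echo "<sql>" > <directory>/.relfs/readdir
        self.queries[directory] = sql

    def listdir(self, directory):
        # The stored query must return (filename, object id) pairs.
        rows = self.db.execute(self.queries[directory]).fetchall()
        return [f"{filename}-{oid}" for filename, oid in rows]
```

Note that this runs whatever SQL was written into the directory, so a real filesystem would have to decide who may write such queries and with which database privileges they run.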
|
From: Vincenzo C. <vin...@ya...> - 2005-01-06 09:04:22
|
On Wednesday 05 January 2005 23:05, Peter Schrammel wrote:
> 1. create an index of meta-data without storing the object itself (no
> storage level implementation)
>   This could be done by a file alteration monitor filesystem (I
> outlined this in my last mail)

The purpose of relfs is to store objects on the underlying filesystem
(so that FAM is not necessary) and metadata on the db, but I am sure
that there will also be "external objects" e.g. all the objects that
have been seen by a web proxy connected to relfs, or all the objects
that have been backed up on a cdrom, which will have their metadata
held in the db. This means that we should store hashes for all files -
or some other "unique" identification method. Do hashing algorithms
operate on a per-block basis, so that modifications can be done
efficiently?

V.
|
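On the per-block question: common digests such as MD5 or SHA-1 consume their input in blocks internally, but the final value depends on everything read so far, so a change in the middle of a file means rehashing from that point on. One way around this is to hash fixed-size blocks separately and derive an identifier from the list of block hashes. The sketch below shows that idea only; the block size and all function names are assumptions, not anything relfs does.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash every fixed-size block of the data separately."""
    return [hashlib.sha1(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new):
    """Indices of blocks whose hash differs, or that exist on one side only."""
    longest = max(len(old), len(new))
    return [i for i in range(longest)
            if i >= len(old) or i >= len(new) or old[i] != new[i]]


def object_id(hashes):
    """Derive one identifier for the whole object from its block hashes."""
    return hashlib.sha1("".join(hashes).encode("ascii")).hexdigest()
```

After an in-place write only the touched blocks need to be rehashed, and object_id is then recomputed from the short list of hashes rather than from the file contents. The identifier still changes whenever the contents change, which is the drawback discussed later in the thread.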
|
From: Peter S. <pet...@gm...> - 2005-01-06 10:18:45
|
Vincenzo Ciancia wrote:
> On Wednesday 05 January 2005 23:05, Peter Schrammel wrote:
>
>>1. create an index of meta-data without storing the object itself (no
>>storage level implementation)
>> This could be done by a file alteration monitor filesystem (I
>>outlined this in my last mail)
>

But the indexing needn't be done in the filesystem: your filesystem just
has to inform an indexing system that something was done:
1. The "watcher" filesystem discovers that a file has been moved from a
   to b
2. It sends a message on the dbus saying what's been done.
3. The watcher FS has done its job.
--
1. The dbus daemon (like "beagled") listens on the dbus and starts
   indexers according to the information from the dbus
2. The indexers can be written in any language and are just files in the
   filesystem (look at udev).
3. the job for the indexer daemon is done
--
1. the relfs is the retrieval fs. Its job is to give me back links to
   the files the watcher fs watches. So its job is just to store my
   queries and to execute them.

I'd leave the storage of objects out of scope (at least until v1.0 is
out)

> The purpose of relfs is to store objects on the underlying filesystem
> (so that FAM is not necessary) and metadata on the db, but I am sure
> that there will also be "external objects" e.g. all the objects that
> have been seen by a web proxy connected to relfs, or all the objects
> that have been backed up on a cdrom, which will have their metadata
> held in the db. This means that we should store hashes for all files -
> or some other "unique" identification method. Do hashing algorithms
> operate on a per-block basis, so that modifications can be done
> efficiently?

I don't think that this works: you need to have something unique stored
in the file (but this you can't guarantee). Any system with hashes is
just a good guess but nothing more.
UUIDs (universally unique identifiers; try uuidgen or have a look at the
labels of the filesystems (man mount)) could help here if you could
attach them to your files (somehow).
I'd say that files outside the watched filesystem should be out of scope
for a while.

Peter
|
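The difference between the two kinds of identifier argued about above can be shown in a few lines. This is a sketch using Python's standard uuid and hashlib modules; the TrackedObject class is invented for the example.

```python
import hashlib
import uuid


class TrackedObject:
    """A file-like object with an assigned identity and some content."""

    def __init__(self, content):
        self.uuid = uuid.uuid4()   # random, assigned once, independent of content
        self.content = content

    def content_hash(self):
        # Derived from the content, so it changes whenever the content does.
        return hashlib.sha1(self.content).hexdigest()
```

Two objects holding the same bytes share a content hash but get different UUIDs, and editing an object changes its hash but not its UUID. A UUID therefore identifies "this file" across edits, but it has to be stored somewhere alongside the file, while a hash can be recomputed from a copy found anywhere and identifies only "these contents".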
|
From: Vincenzo C. <vin...@ya...> - 2005-01-06 08:57:40
|
On Wednesday 05 January 2005 15:40, Wolfgang Illmeyer wrote:
> I've been subscribed for a few weeks to this list, and I didn't hear
> very much from it.. How many people are we actually? How's relfs
> development doing?

There are twelve people subscribed, and relfs development is done by me
in my spare time, which has been scarce in the last two months and should
be more plentiful in the next year. Discussion has not been active here,
but there are open questions (some of which are being answered in this
discussion) in the list archives.

I use every quarter of an hour of free time to design and code, and I
switched to ocaml, as explained in my other e-mail, because it's faster
to code correctly in that language (at least for me), so that I can use
this free time in the best possible way. The true timewasters now are
design decisions like the one that you and Peter are discussing, so your
thread is actually being of help :)

There is this problem related to hierarchies: it is possible to allow
directories which represent flat queries, such as "select * from obj
where mimetype='audio/mp3' and author like '%queen%'", but how does one
say "show me all my mp3s, wherever they are on the disk, grouped in
directories, first by author and then by genre"? I am not sure how a
hierarchy could be expressed as the result of an sql query (of course we
can somewhat extend the query language).

V.
|
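One possible answer to the grouping question, sketched outside SQL: run the flat query once and let the filesystem layer fold the rows into nested directories, given an ordered list of columns to group by. The function below is an assumption-laden illustration (rows as dictionaries, a hypothetical group_into_tree helper), not a proposal for the relfs query language.

```python
def group_into_tree(rows, levels, leaf="filename"):
    """Fold flat query rows into nested directories.

    rows   -- dictionaries, one per object, as a flat query might return them
    levels -- non-empty list of column names, outermost directory first
    leaf   -- column used as the entry name inside the innermost directory
    """
    tree = {}
    for row in rows:
        node = tree
        # Walk (and create) one directory per grouping column but the last.
        for key in levels[:-1]:
            node = node.setdefault(row[key], {})
        # The last grouping column maps to the list of entries.
        node.setdefault(row[levels[-1]], []).append(row[leaf])
    return tree
```

With levels set to author and then genre, each author becomes a directory containing one directory per genre; swapping the two names in the list gives the other hierarchy from the same rows, so the hierarchy is a property of the view rather than of the query result.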
|
From: Vincenzo C. <vin...@ya...> - 2005-01-06 08:47:22
|
On Tuesday 04 January 2005 22:41, Peter Schrammel wrote:
> Hi
>
> I'm new to this list but I want to give my 2c for this gorgeous
> project!

Welcome! This list has been "a little" quiet since relfs was born, and I
have had less spare time to code (but I worked on relfs a lot anyway)
because my job became full-time.

I am going to start my Ph.D. now, so I guess I will have more free time.
In the meantime there is news:

=== 1. OCaml port of RelFS ===

I faced hard problems with C++, related to memory management and to
passing dynamically allocated memory between threads, and realized that
I am not a C++ expert, and also that C++ is not the ideal language for
quickly writing prototypes that implement new ideas. In a word, I found
that it would take more time for me to become good at coding in C++ than
for any C++ coder to become good at coding in a simpler language like
OCaml.

I made up my mind and did the port to OCaml. The hardest part was
getting a multithreaded (in the sense that it handles multiple fs
callbacks at the same time) fuse binding for this language; there was
one, but it was not designed for multithreading and was not up to date.
I wrote another ocaml fuse binding, available at

http://www.sourceforge.net/projects/ocamlfuse

Even though it's up to date and designed for multithreading, it's not
stable enough in multithreaded mode: after some millions of requests it
crashes for still unknown reasons, while in single-threaded mode it works
well. This is not a serious problem at this time (requests usually
interleave well with each other), but a production system can't afford to
block while waiting for the cdrom tray to close, so I will have to make
multithreading work better in the future. I have several alternatives and
am sure that I will find my way, but I can keep on working on relfs in
the meantime. Also, I will have to improve speed (for now filesystem data
is copied twice and this makes it slow - about 10Mb per second on my
centrino laptop).

I made a port of the relfs core, and I'm satisfied: the source is now
about 300 lines of code, which is exactly its value :) Moreover, apart
from functional programming and its very good type system, ocaml has
superb multithreading primitives and libraries, and I hope this will pay
off during development.

That said, I did not commit any changes to CVS, since I had no time to
write installation instructions (postgresql-ocaml is needed); however, I
think I am going to make a branch next week to allow everybody to see
it.

=== 2. RelFS design, and goals for the first release ===

Here we come to your e-mail :) In the last months I also redesigned the
DB schema, which looks a little like what you propose; I will put it in
the CVS next week.

>
> First I want to structure my thoughts about FS, Files and
> Databases....
>
1.
> Filesystems hold Objects with some properties:
2.
> Files hold Objects with some properties:
3.
> Databases hold Objects with some properties.

Ok for 1. and 3., while for 2. I am still unsure how to generalize it
- not properties which could be seen as extended attributes (fuse
supports those) but a membership relation: does it make sense to
represent a mbox file as a directory of text messages? Of course, but
what should each message look like? The text of the message without
headers (because they can be shown as extended attributes of the file)
or the plain text of the message? And how does one deal with message
attachments? Each message should be a directory, but that could be
inconvenient from the user's point of view.

>
> ----
>
> 1. It would be nice to have the Objects of the FS/File in a DB.

This will be the first feature to be implemented, using index plugins.

> 2. It would be nice to have Objects of a DB queryable in the FS/File

This will not be completely done in the first release, because it
requires a complex storage architecture: for each file or directory, it
should be decided if its contents are provided by a raw file on the
underlying filesystem, or by a plugin which performs a db query, or
parses and reassembles a file on the fly and so on, and this "storage
plugins" architecture is completely orthogonal to an "index plugins"
architecture:

- Index plugins can't fail, they can be run in batch queues and there
  can be more than one index plugin for any file.

- Storage plugins can fail, are used interactively and if more than one
  plugin has to provide contents for a file, they should be stacked on
  top of each other - e.g. a plugin which turns email messages into
  directories, stacked on top of a plugin which provides email messages
  by reading from an ftp site or a pop3 server.

Even if storage plugins will not be in the first release, I absolutely
want directories representing queries on the db, like "my punk/rock
mp3s". Those will be implemented ad-hoc and then replaced by a storage
plugin.

> 3. It would be nice to modify the DB in the FS/File

There are various implementations of this, and we can surely get one (if
you refer to seeing db objects, like procedures, as files in the
filesystem)

> 4. It would be nice to modify the FS/File in the DB
>

This will also be the purpose of index plugins but not in the first
release, where we won't do this bidirectional communication (I think it
can be done by extending the SQL server and using triggers).

> The first two aren't that hard I think but you should keep one thing
> in mind: a file is not its filename, it's an abstract concept. I
> would give files a UUID. A UUID can be given a name like
> "/archive/coolsong.mp3" or another name
> "/archive/genre/rock/coolsong.mp3" (some call this a link).
>

Yes, I now have a table which holds pairs of object ids and names, which
are not _paths_ but only the result of the "basename" function. There
is also a "membership" relationship between objects which gives rise to
paths. This way there is no logical difference between a file contained
in a zip archive and a file contained in a directory. However I
will have to find a convenient syntax to allow the coexistence of a zip
file seen as a file and as a directory, in the same parent directory,
something like "file.zip" and "file.zip#" to represent the directory,
but where "#" is a character which is not allowed in a unix filename.
Unfortunately it seems that the only character not allowed is "/", which
would not work in many applications.

What are UUIDs? Is this a standard of some sort? Are UUIDs a function of
file data? I am using just integers right now.

> The second problem is that you have to access the properties with the
> filename. Be realistic: nobody will use special tools to query your FS.
>

I know :)

> A good approach would be what the guys from reiserFSv4 did
> (http://www.namespace.com):
> every file has its properties accessed as if it were a directory and the
>
> property a filename in the directory:
> :cat "/archive/coolsong.mp3/genre"
> :rock
>

I don't like this because a shell or a userspace program could perform
a "dirname" operation on the file name to get its path, find that the
path is not a directory and complain to the user - I would prefer a
different character than "/", e.g. /archive/coolsong.mp3#genre, but am
still unsure.

> Question: are there any properties in the filesystem that
> are attached to the filename? Yes! A comment on the filename
> "/etc/shadow" is not a comment on its content.

This is _exactly_ the motivating example to assume that there will be
properties attached to paths vs properties attached to files (objects) -
it seems that we are going to use relfs for the same reasons :)

> So usually a filename indicates a file as above. But you can do
> something like:
> echo "This file is a Security risk" > /etc/shadow/filename/comment
> you could even do comments on comments ;-)
>
>
> What about directories? It's the same but it means some loss to the
> filename space (a special filename, e.g. .rfs, indicates the
> properties)

This is another trouble with using "/" as the final separator.

>
> :echo "my brother's files! don't remove" > /archive/.rfs/comment
>
> So most of the 4 questions are solved:
>
> 1. putting the Objects of the FS/Files into a DB (I think you call it
> proxy) I thought of FAM (File Alteration Monitor) and udev: Just
> tell other programs that something has changed and they'll do the
> rest. So your FS would just send a message, e.g. on the DBUS, that a
> file has been altered/created. Daemons (even user daemons) could
> listen on the dbus and do some caching of the information (like
> extracting mp3 tags, doing checksums...). Don't force a specific DB
> schema on them...they'll hack around it.

In fact I am going to leave the db schema unspecified, just like the
filesystem hierarchy in a linux distribution; I plan to use more than
one communication protocol for indexing applications, e.g. dbus but
also xmlrpc or just dynamic loading and linking of shared libraries, or
a command line (a la CGI) interface for simple shell scripts.

> Here is a simple approach for a DB model:
>
> FS Objects:
> UUID char(33)
>

The fact that you write "char(33)" makes me think that there is a
written specification. There is this problem with the identity of
objects: it's going to be lost if you copy the file "outside" the
filesystem, so you lose any information attached to this uuid, unless
it's like an md5sum, but then it's going to change when the file is
modified, and so we would need cascade updates of primary keys
everywhere. There is also the problem of open-endedness: how do I join
two different filesystems on different machines if object ids can
collide?

> FS Attribnames:
> ID integer e.g. 9088
> value integer e.g. 78 (specific to the plugin)
> plugin char(16) e.g. mp3tag
> name char(16) e.g. artist

Hmmm, even if I realize that it could be useful to reify attribute
names, I would prefer to keep a rich relational structure, like an
"mp3" table with author, size etc. as columns, where e.g. author is a
foreign key into another table. I know there are many problems, but it
would allow us to exploit the full power of a relational database in
user applications.

> Sorry my C++ is not the best but with this approach I could even use
> haskell as filters FS->DB ;-).
>

You see, I switched from C++ to ocaml because mine was not the best
either :) However, language independence for index plugins is an
important requirement for relfs - I would like to be able to even use
shell scripts taking the modified file as argument, and outputting an
sql script.

V.
|
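The last paragraph describes an index plugin as "a program that takes the modified file as argument and outputs an sql script". A minimal sketch of such a plugin follows. Everything specific in it is an assumption made for illustration: the table file_info, its columns, and the choice of MD5 are not taken from relfs.

```python
import hashlib
import mimetypes
import os


def sql_quote(text):
    """Quote a string as an SQL literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def index_file(path):
    """Return one SQL statement describing the file at the given path."""
    with open(path, "rb") as handle:
        digest = hashlib.md5(handle.read()).hexdigest()
    # guess_type only looks at the file name; fall back to a generic type.
    mimetype = mimetypes.guess_type(path)[0] or "application/octet-stream"
    size = os.path.getsize(path)
    return (
        "INSERT INTO file_info (name, mimetype, size_in_bytes, md5) VALUES ("
        f"{sql_quote(os.path.basename(path))}, {sql_quote(mimetype)}, "
        f"{size}, {sql_quote(digest)});"
    )
```

A command-line wrapper would call index_file on each argument and print the result, so that the filesystem (or a daemon listening on dbus) only has to run the program and feed its output to the database. Since the plugin only emits text, it could equally be written as a shell script or in any other language.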
|
From: Peter S. <pet...@gm...> - 2005-01-06 11:00:44
|
Vincenzo Ciancia wrote:
> On Tuesday 04 January 2005 22:41, Peter Schrammel wrote:
>
> Welcome! This list has been "a little" quiet since relfs was born, and I
> had fewer spare time to code (but I worked on relfs a lot anyway) due
> to my job getting full-time.
>
> I am going to start my ph.d now, so I guess I will have more free time,
> in the meantime there are news:
Good luck (I guess you won't need it)
...
OCaml is cool (I used it 4 years ago for an XML parser (my PhD)), though I
prefer Ruby now (just a question of taste, I think).
> === 2. RelFS design, and goals for the first release ===
>
>
>
> Ok for 1. and 3., while for 2. I am still unsure on how to generalize it
> - not properties which could be seen as extended attributes (fuse
> supports those) but a membership relation: does it make sense to
> represent a mbox file as a directory of text messages? Of course, but
> what should each message look like? The text of the message without
> headers (because they can be shown as extended attributes of the file)
> or the plain text of the message? And how does one deal with message
> attachments? Each message should be a directory, but it could be
> unconvenient from the user point of view.
Ok let's talk about the storage layer:
I think you'll have two directions:
1. composition of file contents to one file
You have a directory of files and want to have this directory as a tgz.
E.g. you have a directory "My PhD" with text/graphics... in it and you
want to offer it zipped on your website (always the newest version, of
course).
2. decomposition of a file
The other way round: You have one file but want to alter the contents
without decomposing the file:
E.g. XML files (very nice: they have ordered trees)
tar files (though you could untar them and do #1)
mp3 files
/etc/passwd /etc/group /etc/shadow
Well, #1 is easy, I think;
#2 is harder.
>
>>1. It would be nice to have the Objects of the FS/File in a DB.
>
>
> This will be the first feature to be implemented, using index plugins.
Index plugins are nice, but I wouldn't do indexing within the FS (see my
other mail). The FS should just announce that something has to be reindexed.
>>2. It would be nice to have Objects of a DB queryable in the FS/File
>
>
> This will not be completely done in the first release, because it
> requires a complex storage architecture: for each file or directory, it
> should be decided if its contents are provided by a raw file on the
> underlying filesystem, or by a plugin which performs a db query, or
> parses and reassembles a file on the fly and so on, and this "storage
> plugins" architecture is completely orthogonal to an "index plugins"
> architecture:
I don't think that this is the storage layer. I thought this should
just do queries on the meta-information in the DB, using the FS as an
interface.
> - Index plugins can't fail, they can be run in batch queues and there
> can be more than one index plugin for any file.
Yeah!
> - Storage plugins can fail, are used interactively and if more than one
> plugin has to provide contents for a file, they should be stacked on
> top of each other - e.g. a plugin which turns email messages into
> directory, stacked on top of a plugin which provides email messages
> reading from an ftp site or a pop3 server.
Yes, the storage layer is hard.
> Even if storage plugins will not be in the first relase, I absolutely
> want directories representing queries on the db, like "my punk/rock
> mp3s". Those will be implemented ad-hoc and then replaced by a storage
> plugin.
Yes!
>>3. It would be nice to modify the DB in the FS/File
>
>
> There are various implementations of this, and we can surely get one (if
> you refer to seeing db objects, like procedures, as files in the
> filesystem)
>
>
>>4. It would be nice to modify the FS/File in the DB
>>
>
>
> This will also be the purpose of index plugins but not in the first
> release, where we won't do this bidirectional communication (I think it
> can be done extending the SQL server and using triggers).
I think this is the storage layer.
>>
>
>
> Yes, I have now a table which holds pairs object ids and names, which
> are not _paths_ but only the result of the "basename" function. There
> is also a "membership" relationship between objects which gives rise to
> paths. This way there is no logical difference between a file contained
> into a zip archive and a file contained into a directory. However I
> will have to find a convenient syntax to allow the coexistence of a zip
> file seen as a file and as a directory, in the same parent directory,
> something like "file.zip" and "file.zip#" to represent the directory,
> but where "#" is a character which is not allowed in an unix filename.
> Unfortunately it seems that the only not allowed character is "/" which
> would not work in many applications.
OK. I see "/" is not optimal, but reiserfsv4 already uses it, so
applications will be patched (if the user pressure is big enough). Though
you could make the character/string configurable. But appending a
special string is not that good if apps try to find out the type of a
file: .zip -> zipped archive, .zip# -> ???
> What are UUIDs? Is this a standard of some sort? Are UUIDs a function of
> file data? I am using just integers right now.
Integers aren't enough... see my other mail for UUIDs (universally unique
identifiers).
>>property a filename in the directory:
>>:cat "/archive/coolsong.mp3/genre"
>>:rock
>>
>
>
> I don't like this because a shell or an userspace program could perform
> a "dirname" operation on the file name to get its path, find that the
> path is not a directory and complain to the user - I would prefer a
> different character than "/" e.g. /archive/coolsong.mp3#genre, but am
> still unsure.
>
I see, that's a problem, but as I said reiserfsv4 does it this way...
>
>
> In fact I am going to leave the db schema unspecified, just like the
> filesystem hierarchy in a linux distribution; I plan to use more than
> one communication protocol for indexing applications, e.g. dbus but
> also xmlrpc or just dynamic loading and linking of shared libraries, or
> a command line (a la CGI) interface for simple shell scripts.
>
>
> Hmmm, even if I realize that it could be useful to reify attribute
> names, I would prefer to keep a rich relational structure, like an
> "mp3" table with author, size etc as columns, where e.g. author is a
> foreign key into another table. I know there are many problems, but it
> would allow us to exploit the full power of a relational database in
> user applications.
OK let's not talk about the db structure ... (believe me I know why)
> You see, I switched from C++ to ocaml because mine was not the best
> too :) However, language independence for index plugins is an important
> requirement for relfs - I would like to be able to even use shell
> scripts taking the modified file as argument, and outputting an sql
> script.
I'll give a Ruby implementation a try (my OCaml is a little ... rusty).
Peter
|
|
From: Vincenzo C. <vin...@ya...> - 2005-01-07 21:56:01
|
On Thursday 06 January 2005 12:02, Peter Schrammel wrote:
> >>1. It would be nice to have the Objects of the FS/File in a DB.
> >
> > This will be the first feature to be implemented, using index
> > plugins.
>
> Index plugins are nice but I wouldn't do indexing within the FS (see
> my other mail). FS should just tell that something has to be
> reindexed.
>

Yes, it's going to be this way: when the filesystem finds that a file
should be reindexed it sends a message, using various protocols - dbus,
xmlrpc and so on - to "plugins", which are programs that access the file
and the database and perform the indexing. To do this there will be
source-level indexers which do not generally (except for easy stuff like
mimetypes) populate the db but just send the information on a certain
protocol. So there will be a dbus "source plugin", an xmlrpc "source
plugin" and so on. There is the problem of sharing an authentication
mechanism for database connections, but I think we won't solve that one
right now.

> >>2. It would be nice to have Objects of a DB queryable in the
> >> FS/File
> >
> > This will not be completely done in the first release, because it
> > requires a complex storage architecture: for each file or
> > directory, it should be decided if its contents are provided by a
> > raw file on the underlying filesystem, or by a plugin which
> > performs a db query, or parses and reassembles a file on the fly
> > and so on, and this "storage plugins" architecture is completely
> > orthogonal to an "index plugins" architecture:
>
> I don't think that this is the storage layer. I thought this
> should just do queries of the metainformation in the DB using the FS
> as an interface.

Yes, you're right, the storage layer would perform more complex things,
like storing certain files which are traditionally stored in a not so
efficient way (the package database of some linux distribution, say) in
database tables, and providing a file interface for programs which
require it - and this is the part that will be implemented later - for
now storage is just the plain filesystem. However it's important to
implement it to allow nice features like chdir into archives and such.

> ok. I see "/" is not an optimum but reiserfsv4 uses it already so
> applications will be patched (if the user pressure is big enough).
> Though you could have the character/string configurable. But
> appending a special string is not that good if apps try to find out
> the type of a file .zip-> zipped archive .zip# --> ???
>

Hmmm, if a file ending in "#" has the "d" bit set in its attributes,
apps will not try to find its type by extension, but they will recognize
that it is a directory.

> > What are UUIDs? Is this a standard of some sort? Are UUIDs a
> > function of file data? I am using just integers right now.
>
> integers aren't enough... see my other mail for UUIDs (universal
> unique identifier)
>

I have read the man page of uuidgen, but - not that I won't use them :)
- what advantages will we get compared to just using a serial column in
the "obj" table? ids would be unique anyway - well, I see that the
serial column is going to get exhausted, and one could have copied the
unique id outside the filesystem, so that the serial column is reset by
a user application, and then ids overlap; is this the problem? If the
safety of UUIDs is probabilistic, as seems necessary, why shouldn't we
use a hash on file contents, which is "statistically safe" in the same
way, as an id? Is it because it would require cascade updates for each
update of the file, or just for the obvious efficiency issues?

Another issue is whether the filesystem should only provide links, or if
it should also provide "real" (even if stored on the other filesystem)
files. In the second case there are many disadvantages, but in exchange
we get accurate tracking of modifications and - more important - of file
renames, and we could also provide query information integrated with
"real" files, and in the future parsing and synthesizing files - I think
I will trust your reasons but also mine :) and allow both approaches:
there will be "mounted" and "indexed" directories in the relfs
configuration; the former will be mirrored inside relfs, while the
latter will just be indexed, using libfam to watch them - but I guess I
will need to "hire" someone to write the libfam binding to ocaml :)

So after your suggestion the world will be divided into three, and no
longer two, categories: files inside the filesystem, files outside the
filesystem but on the local machine, which will be indexed and watched,
and "external files" - which are left for further releases.

Bye

Vincenzo
|