relfs-devel Mailing List for Relational Filesystem (Page 2)
Status: Pre-Alpha
Brought to you by:
applejack
|
From: <rel...@li...> - 2005-12-14 20:09:36
|
Hi Vincenzo et al,

As far as I can see, the only way round this is to use the physical location of the file on disk as its id, and the very long name as the human-readable form that a user chooses. That means going down to disk level to find a unique identifier, such as cylinders, heads and sectors, or even the disk address!

There is the problem of transferring between disks, though. It might mean relfs depending on the human-readable name (maybe, maybe not), because a transferred file would take the first available id, which is not an address but an incremental number that relfs would recognise as such, since the file is not on the same disk as the operating system. Transferring to a disk would have to be handled differently.

Does this shed any light on the obnoxious problem of:
->"get(ting) around what to do with identical filenames in one directory."

Hal |
|
From: Vincenzo C. <vin...@ya...> - 2005-11-23 22:56:39
|
On Wednesday 23 November 2005 08:41, Jasper van de Gronde wrote:
> That's true (I'm assuming you're talking about the directory being
> read-only and not necessarily the files in it), and I might indeed be
> able to get by with that too.

Yes, files are read-write because they belong to their original location. BTW, it is already working that way on my laptop, but with disambiguation based on database ids :) Do you have something in mind that I don't see, a use case where writing to directories that somehow contain duplicate files might be useful?

Here begins a not so useful digression. I was thinking: "why on earth am I designing a database filesystem, which should allow me to get rid of hierarchies, and the first thing I want to do is the hierarchy of all my mp3s, with authors at the first level and albums at the second?" Here's a summary of my answer.

I sometimes think that in a true database filesystem the user should be allowed to add files to "the system" and not to a specific directory. This "system", however, must be slightly hierarchical again: you can't avoid having two different files called README, but you could avoid that locally, for a single project or "group of files". This would be a single-level hierarchy. But (thanks for the following remark go to a "local anonymous referee" who I suppose is ignoring relfs-devel e-mails in this period :) ) there are projects which include different versions of files in different subdirectories (e.g. translations of documentation), so at least a second level of hierarchy would be needed.

In general it looks like hierarchy is useful locally, even if for the whole file system it is a mess. In fact, hierarchy allows navigation, which is not possible using only queries, and it is also a lightweight and stable classification mechanism. For example, in the relfs source distribution there is the "src" directory, and the (empty, sigh, and in CVS by mistake) "doc" directory. I would be crazy if I thought of distributing the thing without this small hierarchy.

But in the large, hierarchy sucks, so perhaps in the near future, when database filesystems are widely used (yes, I am trying to be optimistic), a typical usage will be to add "packages" to the "system", each one with a local hierarchy. Of course, if there are multiuser database filesystems, there will be no need for real "/usr/bin", "/usr/lib" and "/usr/doc" directories, even if each "package" carries these. A shell will just query the database for all public binary files of all packages, and the linker will do the same. But, I repeat since this is the point, hierarchy will resist, locally.

Vincenzo |
|
From: Jasper v. de G. <th....@hc...> - 2005-11-23 07:41:36
|
Vincenzo Ciancia wrote:
> On Sunday 20 November 2005 12:15, Jasper van de Gronde wrote:
>>Apart
>>from these minor drawbacks, however, it also doesn't solve the problem
>>of saving/renaming files (you obviously don't want people to enter these
>>numbers, yet it should be possible to overwrite files for example, as
>>well as to create new files). I'd be interested in any ideas you (or
>>others) may have on the subject though (might not be entirely on-topic,
>>but the mailing list doesn't seem to be busy enough for that to be much
>>of a problem).
>
> From my point of view, saving and renaming files is not possible in a
> directory representing a query on the database; query directories are
> read-only. I don't see any obvious solution for the case where you can
> create new files in a directory with "disambiguated" file names.

That's true (I'm assuming you're talking about the directory being read-only and not necessarily the files in it), and I might indeed be able to get by with that too. |
|
From: Vincenzo C. <vin...@ya...> - 2005-11-23 00:27:20
|
On Sunday 20 November 2005 12:15, Jasper van de Gronde wrote:
> Apart
> from these minor drawbacks, however, it also doesn't solve the problem
> of saving/renaming files (you obviously don't want people to enter these
> numbers, yet it should be possible to overwrite files for example, as
> well as to create new files). I'd be interested in any ideas you (or
> others) may have on the subject though (might not be entirely on-topic,
> but the mailing list doesn't seem to be busy enough for that to be much
> of a problem).

From my point of view, saving and renaming files is not possible in a directory representing a query on the database; query directories are read-only. I don't see any obvious solution for the case where you can create new files in a directory with "disambiguated" file names.

V. |
|
From: Jasper v. de G. <th....@hc...> - 2005-11-20 11:15:34
|
rel...@li... wrote:
> On Saturday 08 Oct 2005 05:39, Vincenzo Ciancia said to us all:
>
> Its confusing to have filenames which are the same in the same directory, and
> may cause complications. This can be resolved with the address attribute of
> the harddisc address, which resolves adding extra junk into the table or
> database. Queries would not need an extra redundant name.
> When I deleted an ntfs partition, and made several ext3 partitions by mistake
> overhastily, I then tried to get them back, and used several tools, both
> windows and linux. A number of ways that they were portrayed gave me scope to
> see where they were, even though they had the same names, as the tools put
> all the files into numbered directories instead of in their original dirs
> which had been lost. This real life scenario helped me to unformat certain
> blocks and clusters, so I had to check their real addresses in order to bring
> what I could back. Am I of any help?
> Hal

It is indeed a reasonable solution, but it does have some drawbacks:
- It only serves to distinguish files; it doesn't tell the user anything about what file it is.
- It might change unexpectedly if the files are reordered (which is not uncommon, although perhaps uncommon enough not to cause too many problems).

I can live with the first one, and the second could be worked around by not using an actual block number but some kind of artificial index. Apart from these minor drawbacks, however, it also doesn't solve the problem of saving/renaming files (you obviously don't want people to enter these numbers, yet it should be possible to overwrite files, for example, as well as to create new files). I'd be interested in any ideas you (or others) may have on the subject, though (this might not be entirely on-topic, but the mailing list doesn't seem to be busy enough for that to be much of a problem). |
|
From: Vincenzo C. <vin...@ya...> - 2005-11-19 15:05:24
|
> On Saturday 08 Oct 2005 05:39, Vincenzo Ciancia said to us all:
>
> Its confusing to have filenames which are the same in the same directory,

Did I really say that? :)

> and may cause complications. This can be resolved with the address
> attribute of the harddisc address, which resolves adding extra junk into
> the table or database. Queries would not need an extra redundant name.

First of all: the problem is that, when you perform a query, you can get as a result two different files which are physically stored in two different directories, but have the same name and should be shown in the same "resulting" directory by the query.

RelFS distinguishes these files by their unique ID (which you can think of as being the same thing as the inode number present in ordinary filesystems, which I think is what you are calling the "harddisk address") until it presents them to the user.

From then on, there is no way to tell which file the user asked for. So this is only a problem in the presentation layer: we should present files in such a way that the user can tell us back exactly which file he/she wants.

So, if I understand you correctly, you propose to add the "inode number" as part of the primary key for the table of all files in the database. But the database should not be affected by this problem, since we already have a unique id; the only problem is to present the user not with an "anonymous" unique id but with something more meaningful. Of course we could eventually use the unique id in case we can't find anything better.

This weekend I will try to finish implementing the disambiguation scheme proposed in this thread, but if you feel I missed something in your e-mail, let us know.

Thanks and bye

Vincenzo

PS:
> Do not use, copy or disclose the information in any way nor act in
> reliance on it and notify the sender immediately.

Ahem... this information has already been disclosed :) Could you please ask whoever puts this banner on your e-mails to avoid doing that for this list? |
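The presentation-layer round trip described in this message can be sketched roughly as follows. This is only an illustration, not relfs code: the function name `present` and the `name [id]` display format are assumptions, and the sketch uses the last-resort fallback mentioned above (showing the unique id) rather than anything more meaningful.

```python
def present(rows):
    """Build a listing {shown_name: object_id} from a list of (object_id, name).

    Names that occur once are shown unchanged. Names that collide fall back
    to the "anonymous" unique id, so that whatever the user picks from the
    listing still identifies exactly one object.
    """
    counts = {}
    for _object_id, name in rows:
        counts[name] = counts.get(name, 0) + 1

    listing = {}
    for object_id, name in rows:
        shown = name if counts[name] == 1 else f"{name} [{object_id}]"
        listing[shown] = object_id
    return listing


# Hypothetical query result: two different objects share the name README.
rows = [(7, "README"), (12, "README"), (3, "Makefile")]
listing = present(rows)
for shown in sorted(listing):
    print(shown, "->", listing[shown])
```

The sketch ignores the case where a real file is already named like a generated entry (for instance a file literally called `README [7]`); a real implementation would need to handle that.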
|
From: <rel...@li...> - 2005-11-19 10:43:00
|
On Saturday 08 Oct 2005 05:39, Vincenzo Ciancia said to us all:

Its confusing to have filenames which are the same in the same directory, and may cause complications. This can be resolved with the address attribute of the harddisc address, which resolves adding extra junk into the table or database. Queries would not need an extra redundant name.

When I deleted an ntfs partition, and made several ext3 partitions by mistake overhastily, I then tried to get them back, and used several tools, both windows and linux. A number of ways that they were portrayed gave me scope to see where they were, even though they had the same names, as the tools put all the files into numbered directories instead of in their original dirs which had been lost. This real life scenario helped me to unformat certain blocks and clusters, so I had to check their real addresses in order to bring what I could back. Am I of any help?

Hal

--
The information contained in this email message may be confidential. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the we monitor e-mails sent and received. Further communication will signify your consent to this. Thank you. More information can be found at http://www.fasterbytes.net |
|
From: Vincenzo C. <vin...@ya...> - 2005-10-12 23:09:04
|
On Thursday 06 October 2005 10:31, Jasper van de Gronde wrote:
> In short, I can think of some solutions, but none of them seem perfect
> and I was wondering what other people's thoughts on this were.

Yes, this may be one of the biggest concerns for relfs. For now, non-unique paths in queries behave in a "just wrong" fashion: relfs performs a "select distinct ..." on the query result set.

Queries are not the only place where multiple files per path can be found: there is the "trash" folder, for example (and see how old versions of KDE handle this; it was "just wrong" too). Now KDE has a "trash:" protocol that allows duplicate file names, but there is no way for a user to know where, for example, a deleted file was before the unlink; you just have two "equal" files that you can open and examine. Windows does the same. Also, tar and zip archives can have duplicated path entries because someone added a new version of a certain file.

And the worst problem is that a distinguishing attribute (e.g. the original path in the case of trash, or the addition time in the case of archives) must be selected case by case. In this uncomfortable situation, I think that the easiest way to avoid breaking abstraction is to just disambiguate names by adding some other attribute. E.g. suppose I select both

/home/vincenzo/projects/ocamlfuse/src/README

and

/home/vincenzo/projects/thesis/src/README

we should show

README [thesis]
README [ocamlfuse]

where "thesis" and "ocamlfuse" are the first directory which differs in the path. Of course, there is the problem that "PATH_MAX" gets shortened and one will eventually need to shorten file names a-la-vfat-on-8.3, say

VeryLongFilename......~1[veryLongDirectory...Name]

This looks like a good heuristic, because the higher you go in the directory tree, the more probable it is that you choose a word that is meaningful to the user.

Yes, this looks like trial and error. For example, a user could prefer to see

README [phd]

and

README [free software]

because he has added ad-hoc extended attributes to classify certain subtrees. Maybe we could provide easy-to-use disambiguation mechanisms. However, this goes far beyond the philosophy of "keeping it simple": what I would expect in the future is that extended attributes, which are a great possibility effectively unused in current filesystems, and which will be used extensively in relfs and I think in many other similar projects, will sooner or later be smoothly integrated into file managers and file open dialogs of applications.

At this point, seeing README [directory_that_I_never_saw_before] will not be such a big problem, because there will be a small "preview" panel on the side of a file open dialog or of the file manager window, showing all the other extended attributes and helping the user in the task of identifying files that have the same name in the same query.

Do you find this convincing? |
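The "first directory which differs" heuristic from this message could be sketched like this. It is a minimal illustration under stated assumptions, not the scheme as implemented in relfs: the function name `disambiguate` is hypothetical, paths are assumed to be absolute POSIX paths, and the PATH_MAX shortening mentioned above is not handled.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def disambiguate(paths):
    """Map each absolute path to a display name such as 'README [thesis]'.

    Names that are unique in the result set are shown unchanged. Colliding
    names get, in brackets, the first directory (from the root down) at
    which the colliding paths stop being identical.
    """
    by_name = defaultdict(list)
    for p in paths:
        by_name[PurePosixPath(p).name].append(p)

    shown = {}
    for name, group in by_name.items():
        if len(group) == 1:
            shown[group[0]] = name
            continue
        # Directory components only (drop the root '/' and the file name).
        dirs = {p: PurePosixPath(p).parts[1:-1] for p in group}
        # Length of the directory prefix shared by the whole group.
        common = 0
        for column in zip(*dirs.values()):
            if len(set(column)) != 1:
                break
            common += 1
        for p in group:
            tail = dirs[p][common:]
            label = tail[0] if tail else "."
            shown[p] = f"{name} [{label}]"
    return shown


result = disambiguate([
    "/home/vincenzo/projects/ocamlfuse/src/README",
    "/home/vincenzo/projects/thesis/src/README",
])
for path, display in result.items():
    print(display, "<-", path)
```

With more than two colliding files the labels produced this way are not guaranteed to be distinct (two of three paths may share the first differing directory), so a fuller version would have to repeat the step within each label or fall back to another attribute.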
|
From: Jasper v. de G. <th....@hc...> - 2005-10-08 10:52:14
|
Vincenzo Ciancia wrote:
> ...
> Queries are not the only place where multiple files per path can be found:
> there is the "trash" folder for example (and see how old versions of kde
> handle this, it was "just wrong" too). Now kde has a "trash:" protocol that
> allows duplicate file names, but there is no way for a user to know where,
> for example, a deleted file was before the unlink, you just have two "equal"
> files that you can open and examine. Windows does the same. Also tar and zip
> archives can have duplicated path entries because one added a new version of
> a certain file.

This is a good point; as I hardly ever use the trash bin (or something similar), I hadn't even thought of that. I just checked, and Windows basically makes the trash and search results available only through its Explorer interface (which uses a special identifier internally, which can be transformed into a path when a file is opened, for example). Of course this wouldn't really be an option for something like relfs (as it works on a lower level).

> And the worst problem is that case by case a distinguishing attribute (e.g.
> the original path in case of trash, or the addition time in case of archives)
> must be selected. In this uncomfortable situation, I think that the easiest
> way to avoid breaking abstraction is to just disambiguate names adding some
> other attribute - e.g. suppose I select both
>
> /home/vincenzo/projects/ocamlfuse/src/README
>
> and
>
> /home/vincenzo/projects/thesis/src/README
>
> we should show
>
> README [thesis]
> README [ocamlfuse]

Do you mean as a filename? So a user would use something like
cat "README [thesis]"
to view the file (on the console)? If so, it might not be too bad.

> ...
> This looks like a good heuristic, because the upper you go in the directory
> tree, the more probable it is that you choose a meaningful word to the user.

Especially since people are used to directories.

> ...
> because he has added ad-hoc extended attributes to classify certain subtrees.
> Maybe we could provide easy-to-use disambiguation mechanisms. However this
> goes far beyond the philosophy of "keeping it simple":

For me the most important aspects of a disambiguation scheme are that it's not too hard for users to understand, predictable and preferably reasonably stable. The above-mentioned scheme should be relatively easy to understand, might be predictable enough (it might get a little complicated when more than two files are involved) and is at least more stable than dynamically assigning sequence numbers or something similar.

> what I would expect in
> the future is that extended attributes, which are a great possibility
> effectively unused in current filesystems, and that in relfs and I think many
> other similar projects will be extensively used, will be soon or later
> smoothly integrated in file managers and file open dialogs of applications.

That would indeed make life easier. Fortunately there is increasing interest in these sorts of things, so who knows.

> ...
> Do you find this convincing?

It's not yet "perfect", but it's definitely an interesting option. |
|
From: Vincenzo C. <vin...@ya...> - 2005-10-08 10:15:22
|
[Apologies for multiple copies; my mail server seems to have forgotten my previous reply but could eventually change its mind.]

Bye

Vincenzo |
|
From: Jasper v. de G. <th....@hc...> - 2005-10-06 14:32:00
|
Vincenzo Ciancia wrote:
> Many, many things have changed since the last relfs prototype!
>
> I didn't want to announce anything until I'd have the time to solve the
> tons of problems I had both with ocamlfuse and relfs. Now that I have
> reached a stable development state, here I am - asking again for
> request for enhancements, and advice for use cases you would like to
> work in relfs.
>
> It's so rare to have a developer asking for "requests" that you can't miss
> this great chance! Talk to us about your user data indexing problems!

I've been exploring some ideas similar to relfs lately, and one of the most obnoxious problems I haven't been able to get around is what to do with identical filenames in one directory. Traditionally this simply isn't allowed, which is workable because it's very clear what other files are in the directory. As soon as you introduce some kind of query-like directories this becomes a nightmare, though, as it's hardly realistic to force people to use completely unique names for each file, and dynamically renaming files is also less than ideal.

Of course this can be solved by creating some new unique attribute which programs can use, but old programs won't know how to use it. And if you turn it around (use some unique id for the actual filename and expose the non-unique "filename" as some new attribute), which I would probably prefer, old programs will work but will probably look quite ugly (although it would be possible to make sure that the unique id looks as much like the "filename" as possible).

In short, I can think of some solutions, but none of them seem perfect and I was wondering what other people's thoughts on this were. |
|
From: Vincenzo C. <vin...@ya...> - 2005-10-04 00:51:46
|
Many, many things have changed since the last relfs prototype!

I didn't want to announce anything until I had the time to solve the tons of problems I had with both ocamlfuse and relfs. Now that I have reached a stable development state, here I am, asking again for requests for enhancements and for advice on use cases you would like to see working in relfs.

It's so rare to have a developer asking for "requests" that you can't miss this great chance! Talk to us about your user data indexing problems!

The new prototype is available in binary flavour for Debian Sarge and Fedora Core 4 on the sf.net download page, or in source form in CVS as usual.

NEWS

1. Directories representing queries now work

The most important feature is that RelFS is now able to host SQL queries as if they were directories.

First of all, we copy some files into our mountpoint:

$ cp *.ml relfs_dir/

Now we can list these files:

$ ls relfs_dir/
Connection.ml Files.ml Indexes.ml Index.ml RelFS.ml Schema.ml Util.ml UUID.ml

Now we can query the database for all files starting with a "C" (note that "%%" will be replaced with "/"):

$ ls "relfs_dir/#select id,'%%'||name as path from obj where name like 'C%'"
Connection.ml

We can even "cd" inside such a directory (or, better, symlink it):

$ cd "relfs_dir/#select id,'%%'||name as path from obj where name like 'I%'"
$ ls -la
drwxr-xr-x 1 vincenzo vincenzo 4096 2005-10-04 02:33 .
drwxr-xr-x 2 vincenzo vincenzo 4096 2005-10-04 02:33 ..
-rw-r--r-- 1 vincenzo vincenzo 4662 2005-10-04 02:33 Indexes.ml
-rw-r--r-- 1 vincenzo vincenzo 3993 2005-10-04 02:33 Index.ml

Of course, results are updated for each readdir, and files are like hardlinks to the original ones, so if you edit Index.ml the changes will be reflected in the original file.

2. Switch to the OCaml programming language is complete and there are binary releases

The OCaml programming language is being used, and this resulted in really robust code (except where I know that it's not robust ;) ). This required implementing glue code to bind fuse functions to OCaml. This was done, and it took a very long time to do correctly, mostly due to problems interrupting the OCaml runtime from within multiple C threads. These problems are now solved, and there is a project named "ocamlfuse" on sourceforge, which is proving to be a very good solution for implementing stable and maintainable userland filesystems.

I released only binary files, one for FC4 and one for Sarge. To compile from source you have to use the CVS version, and I don't have any other distro handy to produce other binary releases. The release is just meant to help with planning features, not to distribute something that is of no use to the general public :)

3. Codebase has been cleaned up

After having rewritten RelFS in OCaml, I realized I had many problems both with implementing correct unix semantics and with caching metadata lookups from the database. So I reworked my code line by line, and in its current form it's very robust and fast, but there's no caching at all :) (yes, I had to make a choice).

4. What?? Is it complete?

No, of course. For example, the queries "pathof" and "lookup" are really slow. This is because relational databases and trees don't agree. I will have to solve this, maybe just by using materialized paths, and who cares about renames :)

Also, I will provide a definitive interface for plugins, and then I will have to write some good indexers. All filesystem metadata (e.g. results from stat) should be in the database. Extended attributes should be used, and will be used, as a communication interface with the database (e.g. tags from mp3s). A simple query (and query refinement) interface must be written in a hurry.

5. It feels so undocumented!

Of course, I will answer any question and document FAQs. It's experimental as usual.

Bye and have fun

Vincenzo Ciancia |
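How a "#"-prefixed directory name might be turned into a listing can be sketched as follows. This is only an illustration of the idea in news item 1, not the relfs implementation: an in-memory SQLite database stands in for the real database, the `obj` table is reduced to the two columns used in the examples above, and the function name `list_query_directory` is made up for this sketch.

```python
import sqlite3


def list_query_directory(conn, component):
    """Return the entry names of a directory whose name is a '#'-prefixed query."""
    if not component.startswith("#"):
        raise ValueError("not a query directory")
    # A path component cannot contain '/', so the query spells it as '%%'.
    sql = component[1:].replace("%%", "/")
    rows = conn.execute(sql).fetchall()
    # Each row is (id, path); the entry name is the path without its leading '/'.
    return sorted(path.lstrip("/") for _id, path in rows)


# Stand-in database holding the files copied into the mountpoint above.
conn = sqlite3.connect(":memory:")
conn.execute("create table obj (id integer primary key, name text)")
names = ["Connection.ml", "Files.ml", "Indexes.ml", "Index.ml",
         "RelFS.ml", "Schema.ml", "Util.ml", "UUID.ml"]
conn.executemany("insert into obj (name) values (?)", [(n,) for n in names])

print(list_query_directory(
    conn, "#select id,'%%'||name as path from obj where name like 'I%'"))
```

The sketch runs whatever SQL it is given, so it says nothing about how read-only access or access control would be enforced; it also treats the listing as flat, whereas the `path` column suggests that results may span several directory levels.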
|
From: Vincenzo C. <vin...@ya...> - 2005-01-25 23:12:53
|
Hi,

it's been a few days since I committed the OCaml version. Installation is rather a mess, so be motivated if you try :) I'm still not so convinced about the database schema. There is also an OCaml binding to libuuid, which is unused for now.

The file "test.sh" is an example of calling an indexer written in another language. It's not fully functional (for example, there's no proper string quoting) but I think it's cool: a plugin for this "protocol" is a program that takes information about a modified file and returns a database script, without any possibility of interaction. Shell script fans will find their heaven there :) The information passed should be much more than the mere file name, and notably should include the id of the object in the database. Also, there should be an algebra of file operations, so that it can never happen that an indexer is called on a file that has been deleted.

My next goal will be an implementation of queries as directories and db attributes as extended file attributes. Then I think it will be time to make it work well, not just in a hackish way, and to write some useful plugins.

In the meantime, should someone be interested in looking at existing indexing programs and sharing ideas on how they could be integrated with a postgresql database, I'd be glad to hear about it ;) Is there some "standard" program or library we could use to extract structured information from many different kinds of files, in particular ps, pdf, openoffice documents and multimedia files? Is there something better than file(1) to get information about file types from their contents? File is not so bad, but it exits successfully even on failures, so I have to use a bad hack (reading the English error message) to know if there were errors.

Vincenzo |
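An indexer plugin of the kind described here (file information in, database script out, no interaction) might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the `attr` table and its columns are hypothetical and are not the relfs schema, and the quoting helper assumes the database treats string literals in the standard SQL way (single quotes doubled, no backslash escapes).

```python
import os


def sql_quote(value):
    """Quote a string as a standard SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def index_file(object_id, path):
    """Return a SQL script recording basic facts about one modified file.

    The object id is passed in by the caller, as the message suggests, so
    the script can refer to the object without looking it up by name.
    """
    oid = int(object_id)  # refuse anything that is not a plain integer
    facts = {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1].lstrip(".").lower(),
        "size": str(os.path.getsize(path)),
    }
    lines = ["BEGIN;", f"DELETE FROM attr WHERE obj_id = {oid};"]
    for key, value in facts.items():
        lines.append(
            "INSERT INTO attr (obj_id, key, value) "
            f"VALUES ({oid}, {sql_quote(key)}, {sql_quote(value)});"
        )
    lines.append("COMMIT;")
    return "\n".join(lines)
```

A caller would run something like `print(index_file(42, "/some/file"))` and feed the output to the database; wrapping the statements in one transaction means a half-written index for an object is never visible.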
|
From: Vincenzo C. <vin...@ya...> - 2005-01-18 00:17:38
|
Hi all - I was going to answer all the questions, and provide the ocaml port of relfs, and detailed installation instructions. However I am facing some difficulty so I won't delay replies any longer :)

First of all, Peter, sorry for the late reply: my laptop's hard drive broke badly last week. Luckily, I saved everything I needed, but now that I have reinstalled a linux system and ocamlfuse and all the rest on my abandoned desktop, I realize that... I didn't back up my up-to-date relfs schema! This is not so bad, but as you can imagine it will take some time.

Regarding metafs, I have had a small chat with the metafs author, who has been very kind. We have radically different ideas on the design, but I guess we will cooperate where possible (plugins?). Given that both projects have the same chances to survive, I think it's better to double these possibilities by leaving the two projects separate as they are ;)

For ocamlfuse, I'd have liked to publish a source snapshot today, but the sf.net release system has not been working at all since yesterday evening, so as you can guess adding this problem to the broken disk made me totally upset - perhaps tomorrow will be a better day.

Regarding UUIDs, I think that since they are mostly useful when synchronizing relfs with other filesystems or backup copies of the data, they can be an attribute of objects, which will however keep an identity that is "internal" to the db, unique and based on integers, which are faster to index and query. If you think that UUIDs should have greater importance, we can discuss it if you please.

I have much clearer ideas on what needs to be done - after putting the ocaml version on CVS together with installation instructions we will talk about that.

Bye

Vincenzo |
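The UUID proposal in this message (integer identity inside the db, UUID as an ordinary attribute used for synchronization) could look roughly like the sketch below. Table and column names are made up for illustration, and SQLite is used only to keep the example self-contained; the thread itself targets PostgreSQL.

```python
import sqlite3
import uuid

def make_db():
    """Hypothetical schema: integer primary key, UUID as a unique attribute."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE obj (id INTEGER PRIMARY KEY, uuid TEXT UNIQUE NOT NULL)")
    # Names refer to objects through the cheap integer key, not the UUID.
    db.execute("CREATE TABLE name (obj INTEGER NOT NULL REFERENCES obj(id), name TEXT NOT NULL)")
    return db

def new_object(db, name):
    """Create an object; return (internal integer id, external UUID string)."""
    external = str(uuid.uuid4())
    obj_id = db.execute("INSERT INTO obj (uuid) VALUES (?)", (external,)).lastrowid
    db.execute("INSERT INTO name (obj, name) VALUES (?, ?)", (obj_id, name))
    return obj_id, external

def find_by_uuid(db, external):
    """Map a UUID seen outside this database back to the internal id."""
    row = db.execute("SELECT id FROM obj WHERE uuid = ?", (external,)).fetchone()
    return row[0] if row else None
```

Joins and foreign keys stay on integers, while the UUID is only consulted when an object has to be matched against another filesystem or a backup copy.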
|
From: Peter S. <pet...@gm...> - 2005-01-12 18:46:11
|
http://metafs.sourceforge.net/ Perhaps you could add this and a link to ocaml-fuse on the relfs website. Peter |
|
From: Vincenzo C. <vin...@ya...> - 2005-01-07 21:56:01
|
On Thursday 06 January 2005 12:02, Peter Schrammel wrote:
> >>1. It would be nice to have the Objects of the FS/File in a DB.
> >
> > This will be the first feature to be implemented, using index
> > plugins.
>
> Index plugins are nice but I wouldn't do indexing within the FS (see
> my other mail). FS should just tell that something has to be
> reindexed.

Yes, it's going to be this way: when the filesystem finds that a file should be reindexed it sends a message, using various protocols - dbus, xmlrpc and so on - to "plugins", which are programs that access the file and the database and perform the indexing. To do this there will be source-level indexers which do not generally (except for easy stuff like mimetypes) populate the db but just send the information over a certain protocol. So there will be a dbus "source plugin", an xmlrpc "source plugin" and so on. There is the problem of sharing an authentication mechanism for database connections, but I think we won't solve that one right now.

> >>2. It would be nice to have Objects of a DB queryable in the
> >> FS/File
> >
> > This will not be completely done in the first release, because it
> > requires a complex storage architecture: for each file or
> > directory, it should be decided if its contents are provided by a
> > raw file on the underlying filesystem, or by a plugin which
> > performs a db query, or parses and reassembles a file on the fly
> > and so on, and this "storage plugins" architecture is completely
> > orthogonal to an "index plugins" architecture:
>
> I don't think that this is the storage layer. I thought this
> should just do queries of the metainformation in the DB using the FS
> as an interface.

Yes, you're right, the storage layer would perform more complex things, like storing in database tables certain files which are traditionally stored in a not so efficient way, such as the package database of some linux distribution, and providing a file interface for programs which require it - and this is the part that will be implemented later - for now storage is just the plain filesystem. However it's important to implement it to allow nice features like chdir into archives and such.

> ok. I see "/" is not an optimum but reiserfsv4 uses it already so
> applications will be patched (if the user pressure is big enough).
> Though you could have the character/string configurable. But
> appending a special string is not that good if apps try to find out
> the type of a file .zip-> zipped archive .zip# --> ???

Hmmm, if a file ending in "#" has the "d" bit set in its attributes, apps will not try to find its type by extension, but they will recognize that it is a directory.

> > What are UUIDs? Is this a standard of some sort? Are UUIDs a
> > function of file data? I am using just integers right now.
>
> integers aren't enough... see my other mail for UUIDs (universally
> unique identifiers)

I have read the man page of uuidgen, but - not that I won't use them :) - what advantages do we get over just using a serial column in the "obj" table? Ids would be unique anyway - well, I see that the serial column may eventually be exhausted, and someone could have copied the unique id outside the filesystem, so that if the serial column is reset by a user application, ids overlap; is this the problem? If UUID safety is probabilistic, as seems necessary, why shouldn't we use a hash of the file contents, which is "statically safe" in the same way, as an id? Is it because it would require cascade updates for each update of the file or just for the obvious efficiency issues?

Another issue is whether the filesystem should only provide links, or whether it should also provide "real" (even if stored on the other filesystem) files. In the second case there are many disadvantages, but in exchange we get accurate tracking of modifications and - more important - of file renames, and we could also provide query information integrated with "real" files, and in the future parsing and synthesizing files - I think I will trust your reasons but also mine :) and allow both approaches: there will be "mounted" and "indexed" directories in the relfs configuration; the former will be mirrored inside relfs, while the latter will be just indexed, using libfam to watch them - but I guess I will need to "hire" someone to write the libfam binding for ocaml :)

So after your suggestion the world will be divided into three, and no longer two, categories: files inside the filesystem, files outside the filesystem but on the local machine, which will be indexed and watched, and "external files" - which are left for further releases.

Bye

Vincenzo |
|
From: Peter S. <pet...@gm...> - 2005-01-06 11:00:44
|
Vincenzo Ciancia wrote:
> On Tuesday 04 January 2005 22:41, Peter Schrammel wrote:
>
> Welcome! This list has been "a little" quiet since relfs was born, and I
> had less spare time to code (but I worked on relfs a lot anyway) due
> to my job getting full-time.
>
> I am going to start my ph.d now, so I guess I will have more free time,
> in the meantime there is news:
Good luck (I guess you won't need it)
...
OCaml is cool (I used it 4 years ago for an XML parser (my PhD)), though I
prefer Ruby now (just a question of taste, I think).
> === 2. RelFS design, and goals for the first release ===
>
>
>
> Ok for 1. and 3., while for 2. I am still unsure on how to generalize it
> - not properties which could be seen as extended attributes (fuse
> supports those) but a membership relation: does it make sense to
> represent a mbox file as a directory of text messages? Of course, but
> what should each message look like? The text of the message without
> headers (because they can be shown as extended attributes of the file)
> or the plain text of the message? And how does one deal with message
> attachments? Each message should be a directory, but it could be
> inconvenient from the user's point of view.
Ok let's talk about the storage layer:
I think you'll have two directions:
1. composition of file contents to one file
You have a directory of files and want to have this directory as a tgz.
E.g. you have a directory "My PhD" with text/graphics... in it and you
want to offer that zipped on your website (always the newest version of
course).
2. decomposition of a file
The other way round: You have one file but want to alter the contents
without decomposing the file:
E.g. xml files (very nice have ordered trees)
tar files (though you could untar them and do #1)
mp3 files
/etc/passwd /etc/group /etc/shadow
Yeah, #1 is easy I think
#2 is harder
>
>>1. It would be nice to have the Objects of the FS/File in a DB.
>
>
> This will be the first feature to be implemented, using index plugins.
Index plugins are nice but I wouldn't do indexing within the FS (see my
other mail). FS should just tell that something has to be reindexed.
>>2. It would be nice to have Objects of a DB queryable in the FS/File
>
>
> This will not be completely done in the first release, because it
> requires a complex storage architecture: for each file or directory, it
> should be decided if its contents are provided by a raw file on the
> underlying filesystem, or by a plugin which performs a db query, or
> parses and reassembles a file on the fly and so on, and this "storage
> plugins" architecture is completely orthogonal to an "index plugins"
> architecture:
I don't think that this is the storage layer. I thought this should
just do queries of the metainformation in the DB using the FS as an
interface.
> - Index plugins can't fail, they can be run in batch queues and there
> can be more than one index plugin for any file.
Yeah!
> - Storage plugins can fail, are used interactively and if more than one
> plugin has to provide contents for a file, they should be stacked on
> top of each other - e.g. a plugin which turns email messages into
> directory, stacked on top of a plugin which provides email messages
> reading from an ftp site or a pop3 server.
Yes, storage layer is hard.
> Even if storage plugins will not be in the first release, I absolutely
> want directories representing queries on the db, like "my punk/rock
> mp3s". Those will be implemented ad-hoc and then replaced by a storage
> plugin.
Yes!
>>3. It would be nice to modify the DB in the FS/File
>
>
> There are various implementations of this, and we can surely get one (if
> you refer to seeing db objects, like procedures, as files in the
> filesystem)
>
>
>>4. It would be nice to modify the FS/File in the DB
>>
>
>
> This will also be the purpose of index plugins but not in the first
> release, where we won't do this bidirectional communication (I think it
> can be done extending the SQL server and using triggers).
I think this is storage layer.
>>
>
>
> Yes, I have now a table which holds pairs object ids and names, which
> are not _paths_ but only the result of the "basename" function. There
> is also a "membership" relationship between objects which gives rise to
> paths. This way there is no logical difference between a file contained
> into a zip archive and a file contained into a directory. However I
> will have to find a convenient syntax to allow the coexistence of a zip
> file seen as a file and as a directory, in the same parent directory,
> something like "file.zip" and "file.zip#" to represent the directory,
> but where "#" is a character which is not allowed in an unix filename.
> Unfortunately it seems that the only not allowed character is "/" which
> would not work in many applications.
ok. I see "/" is not an optimum but reiserfsv4 uses it already so
applications will be patched (if the user pressure is big enough). Though
you could have the character/string configurable. But appending a
special string is not that good if apps try to find out the type of a
file .zip-> zipped archive .zip# --> ???
> What are UUIDs? Is this a standard of some sort? Are UUIDs a function of
> file data? I am using just integers right now.
integers aren't enough... see my other mail for UUIDs (universally unique
identifiers)
>>property a filename in the directory:
>>:cat "/archive/coolsong.mp3/genre"
>>:rock
>>
>
>
> I don't like this because a shell or an userspace program could perform
> a "dirname" operation on the file name to get its path, find that the
> path is not a directory and complain to the user - I would prefer a
> different character than "/" e.g. /archive/coolsong.mp3#genre, but am
> still unsure.
>
I see that's a problem but as I said reiserfsv4 does it this way...
>
>
> In fact I am going to leave the db schema unspecified, just like the
> filesystem hierarchy in a linux distribution; I plan to use more than
> one communication protocol for indexing applications, e.g. dbus but
> also xmlrpc or just dynamic loading and linking of shared libraries, or
> a command line (a la CGI) interface for simple shell scripts.
>
>
> Hmmm, even if I realize that it could be useful to reify attribute
> names, I would prefer to keep a rich relational structure, like an
> "mp3" table with author, size etc as columns, where e.g. author is a
> foreign key into another table. I know there are many problems, but it
> would allow us to exploit the full power of a relational database in
> user applications.
OK let's not talk about the db structure ... (believe me I know why)
> You see, I switched from C++ to ocaml because mine was not the best
> too :) However, language independence for index plugins is an important
> requirement for relfs - I would like to be able to even use shell
> scripts taking the modified file as argument, and outputting an sql
> script.
I'll give a Ruby implementation a try (my OCaml is a little ... rusty)
Peter
|
|
From: Peter S. <pet...@gm...> - 2005-01-06 10:18:45
|
Vincenzo Ciancia wrote:
> On Wednesday 05 January 2005 23:05, Peter Schrammel wrote:
>
>>1. create an index of meta-data without storing the object itself (no
>>storage level implementation)
>> This could be done by a file alteration monitor filesystem (I
>>outlined this in my last mail)
>

But the indexing needn't be done in the filesystem. Your filesystem just has to inform an indexing system that something was done:

1. The "watcher" filesystem discovers that a file has been moved from a to b.
2. It sends a message on the dbus saying what's been done.
3. The watcher FS has done its job.
--
1. The dbus daemon like "beagled" listens on the dbus and starts indexers according to the information from the dbus.
2. The indexers can be written in any language and are just files in the filesystem (look at udev).
3. The job for the indexer daemon is done.
--
1. The relfs is the retrieval fs. Its job is to give me back links to the files the watcher fs watches. So its job is just to store my queries and to execute them.

I'd leave the storage of objects out of scope (at least until v1.0 is out).

> The purpose of relfs is to store objects on the underlying filesystem
> (so that FAM is not necessary) and metadata on the db, but I am sure
> that there will also be "external objects" e.g. all the objects that
> have been seen by a web proxy connected to relfs, or all the objects
> that have been backed up on a cdrom, which will have their metadata
> held in the db. This means that we should store hashes for all files -
> or some other "unique" identification method. Do hashing algorithms
> operate on a per-block basis, so that modifications can be done
> efficiently?

I don't think that this works: you need to have something unique stored in the file (but this you can't guarantee). Any system with hashes is just a good guess but nothing more.

UUIDs (universally unique identifiers - try uuidgen or have a look at the labels of the filesystems (man mount)) could help here if you could attach them to your files (somehow).

I'd say that files outside the watched filesystem should be out of scope for a while.

Peter |
|
From: Vincenzo C. <vin...@ya...> - 2005-01-06 09:04:22
|
On Wednesday 05 January 2005 23:05, Peter Schrammel wrote:
> 1. create an index of meta-data without storing the object itself (no
> storage level implementation)
>   This could be done by a file alteration monitor filesystem (I
> outlined this in my last mail)

The purpose of relfs is to store objects on the underlying filesystem (so that FAM is not necessary) and metadata on the db, but I am sure that there will also be "external objects" e.g. all the objects that have been seen by a web proxy connected to relfs, or all the objects that have been backed up on a cdrom, which will have their metadata held in the db. This means that we should store hashes for all files - or some other "unique" identification method. Do hashing algorithms operate on a per-block basis, so that modifications can be done efficiently?

V. |
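On the per-block question: a common whole-file digest has to be recomputed over the whole content after a change, but one can keep a list of per-block digests and rehash only the blocks that were touched. The sketch below illustrates that idea; the block size and function names are arbitrary choices for illustration, not anything relfs does.

```python
import hashlib

BLOCK_SIZE = 4096  # arbitrary block size chosen for this sketch

def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of 'data' (bytes)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

def changed_blocks(old, new):
    """Indices of blocks whose digest differs, or that exist on one side only."""
    longest = max(len(old), len(new))
    return [
        i for i in range(longest)
        if i >= len(old) or i >= len(new) or old[i] != new[i]
    ]

def file_id(hashes):
    """Derive a single identifier from the block digests (a hash of hashes)."""
    return hashlib.sha256("".join(hashes).encode("ascii")).hexdigest()
```

Note that an identifier derived this way still changes whenever the content changes, so it does not by itself give a stable identity for an object that is edited.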
|
From: Vincenzo C. <vin...@ya...> - 2005-01-06 08:57:40
|
On Wednesday 05 January 2005 15:40, Wolfgang Illmeyer wrote:
> I've been subscribed for a few weeks to this list, and I didn't hear
> very much from it.. How many people are we actually? How's relfs
> development doing?

There are twelve people subscribed, and relfs development is done by me in my spare time, which has been scarce in the last two months and will be more plentiful in the next year. Discussion has not been active here, but there are open questions (some of which are being answered in this discussion) in the list archives.

I use every quarter of an hour of free time to design and code, and switched to ocaml as explained in my other e-mail because it's faster to code correctly in that language (at least for me), so that I can use this free time in the best possible way. The true timewasters now are design decisions like the one that you and Peter are discussing, so your thread is actually of help :)

There is this problem related to hierarchies: it is possible to allow directories which represent flat queries, such as "select * from obj where mimetype='audio/mp3' and author like '%queen%'", but how does one say "show me all my mp3s, wherever they are on the disk, grouped in directories, first by author and then by genre"? I am not sure how a hierarchy could be expressed as the result of an sql query (of course we can somewhat extend the query language).

V. |
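One way to read the hierarchy question is to keep the SQL query flat and let the filesystem layer do the grouping: the query returns plain rows, and an ordered list of columns says how to nest them into directories. The sketch below assumes rows arrive as dictionaries with hypothetical "id", "name", "author" and "genre" keys; it is an illustration of that reading, not a proposal taken from the thread.

```python
def group_rows(rows, keys):
    """Nest flat query rows into a directory tree.

    rows -- iterable of dicts, each with "name", "id" and every column in keys
    keys -- grouping columns in directory order, e.g. ["author", "genre"]
    Directories are dicts; files map their name to the object id.
    Two files with the same name in the same leaf would overwrite each other.
    """
    tree = {}
    for row in rows:
        node = tree
        for key in keys:
            node = node.setdefault(str(row[key]), {})
        node[row["name"]] = row["id"]
    return tree

def listdir(tree, path):
    """List the entries of a virtual directory such as "/queen/rock"."""
    node = tree
    for part in (p for p in path.split("/") if p):
        node = node[part]
    return sorted(node)
```

With keys ["author", "genre"], listing "/" gives the authors, listing "/queen" gives that author's genres, and listing "/queen/rock" gives the matching files.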
|
From: Vincenzo C. <vin...@ya...> - 2005-01-06 08:47:22
|
On Tuesday 04 January 2005 22:41, Peter Schrammel wrote:
> Hi
>
> I'm new to this list but I want to give my 2c for this gorgeous
> project!

Welcome! This list has been "a little" quiet since relfs was born, and I had less spare time to code (but I worked on relfs a lot anyway) due to my job getting full-time.

I am going to start my ph.d now, so I guess I will have more free time, in the meantime there is news:

=== 1. OCaml port of RelFS ===

I faced hard problems with c++, related to memory management and passing of dynamically allocated memory between threads, and realized that I am not a C++ expert, and also that C++ is not the ideal language to quickly write prototypes implementing new ideas. In a word, I found that it would take more time for me to become good at coding in C++ than for any C++ coder to become good at coding in a simpler language like OCaml.

I made up my mind and did the port to OCaml - the hardest part is to get a multithreaded (in the sense that one handles multiple fs callbacks at the same time) fuse binding for this language; there was one, but it was not designed for multithreading and was not up-to-date. I wrote another ocaml fuse binding, available at

http://www.sourceforge.net/projects/ocamlfuse

Even if it's up-to-date and designed for multithreading, it's not stable enough in multithreaded mode: after a few million requests it crashes for still unknown reasons - while in single-threaded mode it works well. This is not a serious problem at this time (requests usually interleave well with each other) but a production system can't afford to block while waiting for the cdrom tray to close - so I will have to make multithreading work better in the future - I have several alternatives and am sure that I will find my way, but I can keep on working on relfs in the meantime. Also, I will have to improve speed (by now filesystem data is copied twice and this makes it slow - about 10Mb per second on my centrino laptop).

I made a port of the relfs core, and I'm satisfied: the source is now about 300 lines of code, which is exactly its value :) Moreover, apart from functional programming and its very good type system, ocaml has superb multithreading primitives and libraries and I hope this will pay off during development.

That said, I did not commit any changes to CVS, since I had no time to write installation instructions (postgresql-ocaml is needed), however I think I am going to make a branch next week to allow everybody to see it.

=== 2. RelFS design, and goals for the first release ===

Here we come to your e-mail :) In the last months I also redesigned the DB schema, which looks a little like what you propose; I will put it in the CVS next week.

> First I want to structure my thoughts about FS, Files and
> Databases....

1.
> Filesystems hold Objects with some properties:
2.
> Files hold Objects with some properties:
3.
> Databases hold Objects with some properties.

Ok for 1. and 3., while for 2. I am still unsure on how to generalize it - not properties which could be seen as extended attributes (fuse supports those) but a membership relation: does it make sense to represent a mbox file as a directory of text messages? Of course, but what should each message look like? The text of the message without headers (because they can be shown as extended attributes of the file) or the plain text of the message? And how does one deal with message attachments? Each message should be a directory, but it could be inconvenient from the user's point of view.

> ----
>
> 1. It would be nice to have the Objects of the FS/File in a DB.

This will be the first feature to be implemented, using index plugins.

> 2. It would be nice to have Objects of a DB queryable in the FS/File

This will not be completely done in the first release, because it requires a complex storage architecture: for each file or directory, it should be decided if its contents are provided by a raw file on the underlying filesystem, or by a plugin which performs a db query, or parses and reassembles a file on the fly and so on, and this "storage plugins" architecture is completely orthogonal to an "index plugins" architecture:

- Index plugins can't fail, they can be run in batch queues and there can be more than one index plugin for any file.
- Storage plugins can fail, are used interactively and if more than one plugin has to provide contents for a file, they should be stacked on top of each other - e.g. a plugin which turns email messages into directories, stacked on top of a plugin which provides email messages reading from an ftp site or a pop3 server.

Even if storage plugins will not be in the first release, I absolutely want directories representing queries on the db, like "my punk/rock mp3s". Those will be implemented ad-hoc and then replaced by a storage plugin.

> 3. It would be nice to modify the DB in the FS/File

There are various implementations of this, and we can surely get one (if you refer to seeing db objects, like procedures, as files in the filesystem)

> 4. It would be nice to modify the FS/File in the DB

This will also be the purpose of index plugins but not in the first release, where we won't do this bidirectional communication (I think it can be done extending the SQL server and using triggers).

> The first two aren't that hard I think but you should keep one thing
> in mind: a file is not its filename, it's an abstract concept. I
> would give files a UUID. A UUID can be given a name like
> "/archive/coolsong.mp3" or another name
> "/archive/genre/rock/coolsong.mp3" (some call this a link).

Yes, I now have a table which holds pairs of object ids and names, which are not _paths_ but only the result of the "basename" function. There is also a "membership" relationship between objects which gives rise to paths. This way there is no logical difference between a file contained in a zip archive and a file contained in a directory. However I will have to find a convenient syntax to allow the coexistence of a zip file seen as a file and as a directory, in the same parent directory, something like "file.zip" and "file.zip#" to represent the directory, but where "#" is a character which is not allowed in a unix filename. Unfortunately it seems that the only character not allowed is "/", which would not work in many applications.

What are UUIDs? Is this a standard of some sort? Are UUIDs a function of file data? I am using just integers right now.

> The second problem is that you have to access the properties with the
> filename. Be realistic nobody will use special tools to query your FS.

I know :)

> A good approach would be what the guys from reiserFSv4 did
> (http://www.namespace.com):
> every file has its properties accessed as if it was a directory and the
> property a filename in the directory:
> :cat "/archive/coolsong.mp3/genre"
> :rock

I don't like this because a shell or a userspace program could perform a "dirname" operation on the file name to get its path, find that the path is not a directory and complain to the user - I would prefer a different character than "/" e.g. /archive/coolsong.mp3#genre, but am still unsure.

> Question are there any properties in the filesystem that
> are attached to the filename? Yes! A comment on the filename
> "/etc/shadow" is not a comment on its content.

This is _exactly_ the motivating example to assume that there will be properties attached to paths vs properties attached to files (objects) - it seems that we are going to use relfs for the same reasons :)

> So usually a filename indicates a file as above. But you can do
> something like:
> echo "This file is a Security risk" > /etc/shadow/filename/comment
> you could even do comments on comments ;-)
>
> What about directories? It's the same but it means some loss to the
> filenamespace ( a special filename e.g. .rfs indicates the
> properties)

This is another trouble with using "/" as the final separator.

> :echo "my brother's files! don't remove" > /archive/.rfs/comment
>
> So most of the 4 quests are solved:
>
> 1. putting the Objects of the FS/Files into a DB (I think you call it
> proxy) I thought of FAM (File Alteration Monitor) and udev: Just
> tell other programs that something has changed and they'll do the
> rest. So your FS would just send a message e.g. on the DBUS that a
> file has been altered/created. Daemons (even user-daemons) could
> listen on the dbus and do some caching of the information (like
> extracting mp3 tags, do checksums...). Don't force a specific DB
> schema on them...they'll hack around it.

In fact I am going to leave the db schema unspecified, just like the filesystem hierarchy in a linux distribution; I plan to use more than one communication protocol for indexing applications, e.g. dbus but also xmlrpc or just dynamic loading and linking of shared libraries, or a command line (a la CGI) interface for simple shell scripts.

> Here a simple approach for a DB model:
>
> FS Objects:
> UUID char(33)

The fact that you write "char(33)" makes me think that there is a written specification. There is this problem with the identity of objects, that it's going to be lost if you copy the file "outside" the filesystem, so you lose any information attached to this uuid, unless it's like an md5sum, but then it's going to change when the file is modified, and so we would need cascade updates of primary keys everywhere. There is also the problem of open-endedness: how do I join two different filesystems on different machines if object ids can collide?

> FS Attribnames:
> ID integer e.g. 9088
> value integer e.g. 78 (specific to the plugin)
> plugin char(16) e.g. mp3tag
> name char(16) e.g. artist

Hmmm, even if I realize that it could be useful to reify attribute names, I would prefer to keep a rich relational structure, like an "mp3" table with author, size etc as columns, where e.g. author is a foreign key into another table. I know there are many problems, but it would allow us to exploit the full power of a relational database in user applications.

> Sorry my C++ is not the best but with this approach I could even use
> haskell as filters FS->DB ;-).

You see, I switched from C++ to ocaml because mine was not the best either :) However, language independence for index plugins is an important requirement for relfs - I would like to be able to even use shell scripts taking the modified file as argument, and outputting an sql script.

V. |
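The name table and "membership" relationship described in this message can be sketched as below, together with naive versions of "lookup" and "pathof" that walk the relation one level at a time. The actual relfs schema is not shown in the thread, so every table, column and function name here is an assumption, and SQLite replaces PostgreSQL only to keep the sketch self-contained.

```python
import sqlite3

ROOT = 1  # id of the root object in this sketch

def make_db():
    """Hypothetical schema: objects, basenames, and a membership relation."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE obj (id INTEGER PRIMARY KEY);
        CREATE TABLE name (obj INTEGER NOT NULL, name TEXT NOT NULL);
        CREATE TABLE member (parent INTEGER NOT NULL, child INTEGER NOT NULL);
        INSERT INTO obj (id) VALUES (1);
    """)
    return db

def add(db, parent, basename):
    """Create an object named 'basename' as a member of 'parent'."""
    obj_id = db.execute("INSERT INTO obj DEFAULT VALUES").lastrowid
    db.execute("INSERT INTO name (obj, name) VALUES (?, ?)", (obj_id, basename))
    db.execute("INSERT INTO member (parent, child) VALUES (?, ?)", (parent, obj_id))
    return obj_id

def lookup(db, path):
    """Resolve a path to an object id: one query per path component."""
    current = ROOT
    for part in (p for p in path.split("/") if p):
        row = db.execute(
            "SELECT m.child FROM member m JOIN name n ON n.obj = m.child "
            "WHERE m.parent = ? AND n.name = ?", (current, part)).fetchone()
        if row is None:
            return None
        current = row[0]
    return current

def pathof(db, obj_id):
    """Rebuild one path of an object by walking up the membership relation."""
    parts = []
    while obj_id != ROOT:
        row = db.execute(
            "SELECT m.parent, n.name FROM member m JOIN name n ON n.obj = m.child "
            "WHERE m.child = ? LIMIT 1", (obj_id,)).fetchone()
        if row is None:
            return None
        obj_id, base = row
        parts.append(base)
    return "/" + "/".join(reversed(parts))
```

Both helpers issue one query per directory level, and nothing in the schema distinguishes a directory from an archive: membership is the only structure.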
|
From: Peter S. <pet...@gm...> - 2005-01-05 22:04:09
|
I see your point. But I think we have 3 things here:

1. create an index of meta-data without storing the object itself (no storage level implementation)
   This could be done by a file alteration monitor filesystem (I outlined this in my last mail)
2. Do queries based on the meta-data using a fs as an interface to the db (still no storage level implementation)
3. save the object in the db (storage level implementation)
   a) read only
   b) read/write

1: simple I think...

2: As far as I know postgres can do evaluated queries. So e.g. it would be cool if you could do something like:

echo "select filename+ooid+ending from objectlist where year='1993' and mime-type like 'image%'" > /year1993_photos/.relfs/readdir
dir /thisdir
my_my_on_my_boat.jpg-897987-987879-879879.jpg
birtday.jpg-90809809-098089-098098.jpg

The query could be stored in the DB (your girlfriend would appreciate that).

Of course you could do:

ls "/.global_relfs/select uuid from objectlist where year='1993' and mime-type like 'image%'"

The .global_relfs gives you full access to all storage objects, but has no state (it doesn't remember your queries): a listing of /.global_relfs wouldn't give you back anything, but opening a file /.global_relfs/<uuid> gets you to the storage object. I wouldn't query for filenames here because they aren't unique (unless you store them with full pathnames, but that would be a horror ....) or you replace the '/' from the original files by e.g. '-'.

So you could store/alter the query for a given directory at runtime with system tools, and could use ad-hoc queries.

The hardest thing is the database model. It has to be very flexible, which usually means graphs. But graph queries have bad performance on a relational DB (though for me this is not the point: it's better to have the data I want in 1s than wrong data in 0.01s 1000 times).

3. (That's tricky...)

Good night
Peter |
|
From: Wolfgang I. <wol...@gm...> - 2005-01-05 14:41:50
|
Peter Schrammel wrote:

> A good approach would be what the guys from reiserFSv4 did
> (http://www.namespace.com):
> every file has its properties accessed as if it were a directory and
> each property a filename in that directory:
> :cat "/archive/coolsong.mp3/genre"
> :rock
>
> but again: we should not attach this property to the filename but to
> the file. Question: are there any properties in the filesystem that are
> attached to the filename? Yes! A comment on the filename
> "/etc/shadow" is not a comment on its content.
> So usually a filename indicates a file as above. But you can do
> something like:
> echo "This file is a Security risk" > /etc/shadow/filename/comment
> you could even do comments on comments ;-)
>

Hi,

I'm also new to the project, and I'd like to drop a few comments on querying the filesystem/database.

The primary advantage of a relational database is that its contents don't have a single hierarchy (e.g. a directory tree). If you want to put something in a traditional filesystem, you split your content into different categories, e.g. photos:

First category: year of capture.

  2001
  2002
  2003
  2004
  ...

Then again, you want your photos sorted by some other category too, for example photos taken indoors and outdoors. This forces you to create a ton of directories:

  2001/indoor
  2001/outdoor
  2002/indoor
  2002/outdoor
  ..

or you can also try this approach:

  indoor/2001
  indoor/2002
  indoor/2003
  outdoor/2001

In either case it will suck, because if you add another category and you have to search for pictures of a certain category, you'll have _many_ directories to go through.

When I subscribed to the list, I had the following "query language" in mind: I name the different categorisation schemes, e.g. there's a property 'year' that can be a value from 2001 to 2005, and 'location' can be, say, 'indoor', 'outdoor' and 'underwater'.

If I now want to 'query' my relational file system for underwater photos from 2002, I put:

  cd /relfs/fotos/year=2002/category=underwater
  ls

or

  cd /relfs/fotos/category=underwater/year<2002/year>2000
  ls

  ^---- I'm not bound to a fixed directory hierarchy (at least not starting from the relfs-mountpoint ;)

I've been subscribed to this list for a few weeks, and I haven't heard very much from it. How many people are we actually? How's relfs development doing?

Greetings
Wolfgang
|
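The path-based "query language" above can be sketched as a small translator from path components to a SQL WHERE clause. This is only an illustration in Python: the function name `path_to_where`, the set of supported operators, and the choice to return a parameterised clause are all assumptions, and the path is taken relative to the collection (the part after `/relfs/fotos/`).

```python
import re

# One path component is <property><operator><value>, e.g. "year=2002".
# Two-character operators are listed first so "<=" is not read as "<".
_COMPONENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)(<=|>=|=|<|>)(.+)$")


def path_to_where(path):
    """Translate 'year=2002/category=underwater' into (where_sql, params).

    Component order does not matter to the result set, which is the point
    of the mail: the user is not bound to one directory hierarchy.
    """
    clauses, params = [], []
    for part in path.strip("/").split("/"):
        if not part:
            continue
        match = _COMPONENT.match(part)
        if match is None:
            raise ValueError(f"not a property test: {part!r}")
        name, op, value = match.groups()
        clauses.append(f"{name} {op} ?")
        # Digit-only values are compared as integers, everything else as text.
        params.append(int(value) if value.isdigit() else value)
    return " AND ".join(clauses), params


if __name__ == "__main__":
    print(path_to_where("year=2002/category=underwater"))
    print(path_to_where("category=underwater/year<2002/year>2000"))
```

Property names are restricted to identifier characters by the regular expression and values are passed as bound parameters, so a path cannot smuggle arbitrary SQL into the clause.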
|
From: Peter S. <pet...@gm...> - 2005-01-04 21:39:47
|
Hi,

I'm new to this list but I want to give my 2c for this gorgeous project!

First I want to structure my thoughts about FS, Files and Databases....

Filesystems hold Objects with some properties:
  filename
  mtime
  atime
  user
  group
  mode
  Objects (Files)
  ...

Files hold Objects with some properties:
  author
  audio-stream
  video-stream
  ...

Databases hold Objects with some properties.

----

1. It would be nice to have the Objects of the FS/File in a DB.
2. It would be nice to have Objects of a DB queryable in the FS/File.
3. It would be nice to modify the DB in the FS/File.
4. It would be nice to modify the FS/File in the DB.

The first two aren't that hard, I think, but you should keep one thing in mind: a file is not its filename, it's an abstract concept. I would give files a UUID. A UUID can be given a name like "/archive/coolsong.mp3" or another name "/archive/genre/rock/coolsong.mp3" (some call this a link). The problem is that in an FS you always start with the filename, and you'd have to specify whether you mean the file or the filename.

The second problem is that you have to access the properties through the filename. Be realistic: nobody will use special tools to query your FS. A good approach would be what the guys from reiserFSv4 did (http://www.namespace.com): every file has its properties accessed as if it were a directory and each property a filename in that directory:

  :cat "/archive/coolsong.mp3/genre"
  :rock

But again: we should not attach this property to the filename but to the file. Question: are there any properties in the filesystem that are attached to the filename? Yes! A comment on the filename "/etc/shadow" is not a comment on its content. So usually a filename indicates a file as above. But you can do something like:

  echo "This file is a Security risk" > /etc/shadow/filename/comment

You could even do comments on comments ;-)

What about directories? It's the same, but it means some loss to the filename namespace (a special filename, e.g. .rfs, indicates the properties):

  :echo "my brother's files! don't remove" > /archive/.rfs/comment

So most of the 4 questions are solved:

1. Putting the Objects of the FS/Files into a DB (I think you call it proxy):
   I thought of FAM (File Alteration Monitor) and udev: just tell other programs that something has changed and they'll do the rest. So your FS would just send a message, e.g. on the DBUS, that a file has been altered/created. Daemons (even user daemons) could listen on the dbus and do some caching of the information (like extracting mp3 tags, doing checksums...). Don't force a specific DB schema on them... they'll hack around it.
2. Querying the DB in the FS: see above....
3. Modifying the DB in the FS/File: easy, see above...
4. Modifying the FS/File in the DB: hey, that's tricky ...

-----
-----

Here is a simple approach for a DB model:

FS Objects:
  UUID    char(33)

FS Attribnames:
  ID      integer   e.g. 9088
  value   integer   e.g. 78 (specific to the plugin)
  plugin  char(16)  e.g. mp3tag
  name    char(16)  e.g. artist

  ID  value  plugin  name
   1  1      system  filename
   2  2      system  uid
   3  3      system  gid
   4  4      system  size_in_bytes
   5  5      system  mode
   6  1      chsum   md5
   7  2      chsum   sha1
   8  1      mp3tag  artist
   9  2      mp3tag  genre
  10  3      mp3tag  song

FS Attribs:
  UUID       char(33)
  ATTRIB_ID  int4
  value      text

For speed's sake you could add some caching tables for the system plugin.

-----------

Another approach would be something from graph theory:

Objects:
  UUID   char(33)
  value  text

Links:
  UUID_from  char(33)
  UUID_to    char(33)

Creating a file means creating some objects:

1. the file itself, with the content as its value
2. an object for the name
3. a link from the object to the name
4. a link from the name to an object that has the value "filename"
...

You see, this wouldn't be that fast, but it is very clear and powerful.

I'm very interested in the 1st point of this approach: getting informed when files are changed and doing some proxying. Sorry, my C++ is not the best, but with this approach I could even use Haskell for the FS->DB filters ;-).

Peter
|
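The first DB model above (objects, attribute names, attribute values) can be tried out with a rough Python sketch on an in-memory SQLite database. The table and column names follow the mail; the SQLite column types, the helper names (`open_db`, `new_object`, `add_attrib`, `find`) and the use of `uuid.uuid4()` for identifiers are assumptions for illustration only.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE fs_objects (uuid TEXT PRIMARY KEY);
CREATE TABLE fs_attribnames (
    id     INTEGER PRIMARY KEY,
    value  INTEGER NOT NULL,   -- plugin-specific number
    plugin TEXT NOT NULL,
    name   TEXT NOT NULL,
    UNIQUE (plugin, name)
);
CREATE TABLE fs_attribs (
    uuid      TEXT NOT NULL REFERENCES fs_objects(uuid),
    attrib_id INTEGER NOT NULL REFERENCES fs_attribnames(id),
    value     TEXT
);
"""

# A few of the rows listed in the mail.
ATTRIBNAMES = [
    (1, 1, "system", "filename"),
    (4, 4, "system", "size_in_bytes"),
    (6, 1, "chsum", "md5"),
    (8, 1, "mp3tag", "artist"),
    (9, 2, "mp3tag", "genre"),
]


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO fs_attribnames VALUES (?, ?, ?, ?)", ATTRIBNAMES)
    return db


def new_object(db):
    """Create a file object identified only by a UUID, not by a name."""
    obj = uuid.uuid4().hex
    db.execute("INSERT INTO fs_objects VALUES (?)", (obj,))
    return obj


def add_attrib(db, obj, plugin, name, value):
    """Attach one attribute value; an object may carry several filenames."""
    row = db.execute(
        "SELECT id FROM fs_attribnames WHERE plugin = ? AND name = ?",
        (plugin, name),
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown attribute {plugin}/{name}")
    db.execute("INSERT INTO fs_attribs VALUES (?, ?, ?)", (obj, row[0], value))


def find(db, plugin, name, value):
    """Return the UUIDs of all objects with the given attribute value."""
    rows = db.execute(
        "SELECT a.uuid FROM fs_attribs a "
        "JOIN fs_attribnames n ON n.id = a.attrib_id "
        "WHERE n.plugin = ? AND n.name = ? AND a.value = ?",
        (plugin, name, value),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = open_db()
    song = new_object(db)
    add_attrib(db, song, "system", "filename", "/archive/coolsong.mp3")
    add_attrib(db, song, "system", "filename", "/archive/genre/rock/coolsong.mp3")
    add_attrib(db, song, "mp3tag", "genre", "rock")
    print(find(db, "mp3tag", "genre", "rock") == [song])
```

Leaving `fs_attribs` without a uniqueness constraint is deliberate in this sketch: it lets one UUID carry two filenames, which is how the mail describes a link.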
|
From: Vincenzo a. N. N. <vin...@ya...> - 2004-08-16 10:08:48
|
Given the number of daily posts I suppose this is a non-issue :) ; however, so that nobody is worried by unanswered questions, I would like to tell everybody that I won't be using e-mail for the next two or three weeks. In September I'll answer unread e-mails.

Bye
Vincenzo
|