Re: [Relfs-devel] Some thought's.... - News about relfs

On Thursday 06 January 2005 12:02, Peter Schrammel wrote:

> >>1. It would be nice to have the Objects of the FS/File in a DB.
> >
> > This will be the first feature to be implemented, using index
> > plugins.
>
> Index plugins are nice, but I wouldn't do indexing within the FS (see
> my other mail). The FS should just signal that something has to be
> reindexed.
>

Yes, it's going to be this way: when the filesystem finds that a file
should be reindexed, it sends a message, using various protocols (dbus,
xmlrpc and so on), to "plugins", which are programs that access the
file and the database and perform the indexing. To do this there will
be source-level indexers which generally do not populate the db (except
for easy stuff like mimetypes) but just send the information over a
certain protocol. So there will be a dbus "source plugin", an xmlrpc
"source plugin" and so on. There is the problem of sharing an
authentication mechanism for database connections, but I think we won't
solve that one right now.
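
As a rough sketch of the message flow described above (not relfs code):
the filesystem only announces that a file needs reindexing, and each
"source plugin" forwards that announcement over its own protocol. The
names ReindexMessage, Notifier and the "reason" field are assumptions
made for illustration, and plain in-memory lists stand in for the real
dbus and xmlrpc channels.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ReindexMessage:
    path: str    # file the filesystem wants reindexed
    reason: str  # hypothetical field, e.g. "modified" or "renamed"


# A source plugin is modelled as a callable that forwards one message
# over its protocol (dbus, xmlrpc, ...). Real transports are not shown.
SourcePlugin = Callable[[ReindexMessage], None]


@dataclass
class Notifier:
    plugins: Dict[str, SourcePlugin] = field(default_factory=dict)

    def register(self, protocol: str, plugin: SourcePlugin) -> None:
        self.plugins[protocol] = plugin

    def file_changed(self, path: str, reason: str) -> ReindexMessage:
        # The filesystem does no indexing itself; it only announces.
        msg = ReindexMessage(path, reason)
        for plugin in self.plugins.values():
            plugin(msg)
        return msg


# In-memory lists stand in for the dbus and xmlrpc channels.
dbus_queue: List[ReindexMessage] = []
xmlrpc_queue: List[ReindexMessage] = []

notifier = Notifier()
notifier.register("dbus", dbus_queue.append)
notifier.register("xmlrpc", xmlrpc_queue.append)
sent = notifier.file_changed("/home/user/notes.txt", "modified")
```

The indexing programs on the receiving end would then read the file and
update the database on their own, which keeps the database credentials
out of the filesystem process.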

> >>2. It would be nice to have Objects of a DB queryable in the
> >> FS/File
> >
> > This will not be completely done in the first release, because it
> > requires a complex storage architecture: for each file or
> > directory, it should be decided if its contents are provided by a
> > raw file on the underlying filesystem, or by a plugin which
> > performs a db query, or parses and reassembles a file on the fly
> > and so on, and this "storage plugins" architecture is completely
> > orthogonal to an "index plugins" architecture:
>
> I don't think that this is the storage layer. I thought this
> should just do queries on the metainformation in the DB, using the FS
> as an interface.

Yes, you're right. The storage layer would perform more complex things,
like taking files that are traditionally stored in a not so efficient
way (the package database of some linux distribution, for example),
storing them in database tables, and providing a file interface for
programs which require it. This is the part that will be implemented
later; for now storage is just the plain filesystem. However, it's
important to implement it to allow nice features like chdir into
archives and such.

> OK. I see "/" is not optimal, but reiserfsv4 already uses it, so
> applications will be patched (if the user pressure is big enough).
> You could still make the character/string configurable. But
> appending a special string is not that good if apps try to find out
> the type of a file: .zip -> zipped archive, .zip# -> ???
>

Hmmm, if a file ending in "#" has the "d" bit set in its attributes, 
apps will not try to find its type by extension, but they will 
recognize that it is a directory.
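
To illustrate the point, here is a minimal sketch of how an application
could check the directory bit before falling back to the extension. The
function name file_kind and the file names in demo() are made up for
this example; os.stat and stat.S_ISDIR are the standard Python calls.

```python
import os
import stat
import tempfile


def file_kind(path: str) -> str:
    """Return "directory" if the d bit is set, else the extension."""
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        # The directory bit wins; the name is not inspected at all.
        return "directory"
    ext = os.path.splitext(path)[1].lower()
    return ext if ext else "unknown"


def demo():
    # Build a plain "backup.zip" file and a "backup.zip#" directory
    # next to it, then classify both.
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "backup.zip")
        with open(archive, "wb"):
            pass
        view = os.path.join(tmp, "backup.zip#")
        os.mkdir(view)
        return file_kind(archive), file_kind(view)
```

An application written this way never reaches the extension check for
"backup.zip#", so the odd suffix does not confuse it.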

> > What are UUIDs? Is this a standard of some sort? Are UUIDs a
> > function of file data? I am using just integers right now.
>
> integers aren't enough... see my other mail for UUIDs (universally
> unique identifiers)
>

I have read the man page of uuidgen, but (not that I won't use them :))
what advantage do we get over just using a serial column in the "obj"
table? Ids would be unique anyway. Well, I see that the serial column
could get exhausted, and that one could have copied a unique id outside
the filesystem; if the serial column is then reset by a user
application, ids overlap. Is this the problem?

If the safety of UUIDs is probabilistic, as seems necessary, why
shouldn't we use a hash of the file contents as an id, which is
"statistically safe" in the same way? Is it because it would require
cascading updates on each update of the file, or just because of the
obvious efficiency issues?
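
A small sketch of the difference between the two kinds of id, using
Python's standard uuid and hashlib modules. The helper names content_id
and new_object_id are hypothetical, and SHA-256 is just one possible
hash chosen for the example.

```python
import hashlib
import uuid


def content_id(data: bytes) -> str:
    """Id derived from the file contents (here: SHA-256, hex encoded)."""
    return hashlib.sha256(data).hexdigest()


def new_object_id() -> str:
    """Random (version 4) UUID, independent of the file contents."""
    return str(uuid.uuid4())


# A content hash changes whenever the file is edited ...
before_edit = content_id(b"first draft")
after_edit = content_id(b"first draft, revised")

# ... and two distinct files with equal contents get the same id.
copy_a = content_id(b"same bytes")
copy_b = content_id(b"same bytes")

# A UUID is assigned once and stays the same across edits.
object_id = new_object_id()
```

With a content hash as the key, every row that refers to the file would
have to be updated on each edit, and identical files would collide on
purpose; a UUID avoids both at the cost of carrying no information
about the data.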

Another issue is whether the filesystem should only provide links, or
whether it should also provide "real" files (even if stored on the
other filesystem). The second case has many disadvantages, but in
exchange we get accurate tracking of modifications and, more
importantly, of file renames. We could also provide query information
integrated with "real" files and, in the future, parse and synthesize
files. I think I will trust your reasons but also mine :) and allow
both approaches: there will be "mounted" and "indexed" directories in
the relfs configuration. The former will be mirrored inside relfs,
while the latter will just be indexed, using libfam to watch them. I
guess I will need to "hire" someone to write the libfam binding for
ocaml :)

So after your suggestion the world will be divided into three
categories, no longer two: files inside the filesystem; files outside
the filesystem but on the local machine, which will be indexed and
watched; and "external files", which are left for further releases.
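
The three categories could be decided from the configuration roughly as
follows. This is only a sketch under assumptions: the function names,
the example paths and the rule that anything outside the configured
directories counts as "external" are all made up for illustration, and
paths are expected to be absolute.

```python
import os.path
from typing import Sequence


def is_under(path: str, root: str) -> bool:
    """True if path is root itself or lies somewhere below it."""
    path = os.path.normpath(path)
    root = os.path.normpath(root)
    return os.path.commonpath([path, root]) == root


def categorize(path: str,
               mounted: Sequence[str],
               indexed: Sequence[str]) -> str:
    # "mounted": mirrored inside relfs, so changes are tracked directly.
    if any(is_under(path, root) for root in mounted):
        return "mounted"
    # "indexed": stays on the local filesystem and is only watched.
    if any(is_under(path, root) for root in indexed):
        return "indexed"
    # Assumption: everything else is treated as an external file.
    return "external"


mounted_dirs = ["/srv/relfs/docs"]
indexed_dirs = ["/home/user"]
```

Comparing whole path components (rather than string prefixes) keeps a
directory such as /home/username from being mistaken for something
below /home/user.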

Bye

Vincenzo