Tcl

Tracker: Bugs

5 path -->FS function limitations - ID: 941872
Last Update: Comment added ( dgp )


Tcl's path management code has a
fundamental feature that each path
belongs to exactly one filesystem.

The way this is implemented is that
Tcl iterates over all registered filesystems
in reverse order of their registration and
asks each one "does this file belong to you?"
(via a call to the FS's pathInFilesystemProc).
Each filesystem is allowed to answer YES
or NO, and the first one that answers YES
is declared the unique owner of that path.
The native filesystem is registered first, so
it is queried last, and always answers YES,
so the fallback filesystem for unclaimed path
values is the native filesystem.
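
The lookup just described can be modelled in a few lines. This is a toy Python sketch, not Tcl's actual C code: the Filesystem class, the "zipfs" filesystem and its /mnt/zip/ mount point are hypothetical, and the claims callable merely stands in for pathInFilesystemProc.

```python
# Toy model of the ownership lookup; not Tcl's real implementation.
class Filesystem:
    def __init__(self, name, claims):
        self.name = name
        self.claims = claims  # stand-in for pathInFilesystemProc


registered = []  # filesystems in order of registration


def register(fs):
    registered.append(fs)


def owner_of(path):
    # The most recently registered filesystem is asked first.
    for fs in reversed(registered):
        if fs.claims(path):
            return fs.name
    return None  # not reached once the native filesystem is registered


# The native filesystem is registered first and claims everything.
register(Filesystem("native", lambda path: True))
# Hypothetical virtual filesystem claiming one subtree.
register(Filesystem("zipfs", lambda path: path.startswith("/mnt/zip/")))

print(owner_of("/mnt/zip/a.txt"))
print(owner_of("/etc/hosts"))
```

Because the native filesystem is asked last and always says YES, every path ends up with exactly one owner.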

All operations on a path will be passed
to its owning filesystem, and after any
shimmering, that owning filesystem will
be the one that gets to create the refreshed
internal rep, suitable only for that filesystem.

This feature is what makes the public
routine Tcl_FSGetFileSystemForPath(pathPtr)
make sense. You pass in a path, and get back
the unique filesystem that owns that path.

This feature imposes several limitations
that are unattractive.

1) Impossible to implement "stacked" filesystems.

One could imagine a filesystem that claimed all
paths, did some kind of logging, or other filtering
on the operations, and then passed along to an
appropriate "real" filesystem to do the actual work.

It's simple enough to create a new filesystem that
claims all paths and does the filtering work, but
things would break as soon as it attempted to pass
the path along to the real filesystem. When the
real filesystem tried to do operations on the path,
the path would be recognized as belonging to the
stacked filesystem, so operations would get passed
back up to the filtering layer again. The real filesystem
would not be able to create/fetch its own internal rep
for the path, because the filtering filesystem's claim
on the path would keep producing the filtering
filesystem's internal rep instead.

One might imagine the filtering filesystem releasing
its claim on the path before it passes it down to the
real filesystem (change internal state, so the filtering
filesystem no longer says YES to this particular path),
but there doesn't seem to be any way to do that in
a thread-safe manner. While the filtering filesystem
in Thread A abandons its claim on a path to pass it
on to the real filesystem, operations on the same
path in Thread B would bypass the filtering entirely.
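
The routing loop behind limitation 1 can be reproduced with a toy Python model. Everything here is an assumption made for illustration: the class names, the fs_stat helper, and the depth guard that stops the loop after a few rounds. It is not how Tcl is implemented.

```python
# Toy model: a filter that claims every path cannot reach the real filesystem.
class NativeFS:
    name = "native"

    def claims(self, path):
        return True

    def stat(self, path):
        return "native stat of " + path


class LoggingFS:
    name = "logging"

    def __init__(self, max_depth=3):
        self.log = []
        self.max_depth = max_depth  # guard so the demonstration terminates

    def claims(self, path):
        return True  # the filter claims every path

    def stat(self, path):
        self.log.append(path)
        if len(self.log) > self.max_depth:
            raise RuntimeError("operation routed back to the filter")
        # Try to hand the path to the "real" filesystem via the normal lookup.
        return fs_stat(path)


logging_fs = LoggingFS()
registered = [NativeFS(), logging_fs]  # native first, filter last


def owner_of(path):
    for fs in reversed(registered):
        if fs.claims(path):
            return fs


def fs_stat(path):
    return owner_of(path).stat(path)
```

Calling fs_stat("/tmp/x") in this model never reaches NativeFS.stat. Each forwarded call resolves to the filter again until the guard raises.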

Another difficulty is that a filtering filesystem could
only work if it were registered after the filesystem(s)
it is filtering. For filtering the native filesystem, this
might work, as long as the rule that the native filesystem
is registered first persists, but filtering of other
filesystems would not be robust, as the order of filesystem
registration is essentially impossible to control
(multiple threads).

2) Impossible to implement a mountable archive file.

Imagine an archive file in some filesystem:

/path/to/archive.ar

[file system /path/to/archive.ar] will return the
filesystem in which that archive file is stored and
[file type /path/to/archive.ar] will return "file". Then
imagine mounting that archive with the mount command
appropriate for the filesystem in that archive:

fs::mount /path/to/archive.ar

The desire is that [file system /path/to/archive.ar] will
return "fs" and [file type /path/to/archive.ar] will return
"directory" and it will now be possible to access the
contents of the archive as virtual files, as in:

open /path/to/archive.ar/internal/foo.bar

As in the first case, this will founder because it
depends on the single path /path/to/archive.ar
being able to belong to two filesystems at once.
The internal operations of filesystem "fs" will
need to be defined as operations of the original
filesystem on the archive file, but Tcl will keep
insisting that "fs" is the only filesystem that can
operate on that path.

"But wait!" you say. Doesn't Tclkit/Starkit/mkfs do
this? Well, yes it does, but it manages to do it
by having its access to the underlying filesystem not
pass back through Tcl. (Correct me if I'm wrong)
By not going back to Tcl to do the lower level
operations, it can avoid Tcl's insistent assignment
of the archive path to the vfs layer. However, this
also means that only native files that are accessible
without going back through Tcl are able to be archive
files. This means no nesting of archives, and it means
that the illusion that virtual files and native files are
equivalent is incomplete. I'm pretty sure (again,
someone explain if I'm mistaken) that this means
it's not possible to mount such an archive remotely
(within an HTTP or FTP virtual filesystem).

3) Impossible to mount one FS within another.

I think this is probably the most significant limitation.

Each filesystem gets one chance to say YES or NO
to owning a path. If the path points to an existing
file in the filesystem, then YES is the easy answer.
If the (normalized) path is completely outside a
filesystem's mount points, then NO is an equally
easy answer. The remaining cases are tricky.

Say that the path

/foo/bar/soom/example

is to be tested, and my filesystem holds the
mountpoint /foo/bar . Should my filesystem say YES?
Several cases:

/foo/bar/soom/example exists in my FS -> YES
(modulo the mountable archive problem already noted)

/foo/bar/soom is not a directory in my FS -> NO

Otherwise -> ????

In the final case /foo/bar/soom is my directory, but
/foo/bar/soom/example does not (yet) exist in my
FS. It seems my FS should answer YES, because
if that file were to be created, it would involve
writing to a directory which is mine.

However, it's also possible that
/foo/bar/soom/example is a mountpoint for
another filesystem. If I answer YES, then that
filesystem will never get its chance to claim its
mountpoint. On the other hand, if I answer NO,
and there is no such other filesystem that claims
that mountpoint, then attempts to create that
path will be routed to the native FS, when they
ought to be routed to me. The only way to avoid
this dilemma is to not allow nested FS mounting.
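
The dilemma can be made concrete with a toy Python model. The filesystem names "outer" and "inner" and the helper functions are hypothetical; "outer" stands for the filesystem mounted at /foo/bar, and "inner" for another filesystem that may or may not be mounted at /foo/bar/soom/example.

```python
# Toy model of the nested-mount dilemma under first-YES-wins lookup.
PATH = "/foo/bar/soom/example"


def under(mount, path):
    return path == mount or path.startswith(mount + "/")


def owner_of(path, registered):
    # Most recently registered filesystem is asked first.
    for name, claims in reversed(registered):
        if claims(path):
            return name
    return None


def make_outer(answer_for_missing):
    # "outer" owns /foo/bar; PATH does not (yet) exist inside it, so its
    # answer for PATH is a policy choice: True (YES) or False (NO).
    def claims(path):
        if path == PATH:
            return answer_for_missing
        return under("/foo/bar", path)
    return ("outer", claims)


native = ("native", lambda path: True)
inner = ("inner", lambda path: under(PATH, path))

# Policy YES, and "inner" happens to be registered before "outer":
yes_with_inner = owner_of(PATH, [native, inner, make_outer(True)])
# Policy NO, and no other filesystem is mounted there:
no_without_inner = owner_of(PATH, [native, make_outer(False)])

print(yes_with_inner, no_without_inner)
```

With the YES policy, "inner" never gets to claim its own mount point. With the NO policy, the path falls through to the native filesystem. Neither fixed answer is right in both situations.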

As noted before, the order of filesystem registration
cannot really be controlled in order to avoid this issue.
The exception is the native filesystem, which is
always queried last. This means the most common case,
adding mounts within the native filesystem, works,
even though as a general operation it's fundamentally
broken. Another common case, adding mount points
outside of any existing filesystem, like http:// -rooted
paths, also works fine.

I'm assigning this report first to Andreas since
he expressed interest in these issues. It should
get passed along to Vince Darley as well. I'm
registering this as a bug, even though it's arguably
a feature request, because if these limitations are
to continue, they should be more clearly spelled
out in the documentation.

Just to record a few half-thoughts on an alternative,
it seems that mountpoints are the special cases that
get in the way of the current scheme. Perhaps if
the (pathInFilesystemProc)s had a richer set of
answers, things could be improved. Rather than a
simple YES (return TCL_OK) or NO (return -1),
some other answers might be "THAT'S MY MOUNT"
or "I COULD TAKE THAT IF NO ONE ELSE DOES".
The current docs clearly leave other return values
undefined, so they could be used for such an
expansion. The stacked filesystems problem seems
to be more difficult.
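
One possible reading of the richer-answers idea, sketched in Python. Nothing here is an existing or proposed Tcl API: the Answer enum, its ordering (MY_MOUNT beats YES beats IF_UNCLAIMED), the tie-break by reverse registration order, and the assumption that the native filesystem answers IF_UNCLAIMED are all choices made for this illustration.

```python
# Hypothetical sketch: lookup with four possible answers instead of YES/NO.
from enum import IntEnum


class Answer(IntEnum):
    NO = 0
    IF_UNCLAIMED = 1   # "I could take that if no one else does"
    YES = 2
    MY_MOUNT = 3       # "that's my mount"


PATH = "/foo/bar/soom/example"


def under(mount, path):
    return path == mount or path.startswith(mount + "/")


def owner_of(path, registered):
    best_name, best = None, Answer.NO
    # Ask every filesystem; the strongest answer wins, and ties go to
    # the most recently registered filesystem (it is asked first).
    for name, ask in reversed(registered):
        answer = ask(path)
        if answer > best:
            best_name, best = name, answer
    return best_name


def outer_ask(path):
    if path == PATH:               # would live in my directory, not there yet
        return Answer.IF_UNCLAIMED
    return Answer.YES if under("/foo/bar", path) else Answer.NO


def inner_ask(path):
    if path == PATH:
        return Answer.MY_MOUNT
    return Answer.YES if under(PATH, path) else Answer.NO


native = ("native", lambda path: Answer.IF_UNCLAIMED)
outer = ("outer", outer_ask)
inner = ("inner", inner_ask)

print(owner_of(PATH, [native, inner, outer]))
print(owner_of(PATH, [native, outer]))
```

In this model the result for the disputed path no longer depends on whether "inner" or "outer" was registered first, and "outer" still gets the path when nobody else claims it.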


Submitted: Don Porter ( dgp ) - 2004-04-25 17:40
Priority: 5
Status: Open
Resolution: None
Assigned: Andreas Kupries
Category: 37. File System
Group: obsolete: 8.5a2
Visibility: Public


Comments ( 3 )

Date: 2004-06-03 17:30
Sender: dgp (Project Admin)


jenglish pointed out another
consequence on the chat:

afs::mount a.archive /a/b/c/d
bfs::mount b.archive /a/b
file mkdir /a/b/c/d/e

Did that create directory e in a.archive?
Or did it create directory c/d/e in b.archive?

The answer apparently depends
on Filesystem registration order, which
probably translates into the order of
the [package require afs] and
[package require bfs] commands. Due
to package dependency chains, that
order is not something a programmer
normally has precise control over.
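
A toy Python model of that order dependence. It assumes each filesystem simply claims everything under its mount point, and the under and owner_of helpers are hypothetical.

```python
# Toy model: the owner of /a/b/c/d/e depends on registration order.
def under(mount, path):
    return path == mount or path.startswith(mount + "/")


def owner_of(path, registered):
    # Most recently registered filesystem is asked first.
    for name, claims in reversed(registered):
        if claims(path):
            return name
    return None


native = ("native", lambda path: True)
afs = ("afs", lambda path: under("/a/b/c/d", path))
bfs = ("bfs", lambda path: under("/a/b", path))

target = "/a/b/c/d/e"
# afs registered before bfs: bfs is asked first and claims the path.
afs_first = owner_of(target, [native, afs, bfs])
# bfs registered before afs: afs is asked first and claims the path.
bfs_first = owner_of(target, [native, bfs, afs])

print(afs_first, bfs_first)
```

The same [file mkdir] lands in a different archive depending only on which filesystem registered last.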

The ambiguity can be avoided, it seems,
if each Tcl_Filesystem honors the constraints
that
1) It will not mount on top of an existing directory.
2) It will only mount onto a path whose parent
is an existing directory.



Date: 2004-05-04 21:01
Sender: andreas_kupries (Project Admin)

I have come to the conclusion that we have two fundamentally
different pieces of functionality here:

(a) Path rewriting, possibly with side effects.
That is (1) in Don's original essay. An example
of true rewriting: jailing all paths in some
directory, i.e. diverting them into this directory.
An example of a side-effect-based scheme, where the
rewriting is the identity: the side effect can be a security
check and denial of access, or logging of the access.

(b) Correct lookup of the filesystem handling a
path. The current linear list does not cut it. Because
of the inherent tree structure of the filesystem, the mount
points form a tree as well, where deeper mount points have
preference over the ones nearer to the root. This can be
handled in a linear list, for the simplest and most common
cases, by adding the new filesystem to the beginning of the
list, where it intercepts early. However, a tree data structure
should be better. We traverse this mount tree as we scan
through the components of the path, and then use the last
mount point we found before traversal became impossible. We
can have back pointers from the mountpoint to the path
objects referring to it and its FS. When mountpoints are
added or removed, we then know easily for which path objects
we will have to renew the FS association.
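
A minimal Python sketch of the mount tree from (b). The MountTree and MountNode names are hypothetical, and the back pointers from mount points to path objects are left out. The tree is walked one path component at a time, and the last mount point seen wins.

```python
# Sketch of a mount tree: the deepest mount point along the path wins.
def components(path):
    return [part for part in path.split("/") if part]


class MountNode:
    def __init__(self):
        self.children = {}  # path component -> MountNode
        self.fs = None      # filesystem mounted exactly here, if any


class MountTree:
    def __init__(self, root_fs="native"):
        self.root = MountNode()
        self.root.fs = root_fs  # fallback owner for every path

    def mount(self, path, fs):
        node = self.root
        for part in components(path):
            node = node.children.setdefault(part, MountNode())
        node.fs = fs

    def owner_of(self, path):
        node, owner = self.root, self.root.fs
        for part in components(path):
            node = node.children.get(part)
            if node is None:
                break            # traversal became impossible
            if node.fs is not None:
                owner = node.fs  # remember the last mount point seen
        return owner


tree = MountTree()
tree.mount("/a/b/c/d", "afs")
tree.mount("/a/b", "bfs")

print(tree.owner_of("/a/b/c/d/e"))
print(tree.owner_of("/a/b/x"))
```

In this model the answer depends only on where the mount points sit in the tree, not on the order in which they were added.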


Date: 2004-04-26 15:05
Sender: vincentdarley

Vince adds: I view (3) as a bug in the current
implementation (which can be fixed without drastic amounts
of work).

I haven't really thought of (2) as a serious issue before,
but I now see that it is (or at least could be -- one can of
course work around it by unmounting, operating on the archive,
and then remounting). Finally (1) is clearly a design
limitation of the current setup, and one it would be nice to
fix.




Change ( 1 )

Field    Old Value                       Date              By
summary  path<->FS function limitations  2004-06-03 17:30  dgp