An interesting side note (probably a digression): with the URI indirection provided by aperture://, you can easily write an Aperture URL scheme handler that performs the necessary dereferencing and exposes the file stream data. That way, one can use new URL("aperture://my-file-uri") to read from the Aperture virtual, file-system-like namespace.
You then have a common virtual URI naming scheme that can span multiple distributed or otherwise disparate file systems, so the physical location of those files can change transparently over time.
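A rough sketch of what such a handler might look like, using the standard java.net.URLStreamHandler extension point. The class name ApertureUrlHandler, the resolve() helper, the id "usbstick" and the prefix values are all made up for illustration; none of this is existing Aperture API. It assumes URIs of the form aperture://<id>/<path>, where <id> is looked up in a map of current physical prefixes.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.util.Map;

// Hypothetical handler: maps aperture://<id>/<path> onto a physical base URL.
class ApertureUrlHandler extends URLStreamHandler {

    // id -> current physical prefix, e.g. "usbstick" -> "file:/mnt/usb/" (assumed values)
    private final Map<String, String> currentPrefixes;

    ApertureUrlHandler(Map<String, String> currentPrefixes) {
        this.currentPrefixes = currentPrefixes;
    }

    // Translate the virtual URL into the physical URL it currently points at.
    URL resolve(URL virtual) throws IOException {
        String id = virtual.getHost(); // the part right after "aperture://"
        String prefix = currentPrefixes.get(id);
        if (prefix == null) {
            throw new IOException("Unknown aperture id: " + id);
        }
        String path = virtual.getPath();
        if (path.startsWith("/")) {
            path = path.substring(1); // the prefix already ends with "/"
        }
        return new URL(prefix + path);
    }

    // Called by URL.openConnection() / URL.openStream(); delegates to the physical URL.
    @Override
    protected URLConnection openConnection(URL virtual) throws IOException {
        return resolve(virtual).openConnection();
    }
}
```

With the handler passed in explicitly, new URL(null, "aperture://usbstick/docs/myfile.txt", handler).openStream() would read the file from wherever "usbstick" is currently mounted. To make the plain one-argument new URL("aperture://...") form work, the handler would additionally have to be registered with the JVM, for example through URL.setURLStreamHandlerFactory.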
On Tue, 2010-07-06 at 20:22 +0200, Antoni Mylka wrote:
I've summarized this discussion on a wiki page:
more comments inline
W dniu 2010-07-06 10:35, Leo Sauermann pisze:
> It was Christiaan Fluit who said at the right time 05.07.2010 17:19 the
> following words:
>> Storing it in the DataSource sounds like the best way to me. Both the
>> FileSystemDataSource and MboxDataSource could use this.
>> Storing both the original and the current prefix is ok - just the
>> current one would also be enough for our purposes though.
> I would really enforce that we have "valid" URIs stored in the data
> we need the originalURI because the code will somehow be like this:
> (I make up the methodnames, too lazy to look into the code now, but you
> get the picture)
> String uri = dataobject.getURI(); // the URI as stored or as crawled
> String uriPrefixStored = datasource.getConfiguration().getUriPrefixStored();
> if (!uri.startsWith(uriPrefixStored)) {
>     throw new IllegalStateException("42! the universe ends now");
> }
> String strippedUri = uri.substring(uriPrefixStored.length());
> String accessibleUri =
>     datasource.getConfiguration().getUriPrefixCurrent() + strippedUri;
> ==> hence, you need both
My proposal was about using an 'aperture://' uri scheme and substituting
it with 'currentPrefix'. It's the same though, we store two strings in
the data source and substitute the occurrence of one string in the uri
with another string. Whether we want "valid" uris in RDF is a matter of
taste. IMHO if we say that we have a file uri
but in fact G is a usb stick, which has been remounted now and is in
fact Z:, then this URI is not a URL of the file, moreover, there may be
another usb stick with completely unrelated content which happens to be
mounted under G: and happens to have those files.
My proposal is similar to Sebastian's, use string ids. Sebastian gets
the IDs from the USB driver, we could let the user invent them. Since
they are stored with the DataSource configuration - they are always
applied to that particular data source and therefore don't even have to
be unique between data sources (if a user needs to be sure that files
from different sources get different uris - he/she should take care
about the uniqueness of the ids by him/herself).
>> The only thing I am still thinking about is the fact that in our system
>> all DataSources will have the same prefix value, i.e. the same value is
>> duplicated a couple of times. Duplication is usually not a good thing,
>> but perhaps it gets too complex if we create a shared storage for this?
> It is duplicated, but on the other hand it is much more stable (=
> self-contained) and readable.
I had this idea too. We could create a global map of id<->urisubstrings,
place it in ApertureRuntime, propagate it to all registries, via
constructors, they would propagate it to all factories via constructors,
which would propagate it to all crawlers, accessors and openers. This
would be quite a lot of work and would somehow feel less "clean" and
"modular". It's a matter of debate though, do we want ApertureRuntime to
be a one-stop-shop for all Aperture users and store aperture-wide state
there, or do we want to keep everything separate as it is now - food for
thought for Aperture 2.
I'd go for storing the two strings in DataSource and providing a static
utility class that would perform the conversion appropriately.
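To make the idea concrete, here is a minimal sketch of such a utility, assuming the two strings are simply passed in by the caller. The class name UriPrefixUtil and both method names are invented for illustration and are not part of Aperture. Because it is plain prefix substitution, the same code works whether the stored prefix is a "valid" file URI prefix or an aperture://<id>/ prefix.

```java
// Hypothetical utility; class and method names are illustrative, not Aperture API.
final class UriPrefixUtil {

    private UriPrefixUtil() {
        // static helpers only
    }

    /** Rewrites a stored URI so that it points at the current physical location. */
    static String toAccessibleUri(String storedUri, String storedPrefix, String currentPrefix) {
        if (!storedUri.startsWith(storedPrefix)) {
            throw new IllegalArgumentException(
                    "URI " + storedUri + " does not start with prefix " + storedPrefix);
        }
        return currentPrefix + storedUri.substring(storedPrefix.length());
    }

    /** Inverse direction: turns a freshly crawled URI back into its stored form. */
    static String toStoredUri(String accessibleUri, String storedPrefix, String currentPrefix) {
        return toAccessibleUri(accessibleUri, currentPrefix, storedPrefix);
    }
}
```

For example, with made-up values, toAccessibleUri("aperture://usbstick/docs/myfile.txt", "aperture://usbstick/", "file:///Z:/") yields "file:///Z:/docs/myfile.txt", and toStoredUri reverses that when a crawler reports the physical URI.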
> I am conservative and boring here:
> just keep the system "as is" working and keep the URIs "valid", instead
> of relying only on the suffix relative to the configured one.
> But PLEASE - hit this proposal and invalidate it and show us that my
> thinking is wrong and harmful,
> because I don't want to cause technical trouble with this proposal
Once again, it's a matter of taste.
which is better, if the myfile.txt is actually on a Z:/ disk at the moment?
Both approaches would work exactly the same with the ideas I outlined on
the wiki page.
All kinds of comments welcome.
Aperture-devel mailing list