Nicely done.

An interesting thought to add (although probably a digression) is that, with the URI indirection provided by aperture://, you can easily write an Aperture URL scheme handler that performs the necessary dereferencing and exposes the file's stream data. That way, one can use new URL("aperture://my-file-uri") to read from the Aperture virtual file-system-like namespace.

As such, you now have a common virtual URI naming scheme that can span multiple distributed or otherwise disparate file systems, so the physical management of those files can be handled transparently over time.
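
To make the handler idea concrete, here is a rough sketch of what such a scheme handler might look like. Everything in it is my own assumption rather than existing Aperture code: the class name ApertureUrlHandler, the idea that the host part of the URL is a data source id, and the map from ids to current root directories are all made up for illustration. The handler is passed to the URL constructor directly, so nothing has to be registered JVM-wide.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical handler for aperture://<source-id>/<relative-path> URLs. */
final class ApertureUrlHandler extends URLStreamHandler {
    // Assumed mapping: data source id -> directory where it is currently mounted.
    private final Map<String, Path> roots = new HashMap<>();

    ApertureUrlHandler(Map<String, Path> currentRoots) {
        for (Map.Entry<String, Path> e : currentRoots.entrySet()) {
            roots.put(e.getKey(), e.getValue().toAbsolutePath().normalize());
        }
    }

    /** Builds a URL bound to this handler, so no global registration is needed. */
    URL toUrl(String spec) throws MalformedURLException {
        return new URL(null, spec, this);
    }

    /** Dereferences the virtual URL to the file's current physical location. */
    Path resolve(URL url) throws IOException {
        Path root = roots.get(url.getHost());
        if (root == null) {
            throw new FileNotFoundException("Unknown data source id: " + url.getHost());
        }
        // Percent-decoding of the path is left out of this sketch.
        String relative = url.getPath().replaceFirst("^/+", "");
        Path resolved = root.resolve(relative).normalize();
        if (!resolved.startsWith(root)) {
            throw new IOException("Path escapes the data source root: " + url);
        }
        return resolved;
    }

    @Override
    protected URLConnection openConnection(URL url) throws IOException {
        final Path file = resolve(url);
        return new URLConnection(url) {
            @Override
            public void connect() {
                connected = true;
            }

            @Override
            public InputStream getInputStream() throws IOException {
                return Files.newInputStream(file);
            }
        };
    }
}
```

With a handler built from a map such as {"usbstick" -> the stick's current mount point}, handler.toUrl("aperture://usbstick/myfile.txt").openStream() would read the file wherever the stick happens to be mounted today. Remounting only changes the map, not the URLs stored in the RDF.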

On Tue, 2010-07-06 at 20:22 +0200, Antoni Mylka wrote:
Hello All,

I've summarized this discussion on a wiki page:

more comments inline

On 2010-07-06 10:35, Leo Sauermann wrote:
> Hi,
> It was Christiaan Fluit who said at the right time 05.07.2010 17:19 the
> following words:
>> Storing it in the DataSource sounds like the best way to me. Both the
>> FileSystemDataSource and MboxDataSource could use this.
>> Storing both the original and the current prefix is ok - just the
>> current one would also be enough for our purposes though.
> I would really enforce that we have "valid" URIs stored in the data.
> We need the originalURI because the code will somehow be like this:
> (I make up the method names, too lazy to look into the code now, but you
> get the picture)
> String uri = dataobject.getURI(); // the URI as stored or as crawled
> String uriPrefixStored = datasource.getConfiguration().getUriPrefixStored();
> if (!uri.startsWith(uriPrefixStored))
>     throw new IllegalStateException("42! the universe ends now");
> // strip the stored prefix, then prepend the currently valid one
> String strippedUri = uri.substring(uriPrefixStored.length());
> String accessibleUri =
>     datasource.getConfiguration().getUriPrefixCurrent() + strippedUri;
> ==>  hence, you need both

My proposal was about using an 'aperture://' uri scheme and substituting 
it with 'currentPrefix'. It's the same though: we store two strings in
the data source and substitute the occurrence of one string in the uri
with another string. Whether we want "valid" uris in RDF is a matter of 
taste. IMHO if we say that we have a file uri


but in fact G: is a usb stick, which has been remounted and is now
Z:, then this URI is not a URL of the file. Moreover, there may be
another usb stick with completely unrelated content which happens to be
mounted under G: and happens to have those files.

My proposal is similar to Sebastian's: use string ids. Sebastian gets
the IDs from the USB driver; we could let the user invent them. Since
they are stored with the DataSource configuration, they are always
applied to that particular data source and therefore don't even have to
be unique between data sources (a user who needs to be sure that files
from different sources get different uris should take care of the
uniqueness of the ids him/herself).

>> The only thing I am still thinking about is the fact that in our system
>> all DataSources will have the same prefix value, i.e. the same value is
>> duplicated a couple of times. Duplication is usually not a good thing,
>> but perhaps it gets too complex if we create a shared storage for this?
> It is duplicated, but on the other hand it is much more stable (=
> self-contained) and readable.

I had this idea too. We could create a global map of id<->urisubstrings,
place it in ApertureRuntime, and propagate it to all registries via
constructors. They would propagate it to all factories via constructors,
which would propagate it to all crawlers, accessors and openers. This
would be quite a lot of work and would somehow feel less "clean" and
"modular". It's a matter of debate though, do we want ApertureRuntime to 
be a one-stop-shop for all Aperture users and store aperture-wide state 
there, or do we want to keep everything separate as it is now - food for 
thought for Aperture 2.

I'd go for storing the two strings in DataSource and providing a static 
utility class that would perform the conversion appropriately.
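
A minimal sketch of such a utility class, assuming the two strings are simply passed in by the caller. The class name UriPrefixes and its method names are invented here for illustration and are not part of the Aperture API, and the example prefixes in the note below are made up as well.

```java
/** Hypothetical static helper that swaps one URI prefix for another. */
final class UriPrefixes {
    private UriPrefixes() {
        // static utility, not meant to be instantiated
    }

    /** Replaces fromPrefix at the start of uri with toPrefix. */
    static String swapPrefix(String uri, String fromPrefix, String toPrefix) {
        if (!uri.startsWith(fromPrefix)) {
            throw new IllegalArgumentException(
                    "URI '" + uri + "' does not start with prefix '" + fromPrefix + "'");
        }
        return toPrefix + uri.substring(fromPrefix.length());
    }

    /** Stored (RDF) form to the form that is accessible right now. */
    static String toCurrent(String storedUri, String storedPrefix, String currentPrefix) {
        return swapPrefix(storedUri, storedPrefix, currentPrefix);
    }

    /** Freshly crawled form back to the stable form kept in the RDF. */
    static String toStored(String currentUri, String storedPrefix, String currentPrefix) {
        return swapPrefix(currentUri, currentPrefix, storedPrefix);
    }
}
```

The same class serves both variants discussed in this thread: the stored prefix can be an original file prefix such as "file:///G:/" or a virtual one such as "aperture://usbstick/", since the conversion only cares that the two strings are kept together in the DataSource configuration.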

> I am conservative and boring here:
> just keep the system "as is" working and keep the URIs "valid", instead
> of relying only on the suffix relative to the configured one.
> But PLEASE - hit this proposal and invalidate it and show us that my
> thinking is wrong and harmful,
> because I don't want to cause technical trouble with this proposal

Once again, it's a matter of taste.



which is better, if the myfile.txt is actually on a Z:/ disk at the moment?

Both approaches would work exactly the same with the ideas I outlined on 
the wiki page.

All kinds of comments welcome.

Antoni Mylka

Aperture-devel mailing list