From: Antoni M. <ant...@gm...> - 2010-07-06 18:22:42
Hello All,

I've summarized this discussion on a wiki page:
https://sourceforge.net/apps/trac/aperture/wiki/SupportForMovingDataObjects

More comments inline.

On 2010-07-06 10:35, Leo Sauermann wrote:
> Hi,
>
> It was Christiaan Fluit who said at the right time 05.07.2010 17:19 the
> following words:
>> Storing it in the DataSource sounds like the best way to me. Both the
>> FileSystemDataSource and MboxDataSource could use this.
>>
>> Storing both the original and the current prefix is ok - just the
>> current one would also be enough for our purposes though.
>>
> I would really enforce that we have "valid" URIs stored in the data
>
> we need the originalURI because the code will somehow be like this:
> (I make up the methodnames, too lazy to look into the code now, but you
> get the picture)
>
> String uri = dataobject.getURI(); // the URI as stored or as crawled
> String uriprefixstored = datasource.getConfiguration().getUriPrefixStored();
> if (!uri.startsWith(uriprefixstored))
>     throw new IllegalStateException("42! the universe ends now");
> String strippedUri = uri.substring(uriprefixstored.length());
> String accessibleUri =
>     datasource.getConfiguration().getUriPrefixCurrent() + strippedUri;
>
> ==> hence, you need both

My proposal was about using an 'aperture://' URI scheme and substituting it with 'currentPrefix'. It's the same though: we store two strings in the data source and substitute the occurrence of one string in the URI with the other.

Whether we want "valid" URIs in RDF is a matter of taste. IMHO if we say that we have a file URI 'file:///G:/myfolder/myfile.txt' but G: is in fact a USB stick which has since been remounted as Z:, then this URI is not a URL of the file. Moreover, there may be another USB stick with completely unrelated content which happens to be mounted under G: and happens to have those files.

My proposal is similar to Sebastian's: use string IDs. Sebastian gets the IDs from the USB driver; we could let the user invent them.
Since they are stored with the DataSource configuration, they are always applied to that particular data source and therefore don't even have to be unique between data sources (if a user needs to be sure that files from different sources get different URIs, he/she should take care of the uniqueness of the IDs by him/herself).

>> The only thing I am still thinking about is the fact that in our system
>> all DataSources will have the same prefix value, i.e. the same value is
>> duplicated a couple of times. Duplication is usually not a good thing,
>> but perhaps it gets too complex if we create a shared storage for this?
>>
> It is duplicated, but on the other hand it is much more stable (=
> self-contained) and readable.

I had this idea too. We could create a global map of id<->urisubstrings, place it in ApertureRuntime and propagate it to all registries via constructors. They would propagate it to all factories via constructors, which would in turn propagate it to all crawlers, accessors and openers. This would be quite a lot of work and would somehow feel less "clean" and "modular". It's a matter of debate though: do we want ApertureRuntime to be a one-stop shop for all Aperture users and store Aperture-wide state there, or do we want to keep everything separate as it is now? Food for thought for Aperture 2.

I'd go for storing the two strings in DataSource and providing a static utility class that would perform the conversion appropriately.

> I am conservative and boring here:
> just keep the system "as is" working and keep the URIs "valid", instead
> of relying only on the suffix relative to the configured one.
>
> But PLEASE - hit this proposal and invalidate it and show us that my
> thinking is wrong and harmful,
> because I don't want to cause technical trouble with this proposal

Once again, it's a matter of taste.

file:///G:/myfolder/myfile.txt
aperture://thumbdrive/myfolder/myfile.txt

Which is better, if myfile.txt is actually on the Z:/ disk at the moment?
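
To make the "two strings plus a static utility" idea concrete, here is a rough sketch in Java. The class name UriPrefixConverter and its method names are made up for illustration and are not existing Aperture API. The two prefixes are passed in as plain strings, since how they would be read from the DataSource configuration is still open.

```java
/**
 * Hypothetical helper (not part of Aperture): swaps the prefix stored with a
 * data source for the prefix under which the data is reachable right now,
 * and back again.
 */
final class UriPrefixConverter {

    private UriPrefixConverter() {
        // static utility, no instances
    }

    /** Stored form of a URI -> form that can be opened at the moment. */
    static String toAccessibleUri(String storedUri, String storedPrefix, String currentPrefix) {
        return swapPrefix(storedUri, storedPrefix, currentPrefix);
    }

    /** Form seen while crawling -> form that is written to the RDF store. */
    static String toStoredUri(String accessibleUri, String storedPrefix, String currentPrefix) {
        return swapPrefix(accessibleUri, currentPrefix, storedPrefix);
    }

    /** Replaces the leading 'from' with 'to'; rejects URIs outside the data source. */
    private static String swapPrefix(String uri, String from, String to) {
        if (!uri.startsWith(from)) {
            throw new IllegalArgumentException(
                    "URI '" + uri + "' does not start with prefix '" + from + "'");
        }
        return to + uri.substring(from.length());
    }
}
```

With the stored prefix "aperture://thumbdrive/" and the current prefix "file:///Z:/", toAccessibleUri turns aperture://thumbdrive/myfolder/myfile.txt into file:///Z:/myfolder/myfile.txt. With the stored prefix "file:///G:/" instead, the same call turns file:///G:/myfolder/myfile.txt into the same Z: URI, so the utility does not care which of the two conventions is chosen.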

Both approaches would work exactly the same with the ideas I outlined on the wiki page.

All kinds of comments welcome.

Antoni Mylka
ant...@gm...