From: Leo S. <leo...@df...> - 2010-07-06 08:35:46
Hi,

It was Christiaan Fluit who said at the right time 05.07.2010 17:19 the following words:
> Storing it in the DataSource sounds like the best way to me. Both the
> FileSystemDataSource and MboxDataSource could use this.
>
> Storing both the original and the current prefix is ok - just the
> current one would also be enough for our purposes though.
>
I would really enforce that we have "valid" URIs stored in the data.

We need the original URI because the code will be somewhat like this
(I am making up the method names, too lazy to look into the code now, but you get the picture):

    String uri = dataobject.getURI(); // the URI as stored or as crawled
    String uriPrefixStored = datasource.getConfiguration().getUriPrefixStored();
    if (!uri.startsWith(uriPrefixStored))
        throw new IllegalStateException("42! the universe ends now");
    // strip the stored prefix, then prepend the prefix that is valid today
    String strippedUri = uri.substring(uriPrefixStored.length());
    String accessibleUri = datasource.getConfiguration().getUriPrefixCurrent() + strippedUri;

==> hence, you need both

> The only thing I am still thinking about is the fact that in our system
> all DataSources will have the same prefix value, i.e. the same value is
> duplicated a couple of times. Duplication is usually not a good thing,
> but perhaps it gets too complex if we create a shared storage for this?
>
It is duplicated, but on the other hand it is much more stable (= self-contained) and readable.

I am conservative and boring here: just keep the system working "as is" and keep the URIs "valid", instead of relying only on the suffix relative to the configured prefix.

But PLEASE - hit this proposal and invalidate it and show us that my thinking is wrong and harmful, because I don't want to cause technical trouble with this proposal.

best
Leo

> Chris
>
> On 05-Jul-10 14:52, Leo Sauermann wrote:
>
>> Hi Guys,
>>
>> to come around this....
>> I have seen that we have some good ideas, I copy some excerpts below
>>
>> Based on that, I propose a simple and working solution:
>> * we extend the filesystem datasource properties, currently they contain
>> "root URL" which is the root of the filesystem
>>
>> we have multiple properties:
>> * "stored URL prefix" - this is the URL PREFIX stored in all crawled
>> data, in the accessdata files, etc. it is usually where the files were
>> crawled from and it is the PREFIX of all data as in the database
>> * "current URL prefix" - this is the URL PREFIX where the datasource can
>> CURRENTLY be accessed
>>
>> by only storing it in the Datasource, we cover all cases and we cover
>> Darren Govoni's philosophical statements also -
>> but much better than storing the location as metadata attribute of each
>> file,
>> we store it once and for all on the datasource level.
>>
>> moving the store means changing the "current URL prefix" value in the
>> datasource config.
>>
>> the implementation of the difference can be done by the crawlerhandler
>> (outside Aperture core) or inside Aperture's core FileSystem handler
>>
>> problem solved?
>> if not, please specify
>>
>> It was Christiaan Fluit who said at the right time 01.07.2010 13:50 the
>> following words:
>>
>>> The first sounds like overkill to me. The second is the "smart file
>>> system crawler" that I talked about in my previous mail. This is
>>> basically a next generation FileSystemCrawler and that's certainly not
>>> our current goal.
>>>
>>> We would be happy to take the assumption that the original data hasn't
>>> changed at all (or else the end user just has to issue an incremental
>>> recrawl), it's only the first part of every location in the DataSource
>>> and DataObjects that has changed.
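
To make the two-prefix proposal quoted above concrete, here is a minimal sketch in Java. All names in it (UriPrefixSketch, PrefixConfig, toAccessibleUri) and the example file: URIs are made up for illustration and are not Aperture API. It only assumes what the proposal says: the stored prefix never changes, and moving the store means changing the current prefix.

```java
public final class UriPrefixSketch {

    /** Hypothetical per-datasource settings; not an Aperture class. */
    public static final class PrefixConfig {
        private final String storedPrefix; // prefix found in all crawled URIs
        private String currentPrefix;      // where the data is reachable today

        public PrefixConfig(String storedPrefix, String currentPrefix) {
            this.storedPrefix = storedPrefix;
            this.currentPrefix = currentPrefix;
        }

        public String getStoredPrefix() { return storedPrefix; }

        public String getCurrentPrefix() { return currentPrefix; }

        /** "Moving the store" only changes this value; stored URIs stay as crawled. */
        public void setCurrentPrefix(String currentPrefix) {
            this.currentPrefix = currentPrefix;
        }
    }

    /** Maps a URI as stored in the crawled data to the URI where it is reachable now. */
    public static String toAccessibleUri(String storedUri, PrefixConfig config) {
        String stored = config.getStoredPrefix();
        if (!storedUri.startsWith(stored)) {
            throw new IllegalArgumentException(
                "URI " + storedUri + " does not start with stored prefix " + stored);
        }
        // Strip the stored prefix and prepend the current one.
        return config.getCurrentPrefix() + storedUri.substring(stored.length());
    }

    public static void main(String[] args) {
        // Before the move: both prefixes are identical, so the URI is unchanged.
        PrefixConfig config = new PrefixConfig("file:/C:/data/", "file:/C:/data/");
        String storedUri = "file:/C:/data/docs/report.txt";
        System.out.println(toAccessibleUri(storedUri, config));

        // After the move: only the datasource config is touched.
        config.setCurrentPrefix("file:/D:/archive/");
        System.out.println(toAccessibleUri(storedUri, config));
    }
}
```

The stored URI is never rewritten in this sketch, so the data in the database keeps "valid" URIs, and a URI that does not carry the stored prefix is rejected instead of being silently mangled.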
>>
>> It was Darren Govoni who said at the right time 01.07.2010 14:30 the
>> following words:
>>
>>> FWIW, I think the "final" location of a file should be considered an
>>> external and independent aspect of that file.
>>> In the spirit of URLs it's not known how or where the physical bits
>>> reside.
>>>
>>> The original location (from whence it came) can always be a metadata
>>> attribute.
>>>
>>
>> best
>>
>>
> _______________________________________________
> Aperture-devel mailing list
> Ape...@li...
> https://lists.sourceforge.net/lists/listinfo/aperture-devel
>

--
_____________________________________________________
Dr. Leo Sauermann       http://www.dfki.de/~sauermann

Deutsches Forschungszentrum fuer Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080           Fon:  +43 6991 gnowsis
D-67663 Kaiserslautern  Fax:  +49 631 20575-102
Germany                 Mail: leo...@df...

Geschaeftsfuehrung:
Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
_____________________________________________________