FWIW, I think the "final" location of a file should be considered an external and independent aspect of that file.
In the spirit of URLs, it's not known how or where the physical bits reside.
The original location (from whence it came) can always be a metadata attribute.
Whether it's considered a different file by virtue of that is too philosophical for most, in my opinion.
The bits of the file remain the same. Is it the same? Descartes would ask...
However, our use case involves crawling files on temporary media and archiving them to long-term storage for future retrieval. We don't
continuously crawl over time. In this respect, perhaps it's different from how others use Aperture.
On Thu, 2010-07-01 at 13:50 +0200, Christiaan Fluit wrote:
Christian Reuschling wrote:
> we also have this issue in the case of demonstration situations - or if we get
> documents from our customers, processing them ourselves and giving them back as
> bundle together with the searching application.
> Our current solution performs a postprocessing step directly on the final datastore
> (a Lucene index in our case). There, for all index document URLs, we simply
> replace the current working dir (which is adjustable by a system property) with
> '.'. Thus, we have to copy all documents that should be available under a
> relative path into a directory which lies under the current working directory.
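To make the postprocessing step described above concrete, here is a minimal sketch in plain Java. It is not Christian's actual code and it does not touch Lucene or Aperture; the class name `WorkingDirRelativizer` and the example URLs are made up for illustration. It only assumes that the stored document URLs are well-formed, properly encoded `file:` URLs, and uses the standard `java.net.URI.relativize` call.

```java
import java.net.URI;

// Hypothetical helper, not part of Aperture or Lucene.
final class WorkingDirRelativizer {

    private WorkingDirRelativizer() {}

    // Replaces the working-directory part of documentUrl with '.'.
    // URLs that do not lie under workingDirUrl are returned unchanged.
    // Assumes both arguments are syntactically valid, encoded URLs.
    static String relativize(String documentUrl, String workingDirUrl) {
        URI base = URI.create(workingDirUrl);
        URI doc = URI.create(documentUrl);
        URI relative = base.relativize(doc);
        if (relative.isAbsolute()) {
            // relativize() hands back the original URI when doc is not under base
            return documentUrl;
        }
        return "./" + relative.toString();
    }
}
```

With a working dir of `file:/work/`, a stored URL `file:/work/docs/a.pdf` would become `./docs/a.pdf`, while anything outside the working dir keeps its absolute URL.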
Looks similar to our use case. In our case the end user has moved the
indexed data or accesses it through a different route than before and
now the app has to modify its view on the data accordingly.
> This is of course a very special workaround, so we also would love to have
> relative URLs directly supported by Aperture.
> Regarding support for location changes - I'm not sure if this is so easy to
> do. Currently, Aperture works with the paradigm that every file is unique.
> In the case you have a copy of a file at some other place at the hard
> disc, this copy will be handled separately from the other file.
> This also means that changing the location of a file can not be interpreted
> clearly. Is the file at the target location a new file, which is a copy of the
> one at the old location? Or is it the same file, but with a new location?
> To differentiate these situations, I can currently think of two possibilities:
> 1. We change the paradigm and interpret every 'copy' as the same document
> entity, with several manifestations on the hard disc. Maybe this can be
> achieved by generated fingerprints - I remember that Antoni deals with MD5 sums
> when processing emails. But this is hard to support automatically - e.g. opening
> and immediately closing a Word file can lead to the confusing question 'save
> the changes?'. If a user clicks 'yes', thus changing some bits
> in the file, it becomes different again. Maybe we can work on the extracted
> fulltext level as a solution...but changing such a base paradigm is still not
> 2. We have native filesystem listeners, giving us the possibility to
> distinguish between a copy and a move operation. I don't know what the current
> status of open source listeners is - but they will be platform-dependent,
> and will only work on file systems.
> Both scenarios would give us nice benefits - but both also lead to a huge
> amount of work, so I'm not sure if this isn't overkill for just having
> relative URLs. Maybe I am also making it too complicated, and someone has a nice
> idea/trick how to achieve location changes much more easily?
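For reference, the fingerprint idea from option 1 could look roughly like the sketch below. This is only an illustration using the JDK's `java.security.MessageDigest`; the class name `ContentFingerprint` is hypothetical, nothing like it is claimed to exist in Aperture, and it is not how Antoni's email handling works.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Aperture.
final class ContentFingerprint {

    private ContentFingerprint() {}

    // Returns the MD5 digest of the given bytes as a lowercase hex string.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Two files count as manifestations of the same document entity
    // only if their bytes hash to the same fingerprint.
    static boolean sameContent(byte[] a, byte[] b) {
        return md5Hex(a).equals(md5Hex(b));
    }
}
```

As the quoted mail points out, a single changed byte (for example after a 'save the changes?' click) yields a different fingerprint, so hashing the raw bytes only identifies exact copies. Hashing the extracted fulltext instead would be the variant suggested above.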
The first sounds like overkill to me. The second is the "smart file
system crawler" that I talked about in my previous mail. This is
basically a next generation FileSystemCrawler and that's certainly not
our current goal.
We would be happy to make the assumption that the original data hasn't
changed at all (or else the end user just has to issue an incremental
recrawl); only the first part of every location in the DataSource
and DataObjects has changed.
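Under that assumption, the change boils down to swapping a leading prefix on every stored location. A rough sketch in plain Java follows. It works on location strings only and deliberately does not use any Aperture DataSource or DataObject API; the class name `LocationRebaser` and the example prefixes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Aperture: rewrites the leading part of
// stored location strings after the indexed data has been moved.
final class LocationRebaser {

    private LocationRebaser() {}

    // Replaces oldPrefix with newPrefix if the location starts with it;
    // locations outside the old root are returned unchanged.
    static String rebase(String location, String oldPrefix, String newPrefix) {
        if (location.startsWith(oldPrefix)) {
            return newPrefix + location.substring(oldPrefix.length());
        }
        return location;
    }

    // Applies rebase() to every location, preserving order.
    static List<String> rebaseAll(List<String> locations, String oldPrefix, String newPrefix) {
        List<String> result = new ArrayList<>(locations.size());
        for (String location : locations) {
            result.add(rebase(location, oldPrefix, newPrefix));
        }
        return result;
    }
}
```

For example, rebasing from `file:/media/usb0/` to `file:/archive/2010/` would turn `file:/media/usb0/docs/a.pdf` into `file:/archive/2010/docs/a.pdf`. Nothing here checks that the file at the new location still has the same content; that is exactly the assumption stated above.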
Aperture-devel mailing list