It was Antoni Mylka who said at the right time 30.07.2008 20:02 the following words:

OK. We adopt the VFS URI convention.

<type> - zip,tar,gzip,vcard etc.
<parentobject> - URI of the parent object
<internalpath> - the path within the parent object, (or a vcard
identifier, or any other string that identifies the sub-object).

Out of sheer curiosity I just checked the URI syntax (RFC 3986). These are OK: "!" is a sub-delimiter, and there are others:
      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

For this, we need to
 - modify the current subcrawler implementations.
 - extend the subcrawler with an accessor-like method such as
subCrawlSingleObject. It would take the same arguments as subcrawler
plus the internal identifier, and it would return the subobject.
 - register each subcrawlerfactory with a mime type AND the type
identifier for the URIs. These type identifiers are the same as in
VFS. For subcrawlers that don't occur in VFS we invent the type
identifiers (simply 'vcard') and we maintain that list.
 - create a utility class

public class SubCrawlerUriUtil {
    // Signatures only; method bodies omitted.
    public URI getParentObjectUri()
    public List<String[]> getSubObjectIds()
}

The second method returns a list of two-element string arrays. The
first element of each array is the type identifier ('tar', 'zip'
etc.), the second element is the internal identifier (like a path in
the archive). A second utility class does the accessing stuff. We
don't need to consult the MimeTypeIdentifier in the process because
the type ids are embedded within the URI itself.
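
A rough sketch of how such a utility class could split a URI, to make the two methods concrete. It assumes the layered form used by Commons VFS, where type prefixes are stacked in front and each "!" introduces one internal path (for example tar:gz:http://host/a.tar.gz!/a.tar!/docs/readme.txt). The constructor taking a String and the example URI are assumptions for illustration, not part of the proposal.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the constructor and the parsing strategy are assumptions.
final class SubCrawlerUriUtil {
    private final URI parentObjectUri;
    private final List<String[]> subObjectIds = new ArrayList<>();

    SubCrawlerUriUtil(String uri) {
        // A literal "!" is assumed to appear only as a separator,
        // because "!" inside a part is written as %21.
        String[] parts = uri.split("!", -1);
        int layers = parts.length - 1;
        String rest = parts[0];
        String[] types = new String[layers];
        // Type prefixes are listed outermost first, so the first prefix
        // belongs to the last internal path. Peel them off in reverse.
        for (int i = layers - 1; i >= 0; i--) {
            int colon = rest.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Missing type prefix in " + uri);
            }
            types[i] = rest.substring(0, colon);
            rest = rest.substring(colon + 1);
        }
        // Whatever remains is the URI of the outermost parent object.
        parentObjectUri = URI.create(rest);
        // Internal identifiers are returned as written (still percent-encoded).
        for (int i = 0; i < layers; i++) {
            subObjectIds.add(new String[] { types[i], parts[i + 1] });
        }
    }

    URI getParentObjectUri() {
        return parentObjectUri;
    }

    List<String[]> getSubObjectIds() {
        return subObjectIds;
    }
}
```

With the example URI above, getParentObjectUri() gives http://host/a.tar.gz and getSubObjectIds() gives the pairs ('gz', '/a.tar') and ('tar', '/docs/readme.txt'), in the order an accessor would have to open them.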

Sounds fine. To wrap my head around this, a URI would then look like this:


To be safe, the <parentobject> and <internalpath> parts should be
checked for "!" and if one is found, replaced with %21 (hex-encoded ASCII "!")
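
The replacement could be sketched like this. The class and method names are made up for illustration, and the sketch only covers the single-level case where one type, one parent object and one internal path are joined, assuming "!" is the separator between the last two:

```java
// Hypothetical helper; the names are not part of any existing API.
final class SubObjectUriBuilder {

    // Replace every literal "!" with its percent-encoded form %21.
    static String escapeBang(String part) {
        return part.replace("!", "%21");
    }

    // Join the three parts, assuming the form <type>:<parentobject>!<internalpath>.
    static String build(String type, String parentObject, String internalPath) {
        return type + ":" + escapeBang(parentObject) + "!" + escapeBang(internalPath);
    }
}
```

After escaping, the only literal "!" left in the result is the separator itself, so a parser can split on it without ambiguity.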

Chris, if you're OK with it, please create an SF issue.

Leo? Others? What do you think?
I didn't review it completely because of time constraints (NEPOMUK)
but I see that we need to have good URIs...
and if VFS has a standard that works for our problem, we should use it.


All kinds of comments welcome.


DI Leo Sauermann       http://www.dfki.de/~sauermann 

Deutsches Forschungszentrum fuer 
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080           Fon:   +49 631 20575-116
D-67663 Kaiserslautern  Fax:   +49 631 20575-102
Germany                 Mail:  leo.sauermann@dfki.de

Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313