From: Antoni M. <ant...@df...> - 2007-06-18 10:57:58
|
I've done some research in the Aperture source code. There are seven crawlers, and each one creates URIs in its own way. Below, I separate the constant part from the variable part with "|". The "|" is NOT part of the URI.

AppleAddresbook - urn:mac:addresbook:|Antoni%20Mylka (URL-encoded name)
Thunderbird     - urn:thunderbird:Person:|3423424 (person identifier)
FileSystem      - file://|path/to/the/file (file.toURI().toString())
Ical            - file://path/to/cal.ics#|234234-234234-2 (UUID)
imap            - imap://|us...@ho...:port/folder/id (normal IMAP)
web             - http:// or file://
outlook         - gnowsis://|some/variable/rooturl#UUID

Clearly, only files, websites and IMAP mails follow fixed URI schemes. That is why, of the three openers (file, http, outlook), only the first two are available through the registry, and of the four accessors (http, file, imap, outlook), only the first three are available through a registry.

I think it would be a nice idea to standardize this a little: introduce some URI schemes and stick to them. My ideas are:

macaddresbook:
thunderbirdaddresbook: - (to distinguish it from mbox:// mbox files)
file:// - normal
ical://path/to/the/ics/file.ics#uuid...
imap:// - normal
http:// - normal
outlook:UUID or
outlook://some/variable/rooturl#UUID

This would enable us to write openers and accessors for all seven crawlers (theoretically).

The question remains whether the open(URI uri) method is enough to implement all openers. This would require all URIs for 'openable' resources to contain enough information. For the given seven cases it should be possible, though the macaddressbook and thunderbird openers would probably be OS- and application-specific.

All comments welcome.

Antoni Mylka
ant...@df...
|
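To illustrate why a scheme-keyed registry only reaches some of these URIs, here is a minimal sketch. It assumes a registry that maps a bare scheme string to an accessor; the class name, the map and the accessor labels are hypothetical and are not Aperture's actual API. The IMAP user, host and port and the web address are placeholders, since the originals are elided in the message.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a registry keyed on the URI scheme only.
final class SchemeLookupSketch {

    // Schemes with a registered accessor, following the message above.
    static final Map<String, String> ACCESSORS = new LinkedHashMap<>();
    static {
        ACCESSORS.put("file", "file accessor");
        ACCESSORS.put("http", "http accessor");
        ACCESSORS.put("imap", "imap accessor");
    }

    // Returns the accessor label registered for the URI's scheme, or null.
    static String lookup(String uri) {
        String scheme = URI.create(uri).getScheme();
        return scheme == null ? null : ACCESSORS.get(scheme);
    }

    public static void main(String[] args) {
        String[] samples = {
            "urn:mac:addresbook:Antoni%20Mylka",
            "urn:thunderbird:Person:3423424",
            "file:///path/to/the/file",
            "file:///path/to/cal.ics#234234-234234-2",
            "imap://user@host:143/folder/id",      // placeholder user, host and port
            "http://example.org/page",             // placeholder web address
            "gnowsis://some/variable/rooturl#UUID"
        };
        for (String sample : samples) {
            String scheme = URI.create(sample).getScheme();
            System.out.println(scheme + " -> " + lookup(sample) + "   (" + sample + ")");
        }
    }
}
```

Both address-book URIs share the scheme "urn", and the Ical URIs share "file" with the file system crawler, so the scheme alone does not identify which crawler produced a URI.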
From: Christiaan F. <chr...@ad...> - 2007-06-18 13:07:13
|
Antoni Mylka wrote:
> macaddresbook:
> thunderbirdaddresbook: - (to distinguish it from mbox:// mbox files)
> file:// - normal
> ical://path/to/the/ics/file.ics#uuid...
> imap:// - normal
> http:// - normal
> outlook:UUID or
> outlook://some/variable/rooturl#UUID
>
> This would enable us to write openers and accessors for all seven
> crawlers. (Theoretically).

Sounds like a good idea!

At the moment I'm still trying to figure out for myself which kind of component really is the conceptual "owner" of the URI schemes used. Note for example that WebDataSource and WebCrawler are completely scheme-independent: all scheme-related code is delegated to DataAccessors, so it can even crawl a hypertext graph using file: URLs. Sometimes it is a mix of Crawler and DataAccessor: for example, a FileSystemCrawler creates URLs that get copied to identical URIs by the FileAccessor. ImapCrawler implements both Crawler and DataAccessor, and knowledge of the URI format is spread across both parts of the implementation.

I assume that each opener and accessor will be able to support families of closely related schemes, right? Note that this approach is already used right now: HttpAccessor supports http: and https:.

> The question remains if the open(URI uri) method is enough to implement
> all openers. This would require all uris for 'openable' resources to
> contain enough information. For the given seven cases it should be
> possible, though the macaddressbook and thunderbird openers would
> probably be OS and application specific.

Counter-example: IMAP and any other format that requires passwords or other types of user credentials that should not be exposed through URIs.

Regards,

Chris
--
|
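The two points in this reply (one accessor serving a family of schemes, and credentials kept out of the URI) can be sketched as follows. This is only an illustration under assumptions: the Accessor interface, the class names and the credentials map are hypothetical and do not reflect Aperture's real DataAccessor interface.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these types are Aperture's real API.
final class SchemeFamilySketch {

    /** An accessor that may serve several closely related schemes. */
    interface Accessor {
        Set<String> schemes();

        // Credentials travel beside the URI, never inside it.
        String describeAccess(URI uri, Map<String, String> credentials);
    }

    static final class HttpLikeAccessor implements Accessor {
        public Set<String> schemes() {
            return new HashSet<>(Arrays.asList("http", "https"));
        }

        public String describeAccess(URI uri, Map<String, String> credentials) {
            return "GET " + uri;
        }
    }

    static final class ImapLikeAccessor implements Accessor {
        public Set<String> schemes() {
            return Collections.singleton("imap");
        }

        public String describeAccess(URI uri, Map<String, String> credentials) {
            // Only the user name is echoed; the password is never written out.
            boolean hasPassword = credentials.containsKey("password");
            return "IMAP fetch " + uri + " as " + credentials.get("user")
                    + (hasPassword ? " (password supplied separately)" : " (no password)");
        }
    }

    /** Registers each accessor under every scheme it claims. */
    static Map<String, Accessor> register(Accessor... accessors) {
        Map<String, Accessor> byScheme = new HashMap<>();
        for (Accessor accessor : accessors) {
            for (String scheme : accessor.schemes()) {
                byScheme.put(scheme, accessor);
            }
        }
        return byScheme;
    }

    public static void main(String[] args) {
        Map<String, Accessor> registry = register(new HttpLikeAccessor(), new ImapLikeAccessor());
        Map<String, String> credentials = new HashMap<>();
        credentials.put("user", "alice");        // placeholder values
        credentials.put("password", "secret");
        URI mail = URI.create("imap://host/folder/1");
        System.out.println(registry.get(mail.getScheme()).describeAccess(mail, credentials));
    }
}
```

With a second parameter of this kind, open(URI uri) alone no longer has to carry everything needed to reach the resource, which is the concern the counter-example raises.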
From: Leo S. <leo...@df...> - 2007-06-21 12:09:07
|
It was Christiaan Fluit who said at the right time 18.06.2007 15:07 the following words:
> Antoni Mylka wrote:
>
>> macaddresbook:
>> thunderbirdaddresbook: - (to distinguish it from mbox:// mbox files)
>> file:// - normal
>> ical://path/to/the/ics/file.ics#uuid...
>> imap:// - normal
>> http:// - normal
>> outlook:UUID or
>> outlook://some/variable/rooturl#UUID
>>
>> This would enable us to write openers and accessors for all seven
>> crawlers. (Theoretically).
>>
>
> Sounds like a good idea!
>

No. If we do that, we have to file a registration request at ICANN for every new URI scheme. Together with the request, an IETF RFC usually has to be written.

That's a lot of work; we should definitely avoid it.

If we don't use standardized URIs, we get requests from our users: "what are these URIs, why do they not work, etc."

For e-mails, I see some reason to write an RFC (email:|<messageid>).

For the semantic desktop, we could register a URI scheme at ICANN like this:
urn:aperture:....
urn:semdesk:...

and then do things like:
urn:semdesk:macaddressbook:
urn:semdesk:outlook:

best
Leo

> At the moment I'm still trying to figure out for myself which kind of
> component really is the conceptual "owner" of the URI schemes used. Note
> for example that WebDataSource and WebCrawler are completely
> scheme-independent (all scheme-related code is delegated to
> DataAccessors, it can even crawl a hypertext graph using file: URLs),
> sometimes it is a mix of Crawler and DataAccessor, e.g. a
> FileSystemCrawler creates URLs that get copied to identical URIs by the
> FileAccessor, ImapCrawler implements both Crawler and DataAccessor and
> knowledge of the URI format is spread across both parts of the
> implementation.
>
> I assume that each opener and accessor will be able to support families
> of closely-related schemes, right? Note that right now this approach is
> already used: HttpAccessor supports http: and https:.
>
>> The question remains if the open(URI uri) method is enough to implement
>> all openers. This would require all uris for 'openable' resources to
>> contain enough information. For the given seven cases it should be
>> possible, though the macaddressbook and thunderbird openers would
>> probably be OS and application specific.
>>
>
> Counter-example: IMAP and any other format that requires passwords or
> other types of user credentials that should not be exposed through URIs.
>
> Regards,
>
> Chris
> --

--
____________________________________________________
DI Leo Sauermann       http://www.dfki.de/~sauermann

Deutsches Forschungszentrum fuer
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080           Fon:  +49 631 20575-116
D-67663 Kaiserslautern  Fax:  +49 631 20575-102
Germany                 Mail: leo...@df...

Geschaeftsfuehrung:
Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
____________________________________________________
|
From: Antoni M. <ant...@df...> - 2007-06-21 12:56:10
|
Leo Sauermann wrote:
> no, if we do that we have to file in a request for registration for
> every new uri scheme at ICANN.
> Together with the request, usually an IETF RFC has to be written.
>
> thats a lot of work, we should definitely avoid that.
>
> If we don't use standardized uris, we get requests from our users "what
> are these uris, why do they not work, etc...."
>
> for e-mails, I see some reason to write an RFC (email:|<messageid>)

Such a URI doesn't contain any information about how to access the resource. An email could be in Outlook, in an IMAP mailbox, or in any other place. It would be impossible to write an accessor for that scheme. From the point of view of the Aperture implementation, a more useful one would be something like

urn:semdesk:outlookemail:messageID
or
urn:semdesk:outlook:OutlookItemUUID (even easier)

A remark: it turns out that there may be some misconception when talking about accessors and openers for URI schemes. They are in fact for URL schemes.

> for the semanticdesktop, we could register an URI scheme at ICANN like this:
> urn:aperture:....
> urn:semdesk:...
>
> and then do things like:
> urn:semdesk:macaddressbook:
> urn:semdesk:outlook:

Very well, but then if we want to have an accessor that would work with urn:semdesk:macaddressbook, we have two options:

1. Redefine the concept of a URI scheme: "The scheme name consists of a letter followed by any combination of letters, digits, and the plus ("+"), period ("."), or hyphen ("-") characters; and is terminated by a colon (":")." (from Wikipedia).
2. Change the definition of the accessor registry, so that new factories are registered not with URI schemes but with arbitrary URI prefixes.

The second one seems to me to be the only possibility.
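As a rough illustration of the second option, the sketch below keys a registry on arbitrary URI prefixes and picks the longest matching one. It is an assumption-laden sketch, not Aperture's registry: the class name, the method names and the factory names are all hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a prefix-keyed registry; not Aperture's actual API.
final class PrefixRegistrySketch {

    // Maps a URI prefix to the name of the factory that handles it.
    private final Map<String, String> factoriesByPrefix = new HashMap<>();

    void register(String uriPrefix, String factoryName) {
        factoriesByPrefix.put(uriPrefix, factoryName);
    }

    // Returns the factory registered under the longest matching prefix, or null.
    String lookup(String uri) {
        String bestPrefix = null;
        for (String prefix : factoriesByPrefix.keySet()) {
            boolean longer = bestPrefix == null || prefix.length() > bestPrefix.length();
            if (uri.startsWith(prefix) && longer) {
                bestPrefix = prefix;
            }
        }
        return bestPrefix == null ? null : factoriesByPrefix.get(bestPrefix);
    }

    public static void main(String[] args) {
        PrefixRegistrySketch registry = new PrefixRegistrySketch();
        // Factory names below are made up for the example.
        registry.register("urn:semdesk:macaddressbook:", "MacAddressBookAccessorFactory");
        registry.register("urn:semdesk:outlook:", "OutlookAccessorFactory");
        registry.register("http:", "HttpAccessorFactory");

        String uri = "urn:semdesk:outlook:OutlookItemUUID";
        // The scheme by itself is only "urn", which cannot distinguish the two urn:semdesk prefixes.
        System.out.println("scheme  = " + URI.create(uri).getScheme());
        System.out.println("factory = " + registry.lookup(uri));
    }
}
```

Longest-prefix matching is one possible design choice here: it lets a general entry such as "urn:" coexist with more specific ones without the registration order mattering.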
This issue could be solved if the content and the representation had two separate URIs: a representation URL (urn:semdesk:outlook:outlookUUID), for which we could have an accessor and an opener, and a content URI, which could be email:messageId. But this would bring a million other issues and would turn the Aperture architecture upside down... :) It would be a nice thing to think about, though, if anyone ever wants to write Aperture 2.0.

Until then there are three options:

1. Use URLs for everything we want to be available with an Accessor or an Opener (and register factories with arbitrary prefixes, not with schemes).
2. Introduce some coupling between data sources, accessors and openers, something along the lines of what I wrote some time ago, so that when we have an email:messageId we can trace it to a particular mbox file and call a ThunderbirdOpener.
3. Leave it as it is and accept the fact that the way URIs are created limits the growth of Aperture.

Antoni Mylka
ant...@df...
|
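Option 2 above could look roughly like the following sketch: a small index, filled during crawling, that remembers which data source produced a content URI and which opener suits it. Everything here is hypothetical (the class, its methods, the mbox path and the use of plain strings for openers); it only shows the kind of coupling meant, not an existing Aperture mechanism.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of option 2; names are illustrative, not Aperture's API.
final class SourceCouplingSketch {

    /** Where a crawled resource came from and which opener can show it. */
    static final class Origin {
        final String dataSourceId;
        final String openerName;

        Origin(String dataSourceId, String openerName) {
            this.dataSourceId = dataSourceId;
            this.openerName = openerName;
        }
    }

    private final Map<String, Origin> originsByContentUri = new HashMap<>();

    // Called by a crawler each time it reports a resource.
    void recordCrawl(String contentUri, String dataSourceId, String openerName) {
        originsByContentUri.put(contentUri, new Origin(dataSourceId, openerName));
    }

    // Returns the recorded origin, or null if the URI was never crawled.
    Origin trace(String contentUri) {
        return originsByContentUri.get(contentUri);
    }

    public static void main(String[] args) {
        SourceCouplingSketch index = new SourceCouplingSketch();
        // The mbox location is a made-up placeholder.
        index.recordCrawl("email:messageId", "file:///path/to/Inbox.mbox", "ThunderbirdOpener");

        Origin origin = index.trace("email:messageId");
        System.out.println(origin.openerName + " <- " + origin.dataSourceId);
    }
}
```

The cost of this approach is that the index has to be kept in sync with the crawlers, which is the coupling the option itself names.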