Aperture / Feature Requests / #14 move to semdesk uri scheme

Leo Sauermann - 2007-10-12

summary: move --> move to semdesk uri scheme

Christiaan Fluit - 2007-10-15

priority: 6 --> 9

Antoni Mylka - 2007-10-31

Logged In: YES
user_id=1613065
Originator: NO

No it's not. This will invalidate the OutlookAccessorFactory and OutlookOpenerFactory, because there will be no 'outlook' URI scheme anymore. This will have direct consequences for Nepomuk users, who need the (access|refresh|open)Resource methods to work for Outlook resources.

See the Nepomuk ticket no 78
http://dev.nepomuk.semanticdesktop.org/ticket/78
OlafGrebner kept bugging me about it.

The semdesk hack will necessitate a rewrite of the Aperture registries, so that they work not with URI SCHEMES (i.e. the part before the colon) but with arbitrary URI PREFIXES, which may reach only to the first colon but may also go further (e.g. semdesk://localhost/outlook). I would like to hear some more comments before doing this.
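To make the scheme-versus-prefix distinction concrete, here is a minimal sketch of a registry keyed by URI prefix that resolves a URI to the handler with the longest matching prefix. The class name PrefixRegistry and its methods are hypothetical and not part of the Aperture API; a real registry would hold accessor or opener factories rather than arbitrary values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Aperture registry API.
final class PrefixRegistry<T> {
    private final Map<String, T> entries = new HashMap<>();

    // A prefix may be just a scheme ("file:") or reach further
    // ("semdesk://localhost/outlook").
    void register(String uriPrefix, T handler) {
        entries.put(uriPrefix, handler);
    }

    // Returns the handler registered under the longest prefix of the URI,
    // or null if no registered prefix matches.
    T lookup(String uri) {
        String best = null;
        for (String prefix : entries.keySet()) {
            if (uri.startsWith(prefix)
                    && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? null : entries.get(best);
    }
}
```

With "semdesk:" and "semdesk://localhost/outlook" both registered, a lookup for an Outlook URI would pick the more specific entry, while other semdesk URIs fall back to the generic one.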


Antoni Mylka - 2007-11-02

Logged In: YES
user_id=1613065
Originator: NO

Copied this comment from issue number [ 1824544 ] Convert website crawlers to use NAO since it should belong here.

>Comment By: Leo Sauermann (leo_sauermann)
Date: 2007-11-02 13:02

Message:
Logged In: YES
user_id=1242018
Originator: NO

we need it for desktop:
* desktop://outlook
* desktop://email
* desktop://thunderbird

the main goal is to minimize the non-standard behavior we have. making up
multiple imaginary desktop protocols is unhealthy. one imaginary protocol
"desktop" has the chance to be registered sometime.

we also need it for web 2.0 applications:
* http://www.flickr.com/photos/*

We could write a special DataAccessor that reacts to this kind of online
URI, invoking a proper data extraction using the web APIs or page scraping
instead of just extracting the plaintext from the HTML.

Still, there is a problem of passing in the datasource: once a
DataAccessor is retrieved using the DataAccessorRegistry.get(String scheme)
method, I cannot pass the right datasource to
DataAccessor.getDataObject(String url, DataSource source, Map params,
RDFContainerFactory containerFactory) throws UrlNotFoundException,
IOException;

The Outlook, Thunderbird, Flickr, etc DataAccessors would need the correct
DataSource to know the password or other configuration options that are not
encoded in the URI.
For this, I would again urge to add a "canContainURI(String uri)" method
to the DataSource interface, that returns true (or these 0-200 values)
indicating if it can contain a URI. we had this discussion before - where
is it documented?

problem is that some "schemes" can be part of a "prefix" also. I would add
both functionalities to the dataaccessors to be on the safe side, and
perhaps also add the possibility of regular expressions for uri parsing (to
say "yes" both to http and https versions, etc).
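As an illustration of the regular-expression idea, the following sketch accepts both the http and https versions of Flickr photo URIs with one pattern. The class name UriPatterns and the exact pattern are assumptions made for this example; nothing like it is confirmed to exist in Aperture.

```java
import java.util.regex.Pattern;

// Hypothetical helper, shown only to illustrate regex-based URI matching.
final class UriPatterns {
    // "https?" accepts both the http and the https scheme.
    static final Pattern FLICKR_PHOTOS =
            Pattern.compile("https?://www\\.flickr\\.com/photos/.*");

    // True if the whole URI matches the Flickr photo pattern.
    static boolean isFlickrPhoto(String uri) {
        return FLICKR_PHOTOS.matcher(uri).matches();
    }
}
```

A plain scheme or prefix check would need two registrations to cover the same URIs, which is the case the regex option is meant to address.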


Antoni Mylka - 2007-11-02

Logged In: YES
user_id=1613065
Originator: NO

We've had a little chat with Leo on Skype today. The constraints we are trying to satisfy are:

1. Keep the current schemes intact, file://, http://, https:// should work as expected, the same would apply to ftp://, webdav://, gopher etc.
2. The schemes that aren't standardized (outlook:// etc.) should be substituted with the semdesk:// scheme with appropriate prefixes. The accessor and opener registries should be adjusted.
3. It should be possible to distinguish between a generic accessor (http) and a specific accessor (flickr).

We agreed that the basic prefix-processing is too little. It wouldn't cover the http vs. https case etc. DataSource implementations must be allowed to do more advanced URI parsing. That's why it's better to have a canContainUri method.

Solution 1
1. public int DataSource.canContainUri - the implementation may or may not take the information in the data source configuration into account. The number returned is the number of the characters in the URI that make up the necessary condition. For FileSystemDataSource we have startsWith(rootFolder), for Web we have startsWith rootUrl, or fallsWithin domain boundaries. It is impossible to set up a mathematically sound contract for this method and the value should be used as a HINT as to whether the data source CAN contain a resource. No guarantees are given. If the URI violates some restriction (e.g. it is outside of the domain boundaries), a negative value is returned.
2. public DataAccessorFactory DataSource.getDataAccessorFactory
3. public DataOpenerFactory DataSource.getDataOpenerFactory
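A rough sketch of how the canContainUri hint from Solution 1 might be used to choose a data source follows. The interface UriHintSource, the class RootPrefixSource and the selector are hypothetical stand-ins, not the real DataSource types. The sketch assumes that a negative value means "rejected" and that among the remaining sources the one with the highest value is preferred; the comment above does not specify that tie-breaking rule.

```java
import java.util.List;

// Hypothetical stand-in for the proposed DataSource.canContainUri method.
interface UriHintSource {
    // Number of URI characters that make up the necessary condition,
    // or a negative value if the URI violates a restriction.
    int canContainUri(String uri);
}

// Mimics the startsWith(rootFolder) / startsWith(rootUrl) cases.
final class RootPrefixSource implements UriHintSource {
    private final String root;

    RootPrefixSource(String root) {
        this.root = root;
    }

    @Override
    public int canContainUri(String uri) {
        return uri.startsWith(root) ? root.length() : -1;
    }
}

final class SourceSelector {
    // Returns the source with the highest non-negative hint, or null.
    // The result is only a hint: the source CAN contain the URI.
    static UriHintSource best(List<? extends UriHintSource> sources, String uri) {
        UriHintSource best = null;
        int bestScore = -1;
        for (UriHintSource source : sources) {
            int score = source.canContainUri(uri);
            if (score > bestScore) {
                bestScore = score;
                best = source;
            }
        }
        return best;
    }
}
```

Under these assumptions, a source rooted at a subfolder wins over one rooted at its parent folder for URIs inside the subfolder, because more characters of the URI are matched.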

Solution 2
1. public int DataSource.canContainUri - as above
2. public int DataAccessorFactory.canContainUri - unclear, only the most basic regex processing is permissible since the DataAccessorFactory doesn't have access to the data source configuration
3. public int DataOpenerFactory.canContainUri - unclear, only the most basic regex processing is permissible since the DataOpenerFactory doesn't have access to the data source configuration
Advantage: theoretically even without a data source a semdesk://localhost/outlook URI may be recognized by an OutlookAccessorFactory
Disadvantage: theoretically cases may occur when a URI is recognized by a DataSource, but is not recognized by any DataAccessorFactory, due to some source-specific parts of the URI. Can they?

I would personally go for the second solution since I can't come up with any meaningful examples when problems might occur. Does anyone have a better idea?


Leo Sauermann - 2007-11-06

Logged In: YES
user_id=1242018
Originator: YES

btw: in the August discussion on URI schemes at the ontology meeting in Karlsruhe, we agreed to use desktop:// rather than semdesk://, as this is generic on the desktop level.

for solution 1 I would rather do direct opener/accessor, it's an impl returning an impl:
2. public DataAccessorFactory DataSource.getDataAccessor()
3. public DataOpenerFactory DataSource.getDataOpener()

for solution 2, the hope is that accessors/openers work for desktop when having registered prefixes.
I would rather go here for semi-standardized schemes, like kde://, where the usual scheme-based registry works.

my main fear is complex datasources where the dataaccessor/opener needs to know about the datasource, such as flickr. They may be mapped to HTTP uris using ontologies, and then the accessors are completely non-standard (the opener is standard) and may need the datasource to work (flickr needs an api key, perhaps a username).
also, multiple email sources may have different ways of accessing the resources.


Antoni Mylka - 2009-11-05

I'm reassigning this to Leo; I don't have Outlook and can't test the change.


Antoni Mylka - 2009-11-05

assigned_to: mylka --> leo_sauermann

Leo Sauermann - 2009-11-05

ok
