#14 move to semdesk uri scheme


We need to implement the semdesk hack that we had in mind in August.

The URI scheme is described here:


I changed the scheme to use domain names as application identifiers. Comments?

Changes need to be made to Outlook first; other things can wait.

In the end, this won't take long: we just have to hardcode one string somewhere, "semdesk:".


  • Leo Sauermann

    Leo Sauermann - 2007-10-12
    • summary: move --> move to semdesk uri scheme
  • Christiaan Fluit

    • priority: 6 --> 9
  • Antoni Mylka

    Antoni Mylka - 2007-10-31

    Logged In: YES
    Originator: NO

    No it's not. This will invalidate the OutlookAccessorFactory and OutlookOpenerFactory, because there will be no 'outlook' URI scheme anymore. This will have direct consequences for Nepomuk users, who need the (access|refresh|open)Resource methods to work for Outlook resources.

    See the Nepomuk ticket no 78
    OlafGrebner kept bugging me about it.

    The semdesk hack will necessitate a rewrite of the Aperture registries, so that they work not with URI SCHEMES (i.e. the part before the colon) but with arbitrary URI PREFIXES, which may reach only to the first colon but may also go further (e.g. semdesk://localhost/outlook). I would like to hear some more comments before doing this.
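
The prefix-based lookup described in this comment could look roughly like the sketch below. It is only an illustration: `PrefixRegistrySketch`, `register` and `lookup` are hypothetical names, not part of the Aperture API, and factories are represented by plain strings to keep the example self-contained. The registry picks the longest registered prefix that the URI starts with, so a scheme such as `file:` is just a short prefix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the actual Aperture registry API.
class PrefixRegistrySketch {
    // Maps a URI prefix (e.g. "file:" or "semdesk://localhost/outlook")
    // to the name of the factory responsible for it.
    private final Map<String, String> factoriesByPrefix = new LinkedHashMap<>();

    void register(String uriPrefix, String factoryName) {
        factoriesByPrefix.put(uriPrefix, factoryName);
    }

    // Returns the factory registered under the longest prefix of the URI,
    // or null when no registered prefix matches.
    String lookup(String uri) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> entry : factoriesByPrefix.entrySet()) {
            String prefix = entry.getKey();
            if (uri.startsWith(prefix) && prefix.length() > bestLength) {
                best = entry.getValue();
                bestLength = prefix.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        PrefixRegistrySketch registry = new PrefixRegistrySketch();
        registry.register("file:", "FileAccessorFactory");
        registry.register("semdesk://localhost/outlook", "OutlookAccessorFactory");
        System.out.println(registry.lookup("semdesk://localhost/outlook/item/1"));
        System.out.println(registry.lookup("file:/home/user/a.txt"));
    }
}
```

Plain `startsWith` matching is deliberately naive: the prefix `semdesk://localhost/outlook` would also match `semdesk://localhost/outlookexpress`, so a real registry would probably need to agree on a delimiter convention for prefixes.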

  • Antoni Mylka

    Antoni Mylka - 2007-11-02

    Logged In: YES
    Originator: NO

    Copied this comment from issue number [ 1824544 ] Convert website crawlers to use NAO since it should belong here.

    >Comment By: Leo Sauermann (leo_sauermann)
    Date: 2007-11-02 13:02

    Logged In: YES
    Originator: NO

    we need it for desktop:
    * desktop://outlook
    * desktop://email
    * desktop://thunderbird

    the main goal is to minimize the non-standard behavior we have; making up
    multiple imaginary desktop protocols is unhealthy. one imaginary protocol
    "desktop" has a chance of being registered at some point.

    we also need it for web 2.0 applications:
    * http://www.flickr.com/photos/*

    We could write a special DataAccessor that reacts to these kinds of online
    URIs, invoking a proper data extraction using the web APIs or page scraping
    instead of just extracting the plaintext from the HTML.

    Still, there is a problem of passing in the datasource, once a
    DataAccessor is retrieved using the DataAccessorRegistry.get(String scheme)
    method, I cannot pass the right datasource to
    DataAccessor.getDataObject(String url, DataSource source, Map params,
    RDFContainerFactory containerFactory) throws UrlNotFoundException,

    The Outlook, Thunderbird, Flickr, etc DataAccessors would need the correct
    DataSource to know the password or other configuration options that are not
    encoded in the URI.
    For this, I would again urge to add a "canContainURI(String uri)" method
    to the DataSource interface, that returns true (or these 0-200 values)
    indicating if it can contain a URI. we had this discussion before - where
    is it documented?

    the problem is that some "schemes" can also be part of a "prefix". I would add
    both functionalities to the DataAccessors to be on the safe side, and
    perhaps also add the possibility of regular expressions for URI parsing (to
    say "yes" to both the http and https versions, etc.).
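
As a rough illustration of the regular-expression idea in this comment, the sketch below accepts both the http and https forms of a URI and checks a specific pattern (Flickr photos) before a generic one (any web URI). The class name, method name and the returned labels are made up for the example and do not exist in Aperture.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names exist in Aperture.
class UriPatternSketch {
    // "https?" says "yes" to both the http and the https version of a URI.
    static final Pattern FLICKR_PHOTOS =
            Pattern.compile("https?://www\\.flickr\\.com/photos/.*");
    static final Pattern ANY_WEB = Pattern.compile("https?://.*");

    // The specific pattern is checked before the generic one, so a Flickr
    // photo URI is not swallowed by the plain web accessor.
    static String chooseAccessor(String uri) {
        if (FLICKR_PHOTOS.matcher(uri).matches()) {
            return "flickr";
        }
        if (ANY_WEB.matcher(uri).matches()) {
            return "http";
        }
        return null; // no accessor claims this URI
    }

    public static void main(String[] args) {
        System.out.println(chooseAccessor("https://www.flickr.com/photos/someone/1"));
        System.out.println(chooseAccessor("http://example.org/index.html"));
        System.out.println(chooseAccessor("desktop://outlook"));
    }
}
```

This only decides which kind of accessor fits a URI. It does not address the other point raised above, namely how the chosen accessor gets hold of the right DataSource for passwords and other configuration.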

  • Antoni Mylka

    Antoni Mylka - 2007-11-02

    Logged In: YES
    Originator: NO

    We've had a little chat with Leo on Skype today. The constraints we are trying to satisfy are:

    1. Keep the current schemes intact, file://, http://, https:// should work as expected, the same would apply to ftp://, webdav://, gopher etc.
    2. The schemes that aren't standardized (outlook:// etc.) should be substituted with the semdesk:// scheme with appropriate prefixes. The accessor and opener registries should be adjusted.
    3. It should be possible to distinguish between a generic accessor (http) and a specific accessor (flickr).

    We agreed that the basic prefix-processing is too little. It wouldn't cover the http vs. https case etc. DataSource implementations must be allowed to do more advanced URI parsing. That's why it's better to have a canContainUri method.

    Solution 1
    1. public int DataSource.canContainUri - the implementation may or may not take the information in the data source configuration into account. The number returned is the number of the characters in the URI that make up the necessary condition. For FileSystemDataSource we have startsWith(rootFolder), for Web we have startsWith rootUrl, or fallsWithin domain boundaries. It is impossible to set up a mathematically sound contract for this method and the value should be used as a HINT as to whether the data source CAN contain a resource. No guarantees are given. If the URI violates some restriction (e.g. it is outside of the domain boundaries), a negative value is returned.
    2. public DataAccessorFactory DataSource.getDataAccessorFactory
    3. public DataOpenerFactory DataSource.getDataOpenerFactory

    Solution 2
    1. public int DataSource.canContainUri - as above
    2. public int DataAccessorFactory.canContainUri - unclear, only the most basic regex processing is permissible since the DataAccessorFactory doesn't have access to the data source configuration
    3. public int DataOpenerFactory.canContainUri - unclear, only the most basic regex processing is permissible since the DataOpenerFactory doesn't have access to the data source configuration
    Advantage: theoretically, even without a data source, a semdesk://localhost/outlook URI may be recognized by an OutlookAccessorFactory
    Disadvantage: theoretically, cases may occur where a URI is recognized by a DataSource but not by any DataAccessorFactory, due to some source-specific parts of the URI. Can they?

    I would personally go for the second solution since I can't come up with any meaningful examples when problems might occur. Does anyone have a better idea?
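
To make the proposed canContainUri contract more concrete, here is a minimal sketch under stated assumptions. `UriHint`, `RootPrefixSource` and `bestSource` are hypothetical names, and the only rule modelled is the startsWith(rootFolder) / startsWith(rootUrl) case mentioned above. The method returns the number of URI characters covered by the matching condition, or a negative value when the URI lies outside the source, and the caller treats the result purely as a hint.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed contract, not the Aperture interfaces.
interface UriHint {
    // Number of leading URI characters covered by the matching condition,
    // or a negative value if the URI violates a restriction of the source.
    int canContainUri(String uri);
}

class RootPrefixSource implements UriHint {
    final String name;
    private final String rootPrefix; // e.g. a root folder or root URL

    RootPrefixSource(String name, String rootPrefix) {
        this.name = name;
        this.rootPrefix = rootPrefix;
    }

    @Override
    public int canContainUri(String uri) {
        return uri.startsWith(rootPrefix) ? rootPrefix.length() : -1;
    }
}

class CanContainUriSketch {
    // Picks the source with the highest non-negative hint, or null if every
    // source rejects the URI. The result is a hint, not a guarantee.
    static RootPrefixSource bestSource(List<RootPrefixSource> sources, String uri) {
        RootPrefixSource best = null;
        int bestScore = -1;
        for (RootPrefixSource source : sources) {
            int score = source.canContainUri(uri);
            if (score > bestScore) {
                best = source;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<RootPrefixSource> sources = Arrays.asList(
                new RootPrefixSource("home", "file:/home/leo/"),
                new RootPrefixSource("mail", "file:/home/leo/mail/"));
        System.out.println(bestSource(sources, "file:/home/leo/mail/inbox").name);
    }
}
```

With both sources configured, a URI under file:/home/leo/mail/ is claimed by the "mail" source because its root covers more characters of the URI than the "home" root does.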

  • Leo Sauermann

    Leo Sauermann - 2007-11-06

    Logged In: YES
    Originator: YES

    btw: in the August discussion on URI schemes at the ontology meeting in Karlsruhe, we agreed to use desktop:// rather than semdesk://, as this is generic on the desktop level.

    for solution 1 I would rather do direct opener/accessor, it's an impl returning an impl:
    2. public DataAccessorFactory DataSource.getDataAccessor()
    3. public DataOpenerFactory DataSource.getDataOpener()

    for solution 2, the hope is that accessors/openers work for desktop when having registered prefixes.
    I would rather go here for semi-standardized schemes, like kde://, where the usual scheme-based registry works.

    my main fear is complex data sources where the DataAccessor/DataOpener needs to know about the data source, such as Flickr. They may be mapped to HTTP URIs using ontologies, and then the accessor is completely non-standard (the opener is standard) and may need the data source to work (Flickr needs an API key, perhaps a username).
    also, multiple email sources may have different ways of accessing the resources.

  • Antoni Mylka

    Antoni Mylka - 2009-11-05

    I reassign this to Leo, I don't have Outlook and can't test the change.

  • Antoni Mylka

    Antoni Mylka - 2009-11-05
    • assigned_to: mylka --> leo_sauermann
  • Leo Sauermann

    Leo Sauermann - 2009-11-05


