#105 Pass the parentMetadata to SubCrawler.getDataObject

1.6.0 - features

This is needed for the following case. Imagine an .eml file with a single attachment named text.txt.gz, a plaintext file compressed with gzip.

When we subcrawl the file, the mime subcrawler will store the name of the attachment in the metadata of the corresponding dataobject. Thus we have a dataobject with an id of 'mime:file:/C:/file.eml!/1', but the metadata contains nfo:fileName "text.txt.gz".

With the latest improvements the GzipSubCrawler can take advantage of the parent metadata, so that the child object is not




which is what we would expect.

The catch becomes visible when we try to use SubCrawlerUtil.getDataObject to obtain the uncompressed txt file. At the lowest level the gzip subcrawler no longer has access to the parent metadata, and the only data object it will find is 1.content, so it produces a URI which cannot be found. This is bad. To prevent this, the parentMetadata should also be available when accessing a concrete object.

The request is to add a new method to the SubCrawler interface, which would be identical to getDataObject but would also accept a parentMetadata RDFContainer.
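To make the request concrete, here is a minimal sketch of the idea. It does not use the real Aperture API: SketchMetadata stands in for an RDFContainer, SketchSubCrawler and its childName methods are hypothetical simplifications of the SubCrawler interface and getDataObject, and the rule of stripping ".gz" from nfo:fileName is an assumption made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an RDFContainer: a flat property-to-value map. */
final class SketchMetadata {
    private final Map<String, String> values = new HashMap<>();

    SketchMetadata put(String property, String value) {
        values.put(property, value);
        return this;
    }

    String get(String property) {
        return values.get(property);
    }
}

/** Hypothetical, simplified contract; NOT the real Aperture SubCrawler interface. */
interface SketchSubCrawler {
    /** Existing-style lookup: only the parent URI is known. */
    String childName(String parentUri);

    /** Proposed overload: the same lookup, plus the parent's metadata. */
    String childName(String parentUri, SketchMetadata parentMetadata);
}

final class SketchGzipSubCrawler implements SketchSubCrawler {
    static final String FILE_NAME = "nfo:fileName";

    @Override
    public String childName(String parentUri) {
        // Without metadata, only the last segment of the parent URI is available.
        String segment = parentUri.substring(parentUri.lastIndexOf('/') + 1);
        return segment + ".content";
    }

    @Override
    public String childName(String parentUri, SketchMetadata parentMetadata) {
        String fileName = parentMetadata == null ? null : parentMetadata.get(FILE_NAME);
        // Assumed naming rule: drop the ".gz" suffix of the parent's file name.
        if (fileName != null && fileName.length() > 3 && fileName.endsWith(".gz")) {
            return fileName.substring(0, fileName.length() - ".gz".length());
        }
        // No usable file name: fall back to the metadata-free behaviour.
        return childName(parentUri);
    }
}

final class ParentMetadataSketch {
    public static void main(String[] args) {
        String parentUri = "mime:file:/C:/file.eml!/1";
        SketchMetadata parent = new SketchMetadata().put("nfo:fileName", "text.txt.gz");
        SketchSubCrawler crawler = new SketchGzipSubCrawler();

        System.out.println(crawler.childName(parentUri));
        System.out.println(crawler.childName(parentUri, parent));
    }
}
```

In this sketch the crawl and the later lookup only agree on the child's name when both go through the overload that receives the parent metadata, which is the mismatch the ticket describes.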


  • Antoni Mylka

    Antoni Mylka - 2010-09-08
    • milestone: --> 1.6.0 - features
    • assigned_to: nobody --> mylka
    • status: open --> closed
  • Antoni Mylka

    Antoni Mylka - 2010-09-08

    This has been done in revision 2421. The interface has been enhanced and all the classes have been adjusted, including AbstractSubCrawler. Some additional tests have been added to cover this.

  • Antoni Mylka

    Antoni Mylka - 2010-09-08

    Sorry, it was rev 2427, not 2421. My mistake.

