This is needed for the following case. Let's imagine an .eml file with a single attachment. This attachment is a text.txt.gz file, i.e. a plaintext file compressed with gzip.
When we subcrawl the file, the mime subcrawler will store the name of the attachment in the metadata of the corresponding dataobject. Thus we have a dataobject with an id of 'mime:file:/C:/file.eml!/1', but the metadata contains nfo:fileName "text.txt.gz".
With the latest improvements the GzipSubCrawler can take advantage of the parent metadata, so that the child object is not named 1.content but text.txt,
which is what we would expect.
The catch becomes visible when we try to use SubCrawlerUtil.getDataObject to obtain the uncompressed txt file. At the lowest level the gzip subcrawler no longer has access to the parent metadata, and the only data object it will find is 1.content, so it produces a URI which cannot be found. This is bad. To prevent this, the parentMetadata should also be available when accessing a concrete object.
The request is to add a new method to the SubCrawler interface, which would be identical to getDataObject, but would also accept a parentMetadata RDFContainer.
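
The mismatch described above can be illustrated with a small self-contained sketch. It does not use the Aperture API: the class GzipChildNameSketch, the method childName and the fallback rule "last path segment plus .content" are hypothetical and only model the behaviour described in this request. With the parent file name available the child is named text.txt, without it the only name that can be derived is 1.content, so the two code paths disagree.

```java
import java.util.Optional;

public class GzipChildNameSketch {

    // Hypothetical helper (not part of Aperture): pick the name of the
    // uncompressed child. parentFileName stands in for the nfo:fileName
    // value found in the parent metadata, if that metadata is available.
    static String childName(String parentPathSegment, Optional<String> parentFileName) {
        if (parentFileName.isPresent()) {
            String name = parentFileName.get();
            // Strip a trailing ".gz" to recover the original file name.
            if (name.length() > 3 && name.toLowerCase().endsWith(".gz")) {
                return name.substring(0, name.length() - 3);
            }
        }
        // Assumed fallback when no usable file name is known.
        return parentPathSegment + ".content";
    }

    public static void main(String[] args) {
        // Crawling: parent metadata is available.
        System.out.println(childName("1", Optional.of("text.txt.gz")));
        // Direct access: parent metadata is missing.
        System.out.println(childName("1", Optional.empty()));
    }
}
```

Passing the parent metadata in both situations would make both calls agree on the same name, which is the point of the requested method.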