#52 include parent->child links in subcrawler results

Status: open
Owner: Antoni Mylka
Labels: None
Priority: 5
Updated: 2009-07-13
Created: 2008-10-08
Creator: Antoni Mylka
Private: No

Noticed by Christiaan Fluit:

I just noticed that the metadata of a zip file does not contain references to its child DataObjects, though in the case of folders and mails we do add references in the parent DataObject to its files/subfolders and attachments, respectively.

Discussion

  • Antoni Mylka
    2008-11-13

    assigned to myself and added to the next-release group

     
  • Antoni Mylka
    2008-11-13

    • milestone: --> 893322
    • assigned_to: nobody --> mylka
     
  • Antoni Mylka
    2008-11-13

    The Archiver and Compressor subcrawlers add nfo:belongsToContainer. For the MimeSubCrawler, the message itself is placed in the same container as the corresponding file, and its children are connected with the parent by the DataObjectFactory. The vcard subcrawler either places the vcard in the same RDFContainer as the enclosing file (if there is only one vcard), or interprets the file as a ContactList and adds nco:containsContact links.
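
    The two link directions described above can be sketched with plain triples. This is only an illustration, not Aperture code: the Triple and LinkDirections classes are hypothetical, and only the property names nfo:belongsToContainer and nco:containsContact are taken from the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an RDF statement; not an Aperture class.
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override
    public String toString() {
        return subject + " " + predicate + " " + object;
    }
}

// Hypothetical helper that builds the two kinds of links.
final class LinkDirections {
    // Child->parent: each statement has the child as its subject,
    // so it lives in the child's metadata.
    static List<Triple> childToParent(String parent, List<String> children) {
        List<Triple> out = new ArrayList<>();
        for (String child : children) {
            out.add(new Triple(child, "nfo:belongsToContainer", parent));
        }
        return out;
    }

    // Parent->child: every statement has the parent as its subject,
    // so all of them accumulate in the parent's metadata.
    static List<Triple> parentToChild(String parent, List<String> children) {
        List<Triple> out = new ArrayList<>();
        for (String child : children) {
            out.add(new Triple(parent, "nco:containsContact", child));
        }
        return out;
    }
}
```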

    So all subcrawlers add some sort of links, mostly child->parent; only the vcard subcrawler adds parent->child. Child->Parent links are bad, as established during the discussion on aperture-dev:

    http://tinyurl.com/file-crawling-issue

    Are you sure you really need this?

     
  • Antoni Mylka
    2008-11-14

    I obviously meant parent->child links are bad, because they inflate the parent metadata if there are 100K children. Please elaborate on the use case.

     
  • Yes, I agree that they result in quite a lot of redundant metadata. Reasons to still do it would be:

    - Design consistency. In some cases you have parent->child links, in others you don't, and there is no real logic behind it. It's not only in the VCard subcrawler, also in FileAccessor.
    - One use case for the DataAccessors was outside of a Crawler: someone crawls an object, finds references in it to other objects, and then decides which of those objects to retrieve. For this, each link between two objects should be recorded on both of them.

    We can simply make the generation of parent->child links optional, like we did in FileAccessor/FileSystemCrawler.
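
    A minimal sketch of what such an option could look like, assuming a boolean switch on the component that writes the links. ContainerLinker, its constructor flag, and the ex:hasChild property are hypothetical names made up for this example; they are not Aperture API and do not reflect how FileAccessor/FileSystemCrawler implement their option.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are Aperture API.
final class ContainerLinker {
    private final boolean includeParentToChildLinks;

    ContainerLinker(boolean includeParentToChildLinks) {
        this.includeParentToChildLinks = includeParentToChildLinks;
    }

    // Returns the statements grouped by the object whose metadata holds them.
    Map<String, List<String>> link(String parent, List<String> children) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        List<String> parentStatements = new ArrayList<>();
        metadata.put(parent, parentStatements);
        for (String child : children) {
            // The child->parent link is always written.
            List<String> childStatements = new ArrayList<>();
            childStatements.add("nfo:belongsToContainer " + parent);
            metadata.put(child, childStatements);
            // The parent->child link is written only on request.
            // ex:hasChild is a placeholder property name.
            if (includeParentToChildLinks) {
                parentStatements.add("ex:hasChild " + child);
            }
        }
        return metadata;
    }
}
```

    With the switch off, the parent's metadata stays the same size no matter how many children there are; with it on, it grows by one statement per child, which is the 100K-children concern raised earlier in the thread.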

     
  • Antoni Mylka
    2008-11-26

    This is a test issue comment.

     
  • Antoni Mylka
    2009-06-10

    This is not a bug but a feature request.

     
  • Antoni Mylka
    2009-06-10

    • milestone: 893322 --> 893323
     
  • Antoni Mylka
    2009-07-13

    • milestone: 893323 -->