#52 include part->child links in subcrawler results


Noticed by Christiaan Fluit:

I just noticed that the metadata of a zip file does not contain references to its child DataObjects, though in the case of folders and mails we do add references in the parent DataObject to its files/subfolders and attachments, respectively.


  • Antoni Mylka

    Antoni Mylka - 2008-11-13

    assigned to myself and added to the next-release group

  • Antoni Mylka

    Antoni Mylka - 2008-11-13
    • milestone: --> 893322
    • assigned_to: nobody --> mylka
  • Antoni Mylka

    Antoni Mylka - 2008-11-13

    The Archiver and Compressor subcrawlers add nfo:belongsToContainer. For the MimeSubCrawler, the message itself is placed in the same container as the corresponding file, and its children are connected with the parent by the DataObjectFactory. The vcard subcrawler places the vcard in the same rdfcontainer as the enclosing file (if there's only one), or interprets the file as a ContactList and adds nco:containsContact links.

    So, all subcrawlers add some sort of link, mostly child->parent; only the vcard subcrawler adds parent->child. Child->Parent links are bad, as established during the discussion on aperture-dev.


    Are you sure you really need this?

  • Antoni Mylka

    Antoni Mylka - 2008-11-14

    I obviously meant parent->child links are bad, because they inflate the parent metadata if there are 100K children. Please elaborate on the use case.

  • Christiaan Fluit

    Yes, I agree that they result in quite some redundant metadata. Reasons to still do it would be:

    - Design consistency. In some cases you have parent->child links, in others you don't, and there is no real logic behind it. It's not only in the VCard subcrawler, also in FileAccessor.
    - One use case for the DataAccessors was outside of a Crawler: someone crawls an object, finds references in it to other objects and then decides which other objects to retrieve. For this, each link between objects should be made at both objects.

    We can simply make the generation of parent->child links optional, like we did in FileAccessor/FileSystemCrawler.
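
A minimal sketch of what such an optional switch could look like. This is not Aperture code: the `Metadata` class, the `linkChildren` method and the `addParentToChildLinks` flag are hypothetical names made up for illustration, and using `nie:hasPart` for the parent->child direction is an assumption (only `nfo:belongsToContainer` is mentioned in this thread).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartLinkSketch {

    /** Hypothetical stand-in for a DataObject's metadata: a URI plus property -> values. */
    static final class Metadata {
        final String uri;
        final Map<String, List<String>> properties = new LinkedHashMap<>();

        Metadata(String uri) {
            this.uri = uri;
        }

        void add(String property, String value) {
            properties.computeIfAbsent(property, k -> new ArrayList<>()).add(value);
        }

        List<String> get(String property) {
            return properties.getOrDefault(property, Collections.<String>emptyList());
        }
    }

    /**
     * Creates metadata for each child and links it to the parent.
     * The child->parent link is always written; the parent->child link
     * is written only when the (hypothetical) flag is set, so a parent
     * with very many children does not get inflated by default.
     */
    static List<Metadata> linkChildren(Metadata parent, List<String> childUris,
                                       boolean addParentToChildLinks) {
        List<Metadata> children = new ArrayList<>();
        for (String childUri : childUris) {
            Metadata child = new Metadata(childUri);
            // child->parent: cheap, one statement per child
            child.add("nfo:belongsToContainer", parent.uri);
            if (addParentToChildLinks) {
                // parent->child: assumed property, grows with the number of children
                parent.add("nie:hasPart", childUri);
            }
            children.add(child);
        }
        return children;
    }

    public static void main(String[] args) {
        List<String> entries = new ArrayList<>();
        entries.add("zip:file:/tmp/a.zip!/one.txt");
        entries.add("zip:file:/tmp/a.zip!/two.txt");

        Metadata withoutLinks = new Metadata("file:/tmp/a.zip");
        linkChildren(withoutLinks, entries, false);
        System.out.println("flag off, parent links: " + withoutLinks.get("nie:hasPart").size());

        Metadata withLinks = new Metadata("file:/tmp/a.zip");
        linkChildren(withLinks, entries, true);
        System.out.println("flag on, parent links: " + withLinks.get("nie:hasPart").size());
    }
}
```

With the flag off, the parent metadata stays the same size regardless of the number of children. With it on, a caller working outside a Crawler can discover the children from the parent alone.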

  • Antoni Mylka

    Antoni Mylka - 2008-11-26

    This is a test issue comment.

  • Antoni Mylka

    Antoni Mylka - 2009-06-10

    This is not a bug but a feature request.

  • Antoni Mylka

    Antoni Mylka - 2009-06-10
    • milestone: 893322 --> 893323
  • Antoni Mylka

    Antoni Mylka - 2009-07-13
    • milestone: 893323 -->
