
Milan_1412_2011

Tomas Knap

Important Suggestions

  • How to deal with metadata in SPARQL? Create a GUI that enables the user to annotate a given SPARQL query. The annotations (e.g. select source XY, select an arbitrary source, compute the average) will be translated into the SPARQL query submitted to ODCleanStore (we will still use the same syntactic and semantic rules of SPARQL).
  • A tool for mapping ontologies: http://agreementmaker.org/ (I asked for the link)
  • Build a prototype as soon as possible
  • Maintain a list of authorities in the given domain and incorporate them into the conflict resolution box (we currently compute a score for each dataset, so we are more or less doing this already). If a named graph comes from an authority in the domain, it could be assigned 100% accuracy and 100% timeliness by law (this is an open question).
  • The conflict resolution box should be customizable not only by the administrator and the consumer, but also by a policy creator (the rules can be domain-specific and hard for the administrator to specify).
  • Add customizable consumer policies (trust / prefer / not prefer / distrust publisher "example.org") that influence the consumed data
  • Error localization - the completeness of dataset A with respect to concept C can be defined as (#triples about C in A) / (#all distinct triples about C known to the storage); incorporate timeliness? (take the volatility of the value into account!)
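The completeness ratio from the last item can be made concrete with a small sketch. It assumes that triples are plain (subject, predicate, object) tuples, that a triple is "about" concept C when its subject is C, and that the storage is simply a dictionary from dataset name to triple set; the function names and the example data are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def triples_about(triples: Set[Triple], concept: str) -> Set[Triple]:
    """Triples whose subject is `concept` (assumed meaning of "about C")."""
    return {t for t in triples if t[0] == concept}


def completeness(storage: Dict[str, Set[Triple]], dataset: str, concept: str) -> float:
    """(#triples about `concept` in `dataset`) / (#distinct triples about `concept` in the storage)."""
    known: Set[Triple] = set()
    for triples in storage.values():
        known |= triples_about(triples, concept)
    if not known:
        return 0.0  # convention chosen here: nothing known about the concept -> 0
    return len(triples_about(storage[dataset], concept)) / len(known)


# Hypothetical storage with two datasets describing the same concept ex:C.
storage = {
    "A": {("ex:C", "ex:p1", "v1")},
    "B": {("ex:C", "ex:p1", "v1"),
          ("ex:C", "ex:p2", "v2"),
          ("ex:C", "ex:p3", "v3")},
}
print(completeness(storage, "A", "ex:C"))  # 1 of 3 known triples
print(completeness(storage, "B", "ex:C"))  # 3 of 3 known triples
```

Note that two conflicting values for the same property count as two distinct triples here, so conflicts inflate the denominator; whether that is desirable is a design decision.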
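For the timeliness question, one common formulation from the data quality literature (Ballou et al., 1998) already takes volatility into account: timeliness = max(0, 1 - currency / volatility) ^ s, where currency is the age of the value, volatility is how long such a value typically stays valid, and s is a sensitivity exponent. A minimal sketch, assuming the volatility per property is known (e.g. derived by machine learning, as suggested below); the function name and example dates are illustrative only.

```python
from datetime import datetime, timedelta


def timeliness(last_updated: datetime, now: datetime,
               volatility: timedelta, sensitivity: float = 1.0) -> float:
    """max(0, 1 - currency / volatility) ** sensitivity, in [0, 1].

    currency   = age of the value (time since its last update)
    volatility = how long such a value typically stays valid
    """
    if volatility <= timedelta(0):
        raise ValueError("volatility must be positive")
    currency = max(now - last_updated, timedelta(0))  # ignore future timestamps
    return max(0.0, 1.0 - currency / volatility) ** sensitivity


now = datetime(2011, 12, 14)
updated = now - timedelta(days=73)
# Same age, different volatility: a slowly changing value vs. a fast-changing one.
print(timeliness(updated, now, volatility=timedelta(days=365)))  # about 0.8
print(timeliness(updated, now, volatility=timedelta(days=1)))    # 0.0
```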
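Domain authorities and consumer policies could both enter conflict resolution as adjustments to the per-graph score. The sketch below is one hypothetical way to do it: authorities get a score of 1.0 (the open "100% by law" question above), each consumer attitude multiplies the score by an invented weight, and the single value with the highest adjusted score wins. The weights, class names and the "pick the best value" strategy are assumptions; voting or averaging over sources would be equally valid strategies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Set


class Policy(Enum):
    """Consumer attitudes from the notes; the multipliers are hypothetical."""
    TRUST = 1.5
    PREFER = 1.2
    NOT_PREFER = 0.8
    DISTRUST = 0.0


@dataclass
class Candidate:
    value: str      # one of the conflicting values for the same subject/property
    graph: str      # named graph providing the value
    publisher: str  # e.g. "example.org"
    score: float    # quality score already computed for the graph, in [0, 1]


def effective_score(c: Candidate, policies: Dict[str, Policy], authorities: Set[str]) -> float:
    # Open question from the notes: treat a domain authority as fully reliable.
    base = 1.0 if c.publisher in authorities else c.score
    policy = policies.get(c.publisher)
    return base * policy.value if policy is not None else base


def resolve(candidates: List[Candidate], policies: Dict[str, Policy],
            authorities: Set[str]) -> Optional[str]:
    """Pick the value with the highest effective score; None if every source is distrusted."""
    scored = [(effective_score(c, policies, authorities), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s > 0.0]
    if not scored:
        return None
    return max(scored, key=lambda sc: sc[0])[1].value


candidates = [
    Candidate("v1", "ex:g1", "example.org", 0.9),
    Candidate("v2", "ex:g2", "authority.example", 0.6),
    Candidate("v3", "ex:g3", "spam.example", 0.95),
]
policies = {"example.org": Policy.NOT_PREFER, "spam.example": Policy.DISTRUST}
print(resolve(candidates, policies, authorities={"authority.example"}))  # v2
print(resolve(candidates, policies, authorities=set()))                  # v1
```

Because the consumer policy is applied after the authority rule, a consumer who distrusts an authority still overrides it.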

Other suggestions

  • Also focus on the data quality (DQ) of the scraper (scraping module)
  • Use the OKKAM service to obtain alternative IDs and to register current IDs (https://docs.google.com/View?id=df96pvn2_59htpsxmcr)
  • Create links pointing to the data
  • Sampling - use only a subset of the data to estimate its quality
  • Machine learning - derive rules from the given set of named graphs and derive the volatility of the data (to support the timeliness score)
  • Data correction - be careful about functional dependencies
  • External service for entity resolution (anyone can send a named graph about a particular concept; we return the URI they should use)
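The sampling idea amounts to estimating a proportion from a simple random sample. A minimal sketch, assuming the quality check is a per-item predicate (here a hypothetical validation rule on triples) and reporting an approximate 95% confidence interval via the normal approximation with a finite population correction; the function name and data are illustrative.

```python
import math
import random
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def estimate_quality(items: Sequence[T], is_ok: Callable[[T], bool],
                     sample_size: int, seed: int = 0) -> Tuple[float, float]:
    """Estimate the share of items passing `is_ok` from a simple random sample.

    Returns (estimate, half-width of an approximate 95% confidence interval),
    using the normal approximation with a finite population correction.
    """
    N = len(items)
    if N == 0 or sample_size <= 0:
        raise ValueError("need data and a positive sample size")
    n = min(sample_size, N)
    sample = random.Random(seed).sample(list(items), n)
    p = sum(1 for x in sample if is_ok(x)) / n
    fpc = math.sqrt((N - n) / (N - 1)) if N > 1 else 0.0
    half_width = 1.96 * math.sqrt(p * (1 - p) / n) * fpc
    return p, half_width


# Hypothetical data: 1000 triples, of which every fourth fails a validation rule.
triples = [("ex:s%d" % i, "ex:p", "bad" if i % 4 == 0 else "ok") for i in range(1000)]
estimate, margin = estimate_quality(triples, lambda t: t[2] == "ok", sample_size=200)
print(f"estimated share of valid triples: {estimate:.2f} +/- {margin:.2f}")
```

Checking 200 items instead of 1000 trades exactness for a known error bound; sampling the whole dataset gives the exact share with a zero margin.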
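Being careful about functional dependencies during data correction can start with detecting where a dependency X -> Y no longer holds, e.g. after a value was corrected on only one side. A minimal sketch, assuming records are flat dictionaries (for instance rows derived from triples) and using a hypothetical zip -> city dependency:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def fd_violations(rows: Iterable[Dict[str, str]], lhs: Tuple[str, ...],
                  rhs: str) -> Dict[Tuple[str, ...], Set[str]]:
    """Return every lhs value combination mapped to more than one rhs value,
    i.e. where the functional dependency lhs -> rhs does not hold."""
    seen: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for row in rows:
        seen[tuple(row[a] for a in lhs)].add(row[rhs])
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical records; the dependency zip -> city should hold.
rows = [
    {"zip": "11000", "city": "Praha"},
    {"zip": "11000", "city": "Brno"},   # e.g. the result of a careless "correction"
    {"zip": "60200", "city": "Brno"},
]
print(fd_violations(rows, ("zip",), "city"))  # zip 11000 maps to two cities
```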

  • Influence of licenses on the conflict resolution process

  • Corrector as a role?

Sources

  • JM - VLDB 2010 tutorial on data integration (I will add the link)
  • Weaving the Pedantic Web (http://pedantic-web.org/)

About their team:

  • Anisa Rula - timeliness of data
  • Pei Li - record linkage
  • Andrea Maurino (their supervisor) - record linkage
  • Matteo Palmonari - ontology mapping
  • Carlo Batini

Other notes:

  • Open to internships and joint European projects