Testing scenario 0.3.4

Tomas Knap

Testing scenario for release 0.3.4

Tested on a subset of the ISVZUS data; prepared for the presentation.


Data sets needed

ISVZUS (sample 2x)

Scenario

Basic steps

  • show input data A (some business entities, some contracts): what data it contains and what is missing. It should contain errors (at least one or two) for DN to fix, and 1-2 QA rules should fail.
  • show what the DN rules look like; explain templates, OI, and QA (and how they address those shortcomings, though only some of them)
  • create a pipeline; create a normalizer, QA, and linker (the linker will probably interest the audience most, e.g. sameAs linking between business entities); explain that blank node removal is not needed
    • assign the prepared group of DN rules, the OI rule, and the QA rules
    • move the linker before QA
    • explain the pipeline settings (run on clean db, default, ...)
  • show via the simple aggregation web page that there is no data yet about business entity X from the input file (for example)
  • run the pipeline, briefly say what happened, show that the data is now there, and show how the values were modified (e.g. by the cleaner), for instance again via the aggregation web page.

  • show input data B; show the syntax error in the document
    • no QA rule fails (i.e. the data is of good quality apart from the syntax error)
    • data B should contain some of the same business entities as A (so that sameAs links are created); we then have conflicts and differing quality for the data about business entities from A and B
  • run the pipeline and feed it data B; show that there is something in the error graph
  • fix the syntax error, run again, show the result via the aggregation service
  • play with different aggregations

  • managing accounts, assigning roles (quick walkthrough only)
  • quickly go through the remaining functionality (just show it and describe it verbally), or try to demonstrate it, see below:

Further steps (whatever works and time permits; otherwise just mention it on the slides)

  • change a rule (probably a linking rule), the pipeline runs again, show the change via the data aggregation browser or SPARQL
  • listing ontologies, creating mappings; how does a mapping show up in data aggregation?

Set up pipeline

URL Prefixes

Label properties

Pipelines

  • create pipeline labeled "test"
  • assign DN transformer to pipeline "test"
    • WD: transformers-working-dir/dn
    • Allow on clean DB: ?
    • priority: 1

DN rules

  • create a DN rule group "dn-test"
  • add a new rule
    • Description: Convert gr:hasCurrencyValue to xsd:float
  • add rule component
    • Type: INSERT
    • Description: insert typed literal
    • Modification:

      { ?s ?p xsd:float(bif:replace(bif:replace(?o, ' ', ''), ',', '.')) }
      WHERE {
      ?s ?p ?o.
      FILTER (?p = <http://purl.org/goodrelations/v1#hasCurrencyValue>)
      FILTER REGEX(?o, '^[0-9][0-9 ]*[.,]?[0-9]*$')
      }
  • add rule component
    • Type: DELETE
    • Description: remove old converted literals
    • Modification:

      { ?s ?p ?o }
      WHERE {
      ?s ?p ?o.
      FILTER (?p = <http://purl.org/goodrelations/v1#hasCurrencyValue>)
      FILTER REGEX(?o, '^[0-9][0-9 ]*[.,]?[0-9]*$')
      FILTER (datatype(?o) != xsd:float)
      }
  • add a new rule

    • Description: Convert br:officialNumber to xsd:integer
  • add rule component
    • Type: INSERT
    • Description: insert typed literal
    • Modification:

      { ?s ?p xsd:integer(?o) }
      WHERE {
      ?s ?p ?o.
      FILTER(?p = <http://purl.org/business-register#officialNumber>)
      FILTER (!bif:isnull(xsd:integer(?o)))
      }
  • add rule component
    • Type: DELETE
    • Description: remove old converted literals
    • Modification:

      { ?s ?p ?o }
      WHERE {
      ?s ?p ?o.
      FILTER(?p = <http://purl.org/business-register#officialNumber>)
      FILTER (!bif:isnull(xsd:integer(?o)) && datatype(?o) != xsd:integer)
      }
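
The two DN rules above can be illustrated outside the database with a minimal Python sketch. It is not ODCleanStore code: the function names are hypothetical, and the currency pattern (digits with optional spaces, then an optional decimal part after '.' or ',') is an assumption about what the rule is meant to accept.

```python
import re

# Assumed pattern: a digit, then digits or spaces (thousands separators),
# then an optional '.' or ',' followed by optional decimal digits.
# This reflects the presumed intent of the rule, not a verified copy of it.
CURRENCY_PATTERN = re.compile(r"^[0-9][0-9 ]*[.,]?[0-9]*$")
INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")


def normalize_currency(raw):
    """Return the literal as a float, or None if it does not match the pattern."""
    if not CURRENCY_PATTERN.match(raw):
        return None
    # Same two replacements as the nested bif:replace calls in the INSERT rule:
    # drop spaces, then turn a decimal comma into a decimal point.
    return float(raw.replace(" ", "").replace(",", "."))


def normalize_official_number(raw):
    """Return the literal as an int, or None if it is not a plain integer."""
    text = raw.strip()
    return int(text) if INTEGER_PATTERN.fullmatch(text) else None
```

With these helpers, a literal such as "1 250 000,50" becomes the float 1250000.5, while a value that does not match the pattern is left alone (None here), which corresponds to the rule not firing for that triple.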

Pipelines

  • assign "dn-test" group to the DN transformer instance in "test" pipeline
  • assign QA transformer to pipeline "test"
    • WD: transformers-working-dir/qa
    • Allow on clean DB: yes
    • priority: 2

QA rules

  • create a QA rule group "qa-test"
  • add a new rule
    • Coefficient: 0.9
    • Description: Invalid email address.
    • Filter:
      { ?s vcard2006:email ?mail. FILTER(!regex(?mail, "^[A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}$", "i")) }
  • add a new rule
    • Coefficient: 0.9
    • Description: Publication date after tender deadline.
    • Filter:

      {?s <http://purl.org/procurement/public-contracts#publicationDate> ?p.
      ?s <http://purl.org/procurement/public-contracts#tenderDeadline> ?d.
      FILTER (bif:datediff('day', xsd:date(?p), xsd:date(?d)) < 1)}
  • add a new rule
    • Coefficient: 0.5
    • Description: Invalid gr:hasCurrencyValue price.
    • Filter:
      { ?s gr:hasCurrencyValue ?price. FILTER(!regex(?price, "^[0-9 ]*([.,][0-9]+)?$", "i")) }
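
As a rough illustration of how these three QA rules behave, the following Python sketch evaluates them on a plain dictionary. The record layout and function names are hypothetical. The scoring model (start at 1.0 and multiply by the coefficient of every matched rule) is an assumption inferred from the expected score of 0.45 in the verification steps, not a documented formula.

```python
import re
from datetime import date

# Patterns copied from the QA rule filters; a rule matches when the value
# does NOT fit the pattern.
EMAIL_OK = re.compile(r"^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$", re.IGNORECASE)
PRICE_OK = re.compile(r"^[0-9 ]*([.,][0-9]+)?$")


def matched_rules(record):
    """Return (description, coefficient) for every QA rule the record violates."""
    hits = []
    email = record.get("email")
    if email is not None and not EMAIL_OK.match(email):
        hits.append(("Invalid email address.", 0.9))
    published = record.get("publication_date")
    deadline = record.get("tender_deadline")
    # Mirrors bif:datediff('day', publication, deadline) < 1.
    if published and deadline and (deadline - published).days < 1:
        hits.append(("Publication date after tender deadline.", 0.9))
    price = record.get("price")
    if price is not None and not PRICE_OK.match(price):
        hits.append(("Invalid gr:hasCurrencyValue price.", 0.5))
    return hits


def quality_score(hits):
    """Multiply the coefficients of all matched rules (assumed scoring model)."""
    score = 1.0
    for _description, coefficient in hits:
        score *= coefficient
    return score


contract = {
    "email": "info@example.cz",
    "publication_date": date(2012, 5, 10),
    "tender_deadline": date(2012, 5, 1),
    "price": "12 000 CZK",
}
hits = matched_rules(contract)
score = quality_score(hits)
```

For this made-up contract the date rule and the price rule match, so under the assumed model the score is 0.9 × 0.5.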

Pipelines

  • assign "qa-test" group to the QA transformer instance in "test" pipeline
  • assign Linker to pipeline "test"
    • WD: transformers-working-dir/oi
    • Allow on clean DB: yes
    • priority: 3

OI rules

  • create an OI rule group "oi-test"
  • add a new rule
    • Label: BE-by-number
    • Link type: owl:sameAs
    • Source restriction: ?a rdf:type gr:BusinessEntity
    • Target restriction: ?b rdf:type gr:BusinessEntity
    • Linkage rule:

      <LinkageRule>
      <Compare weight="1" threshold="1.0" required="true" metric="equality">
      <Input path="?a/&lt;http://purl.org/business-register#officialNumber&gt;"></Input>
      <Input path="?b/&lt;http://purl.org/business-register#officialNumber&gt;"></Input>
      </Compare>
      </LinkageRule>
  • go to the rule detail
  • add database output to the rule
    • Min confidence: 0.95
    • Max confidence:
  • add a new rule
    • Label: contact-by-mail-name
    • Link type: owl:sameAs
    • Source restriction: ?x pc:contact ?a
    • Target restriction: ?y pc:contact ?b
    • Linkage rule:

      <LinkageRule>
      <Aggregate type="min">
      <Compare weight="1" threshold="1.0" required="true" metric="equality">
      <Input path="?a/vcard2006:email"></Input>
      <Input path="?b/vcard2006:email"></Input>
      </Compare>
      <Compare weight="1" threshold="1.0" required="true" metric="equality">
      <Input path="?a/vcard2006:name"></Input>
      <Input path="?b/vcard2006:name"></Input>
      </Compare>
      </Aggregate>
      </LinkageRule>
  • go to the rule detail
  • add database output to the rule
    • Min confidence: 1
    • Max confidence:
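
The logic of the two linkage rules can be sketched in Python as follows. This is a simplified model, not the linker's actual implementation: equality is reduced to a similarity of 1.0 or 0.0, the "min" aggregation takes the smaller of the two comparisons, and a link is emitted only when the confidence reaches the rule's minimum. The entity dictionaries and function names are hypothetical.

```python
from itertools import combinations


def equality(a, b):
    """Similarity 1.0 when both values are present and equal, otherwise 0.0."""
    return 1.0 if a is not None and a == b else 0.0


def be_by_number(a, b):
    """Sketch of BE-by-number: compare br:officialNumber of two entities."""
    return equality(a.get("officialNumber"), b.get("officialNumber"))


def contact_by_mail_name(a, b):
    """Sketch of contact-by-mail-name: min of the email and name comparisons."""
    return min(equality(a.get("email"), b.get("email")),
               equality(a.get("name"), b.get("name")))


def same_as_links(entities, rule, min_confidence):
    """Return owl:sameAs candidates whose confidence is at least min_confidence."""
    links = []
    for (uri_a, a), (uri_b, b) in combinations(sorted(entities.items()), 2):
        confidence = rule(a, b)
        if confidence >= min_confidence:
            links.append((uri_a, "owl:sameAs", uri_b, confidence))
    return links
```

Calling same_as_links(entities, be_by_number, 0.95) on a dictionary that maps URIs to property dictionaries yields one link per pair with the same official number; with contact_by_mail_name and a minimum confidence of 1, both the email and the name have to agree.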

Pipelines

  • assign "oi-test" group to the Linker transformer instance in "test" pipeline

Send data

  • start Engine
  • send data from example-data-isvzus.ttl to pipeline "test" (option pipelineName in example-metadata.properties)

Verify results

  • check in the log that the pipeline ran successfully with all the assigned transformers [pipeline works]
  • copy UUID of the inserted graph to clipboard
  • make URI query for http://ld.opendata.cz/resource/isvzus.cz/public-contract/216050 [URI query works]
    • verify that all the data about the public contract from example-data-isvzus.ttl are returned
    • check that the newly inserted graph is listed as the source [data correctly stored]
    • check that the newly inserted graph has correct metadata [metadata correctly stored]
    • (verify that diacritics are displayed properly)
    • check that the result contains gr:hasCurrencyValue properties of price blank nodes; check that the values of gr:hasCurrencyValue are typed literals (should be xsd:float) [label properties work, DN rule works]
  • make metadata query for the newly inserted graph
    • check that the graph has correct metadata
    • check that the score is 0.45 and "Publication date after tender deadline" and "Invalid gr:hasCurrencyValue price" rules matched [QA rules work]
    • check that the provenance metadata are listed [provenance metadata correctly stored]
  • make URI query for http://ld.opendata.cz/resource/business-entity/6f7f8340-7364-4e5e-a2d3-bd4fc26eb724