ODClean Store Wiki

Linked Data management tool

Brought to you by: dusanr, jermanp, mifeet, toknap, tosoukup

Testing scenario 0.3.4

Authors:

Testing scenario for release 0.3.4

Tested on a subset of the ISVZUS data; prepared for a presentation.

Data sets needed

ISVZUS (sample 2x)

Scenario

Basic steps

show input data A (some business entities, some contracts): what data it contains and what is missing. The data should contain at least one or two errors for DN to fix, and 1-2 QA rules should fail.
show what the DN rules look like, explain templates, OI, QA (and how they address those shortcomings; only some of them)
create pipeline, create normalizer, QA, linker (the linker will probably interest the audience most, e.g. sameAs linking between business entities), explain that blank node removal is not needed
- assign the prepared group of DN rules, the OI rule and the QA rules
- move the linker before QA
- explain the pipeline settings (run on clean db, default, ...)
show via the simple aggregation web page that it contains no data about business entity X from the input file (for example)
run the pipeline, briefly say what happened, show that the data is now there, show how the values were modified (e.g. by the cleaner), again e.g. via the aggregation web page.

show input data B, show the syntax error in the document
- no QA rule fails (i.e. the data is of good quality apart from the syntax error)
- data B should contain some of the same business entities as A (so that sameAs links are created); we then get conflicts and differing quality for the business entity data from A and B
run the pipeline and feed it data B, show that there is something in the error graph
fix the syntax error, run again, show the result via the aggregation service
play with different aggregations

managing accounts, assigning roles (just a quick fly-through)
quickly go through the remaining functionality (just show it and describe it verbally), or try to demo it, see below:

Further items (whatever works and you have time for; otherwise just mention them on the slides)

change a rule (probably a linking rule), the pipeline runs again, show the change via the data aggregation browser or SPARQL
listing ontologies, creating mappings; how does a mapping show up in data aggregation?

Set up pipeline

URI Prefixes

Add new URI prefix
- Prefix: gr
- URI: http://purl.org/goodrelations/v1#
Add new URI prefix
- Prefix: pc
- URI: http://purl.org/procurement/public-contracts#
be careful with the vcard prefix (the QA and OI rules below use the prefix vcard2006)
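
To make the effect of the two prefix registrations concrete, the following minimal Python sketch expands a prefixed name into a full URI. The `PREFIXES` dictionary and the `expand` helper are hypothetical illustrations and are not part of ODCleanStore.

```python
# Hypothetical illustration; not ODCleanStore code.
# The two prefixes registered in the steps above.
PREFIXES = {
    "gr": "http://purl.org/goodrelations/v1#",
    "pc": "http://purl.org/procurement/public-contracts#",
}

def expand(prefixed_name: str) -> str:
    """Expand 'prefix:local' into a full URI using PREFIXES."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local
```

For example, `expand("gr:hasCurrencyValue")` yields the full property URI that the DN rules below filter on.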

Label properties

Add new label properties (Querying)

Pipelines

create pipeline labeled "test"
assign DN transformer to pipeline "test"
- WD: transformers-working-dir/dn
- Allow on clean DB: ?
- priority: 1

DN rules

create a DN rule group "dn-test"
add a new rule
- Description: Convert gr:hasCurrencyValue to xsd:float

add rule component

Type: INSERT
Description: insert typed literal

Modification:

  { ?s ?p xsd:float(bif:replace(bif:replace(?o, ' ', ''), ',', '.')) }
  WHERE  {
      ?s ?p ?o.
      FILTER (?p = <http://purl.org/goodrelations/v1#hasCurrencyValue>)
      FILTER REGEX(?o, '^[0-9][0-9 ]*[.,]?[0-9]*$')
  }

add rule component

Type: DELETE
Description: remove old converted literals

Modification:

  { ?s ?p ?o }
  WHERE  {
      ?s ?p ?o.
      FILTER (?p = <http://purl.org/goodrelations/v1#hasCurrencyValue>)
      FILTER REGEX(?o, '^[0-9][0-9 ]*[.,]?[0-9]*$')
      FILTER (datatype(?o) != xsd:float)
  }
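
The string handling in this rule can be tried outside the store. The sketch below is a rough Python analogue of the INSERT component: it checks the literal against a pattern, strips spaces, swaps the decimal comma for a dot and converts to a float. The pattern is an assumed reading that allows any number of digits and spaces, and `normalize_price` is a hypothetical helper name, not part of ODCleanStore.

```python
import re

# Assumed pattern: a leading digit, further digits or spaces,
# then an optional '.' or ',' followed by optional digits.
PRICE = re.compile(r"^[0-9][0-9 ]*[.,]?[0-9]*$")

def normalize_price(raw: str):
    """Return the value as a float, or None if the literal does not match."""
    if not PRICE.match(raw):
        return None
    # Mirrors bif:replace(bif:replace(?o, ' ', ''), ',', '.')
    return float(raw.replace(" ", "").replace(",", "."))
```

Values that do not match the pattern are left alone, which is what allows the QA price rule to flag them later.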

add a new rule
- Description: Convert br:officialNumber to xsd:integer

add rule component

Type: INSERT
Description: insert typed literal

Modification:

  { ?s ?p xsd:integer(?o) }
  WHERE  {
    ?s ?p ?o.
    FILTER(?p = <http://purl.org/business-register#officialNumber>)
    FILTER (!bif:isnull(xsd:integer(?o)))
  }

add rule component

Type: DELETE
Description: remove old converted literals

Modification:

  { ?s ?p ?o }
  WHERE  {
    ?s ?p ?o.
    FILTER(?p = <http://purl.org/business-register#officialNumber>)
    FILTER (!bif:isnull(xsd:integer(?o)) && datatype(?o) != xsd:integer)
  }
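
The INSERT and DELETE components work as a pair: the first adds a typed copy of each convertible value, the second removes the convertible values that are not yet typed. The following Python sketch replays that on a tiny in-memory set of triples. The tuple layout, `castable` and `apply_rule` are hypothetical and only approximate how the store evaluates the rule.

```python
# Triples are modelled as (subject, predicate, value, datatype);
# datatype None stands for a plain literal. This layout is an assumption.
OFFICIAL_NUMBER = "http://purl.org/business-register#officialNumber"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

def castable(value: str) -> bool:
    """Rough stand-in for !bif:isnull(xsd:integer(?o))."""
    try:
        int(value)
        return True
    except ValueError:
        return False

def apply_rule(triples):
    """Apply the INSERT component, then the DELETE component."""
    graph = set(triples)
    # INSERT: add an xsd:integer copy of every castable value.
    graph |= {(s, p, str(int(o)), XSD_INTEGER)
              for (s, p, o, dt) in graph
              if p == OFFICIAL_NUMBER and castable(o)}
    # DELETE: drop castable values whose datatype is not xsd:integer.
    return {(s, p, o, dt) for (s, p, o, dt) in graph
            if not (p == OFFICIAL_NUMBER and castable(o) and dt != XSD_INTEGER)}
```

A value that cannot be cast, such as a placeholder string, matches neither component and stays in the graph unchanged.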

Pipelines

assign "dn-test" group to the DN transformer instance in "test" pipeline
assign QA transformer to pipeline "test"
- WD: transformers-working-dir/qa
- Allow on clean DB: yes
- priority: 2

QA rules

create a QA rule group "qa-test"

add a new rule

Coefficient: 0.9
Description: Invalid email address.

Filter:

{ ?s vcard2006:email ?mail. FILTER(!regex(?mail, "^[A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}$", "i")) }
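
To preview which addresses this rule would flag, the pattern can be tried in Python. This is a sketch only: `is_invalid_email` is a hypothetical helper, and Python's `re` module may differ from the store's regex engine in corner cases.

```python
import re

# Same pattern as in the filter; re.IGNORECASE plays the role of the "i" flag.
EMAIL = re.compile(r"^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$", re.IGNORECASE)

def is_invalid_email(mail: str) -> bool:
    """True when the QA rule would match, i.e. the address looks invalid."""
    return EMAIL.match(mail) is None
```

Note that the pattern limits the top-level domain to 2-4 letters, so an address such as `a@b.museum` is reported as invalid.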

add a new rule

Coefficient: 0.9
Description: Publication date after tender deadline.

Filter:

  { ?s <http://purl.org/procurement/public-contracts#publicationDate> ?p.
    ?s <http://purl.org/procurement/public-contracts#tenderDeadline> ?d.
    FILTER (bif:datediff('day', xsd:date(?p), xsd:date(?d)) < 1) }
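
The date comparison can be sketched in Python as follows, assuming that `bif:datediff('day', a, b)` returns the number of days from `a` to `b`. The function name `deadline_rule_matches` is hypothetical.

```python
from datetime import date

def deadline_rule_matches(publication: date, deadline: date) -> bool:
    """True when the deadline is less than one day after the publication date."""
    return (deadline - publication).days < 1
```

Under this reading the rule also matches when both dates fall on the same day, not only when the publication date is strictly later than the deadline.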

add a new rule

Coefficient: 0.5
Description: Invalid gr:hasCurrencyValue price.

Filter:

{ ?s gr:hasCurrencyValue ?price. FILTER(!regex(?price, "^[0-9 ]*([.,][0-9]+)?$", "i")) }

Pipelines

assign "qa-test" group to the QA transformer instance in "test" pipeline
assign Linker to pipeline "test"
- WD: transformers-working-dir/oi
- Allow on clean DB: yes
- priority: 3

OI rules

create an OI rule group "oi-test"

add a new rule

Label: BE-by-number
Link type: owl:sameAs
Source restriction: ?a rdf:type gr:BusinessEntity
Target restriction: ?b rdf:type gr:BusinessEntity

Linkage rule:

  <LinkageRule>
    <Compare weight="1" threshold="1.0" required="true" metric="equality">
      <Input path="?a/&lt;http://purl.org/business-register#officialNumber&gt;"></Input>
      <Input path="?b/&lt;http://purl.org/business-register#officialNumber&gt;"></Input>
    </Compare>
  </LinkageRule>

go to the rule detail
add database output to the rule
- Min confidence: 0.95
- Max confidence:

add a new rule

Label: contact-by-mail-name
Link type: owl:sameAs
Source restriction: ?x pc:contact ?a
Target restriction: ?y pc:contact ?b

Linkage rule:

  <LinkageRule>
    <Aggregate type="min">
      <Compare weight="1" threshold="1.0" required="true" metric="equality">
        <Input path="?a/vcard2006:email"></Input>
        <Input path="?b/vcard2006:email"></Input>
      </Compare>
      <Compare weight="1" threshold="1.0" required="true" metric="equality">
        <Input path="?a/vcard2006:name"></Input>
        <Input path="?b/vcard2006:name"></Input>
      </Compare>
    </Aggregate>
  </LinkageRule>

go to the rule detail
add database output to the rule
- Min confidence: 1
- Max confidence:
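
The "min" aggregation means that a contact pair is only as similar as its weakest comparison, so both the e-mail and the name must match. A minimal Python sketch of that behaviour follows; the dictionary keys and the function names are hypothetical, and missing values are assumed to count as a mismatch.

```python
def equality(x, y) -> float:
    """1.0 for equal, present values; 0.0 otherwise."""
    return 1.0 if x is not None and x == y else 0.0

def contact_confidence(a: dict, b: dict) -> float:
    """Minimum of the e-mail and name comparisons, mirroring Aggregate type="min"."""
    return min(equality(a.get("email"), b.get("email")),
               equality(a.get("name"), b.get("name")))
```

Because the result is either 0.0 or 1.0, a minimum confidence of 1 keeps exactly the pairs where both fields agree.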

Pipelines

assign "oi-test" group to the Linker transformer instance in "test" pipeline

Send data

start Engine
send data from example-data-isvzus.ttl to pipeline "test" (the pipeline is selected by the pipelineName option in example-metadata.properties)

Verify results

check in the log that the pipeline ran successfully with all the assigned transformers [pipeline works]
copy UUID of the inserted graph to clipboard
make URI query for http://ld.opendata.cz/resource/isvzus.cz/public-contract/216050 [URI query works]
- verify that all the data about the public contract from example-data-isvzus.ttl are returned
- check that the newly inserted graph is listed as the source [data correctly stored]
- check that the newly inserted graph has correct metadata [metadata correctly stored]
- (verify that diacritics are displayed properly)
- check that the result contains gr:hasCurrencyValue properties of price blank nodes; check that the values of gr:hasCurrencyValue are typed literals (should be xsd:float) [label properties work, DN rule works]
make metadata query for the newly inserted graph
- check that the graph has correct metadata
- check that the score is 0.45 and "Publication date after tender deadline" and "Invalid gr:hasCurrencyValue price" rules matched [QA rules work]
- check that the provenance metadata are listed [provenance metadata correctly stored]
make URI query for http://ld.opendata.cz/resource/business-entity/6f7f8340-7364-4e5e-a2d3-bd4fc26eb724
- check that it has multiple values of gr:legalName (result of linking) [Linker works]
- check that the value of http://purl.org/business-register#officialNumber is a typed literal [DN rule works]
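
The expected score of 0.45 in the metadata check is consistent with multiplying the coefficients of the two matched QA rules (0.9 × 0.5). The sketch below reproduces that arithmetic. The multiplicative scoring model is an assumption inferred from these numbers, and `quality_score` is a hypothetical helper.

```python
from math import prod

# Coefficients of the three QA rules defined on this page.
COEFFICIENTS = {
    "Invalid email address.": 0.9,
    "Publication date after tender deadline.": 0.9,
    "Invalid gr:hasCurrencyValue price.": 0.5,
}

def quality_score(matched_rules):
    """Multiply the coefficients of all matched rules (assumed scoring model)."""
    return prod(COEFFICIENTS[name] for name in matched_rules)
```

If the e-mail rule matched as well, the same model would give 0.9 × 0.9 × 0.5 = 0.405, so a score other than 0.45 hints at which rules fired.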