Testing scenario for release 0.3.4
Tested on a part of the ISVZUS data, prepared for the presentation.
Data sets needed
ISVZUS (2 samples)
Scenario
Basic steps
- show input data A (some business entities, some contracts): what data they contain and what is missing; there should be errors (at least one or two) for DN, and 1-2 QA rules should fail.
- show what the DN rules look like; explain templates, OI, QA (and how they address the shortcomings - only some of them)
- create a pipeline; create the normalizer, QA, and linker (the linker will probably be of most interest, e.g. sameAs linking between business entities); explain that blank node removal is not needed
- assign the prepared group of DN rules, an OI rule, and QA rules
- move the linker before QA
- explain the pipeline settings (run on clean DB, default, ...)
- show via the simple aggregation web page that there are, for example, no data about business entity X from the input file yet
- run the pipeline, briefly explain what happened, show that the data are now there, and show how the values were adjusted (e.g. by the cleaner), again e.g. via the aggregation web page.
- show input data B, point out the syntax error in the document
- no QA rule should fail (i.e. good-quality data except for the syntax error)
- data B should contain some business entities that are sameAs those in A (so that links arise), and we then get conflicts and different quality for the data about business entities from A and B
- run the pipeline on data B, show that something appears in the error graph
- fix the syntax error, run again, show the result via the aggregation service
- play around with different aggregations
- managing accounts, assigning roles (just a quick pass)
- quickly go through the remaining functionality (just show it and describe it verbally), or try to demonstrate it, see below:
Further (whatever works / you have time for; otherwise just mention it on the slides)
- change a rule (probably a linking rule), have the pipeline run again, show the change via the data aggregation browser or SPARQL
- listing ontologies, creating mappings; how does a mapping show up in data aggregation?
Set up pipeline
URL Prefixes
- Add new URI prefix
- Add new URI prefix
- be careful with the vcard prefix (see the prefix sketch below)
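The two prefixes to add are not named above; judging by the rules used later in this scenario, they are most likely the GoodRelations and vCard namespaces. The mapping below is only an assumption (gr: is confirmed by the full property URI in the DN rule; vcard2006: is the usual prefix for the W3C vCard ontology):
  PREFIX gr:        <http://purl.org/goodrelations/v1#>
  PREFIX vcard2006: <http://www.w3.org/2006/vcard/ns#>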
Label properties
- Add new label properties (Querying)
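The label properties themselves are not listed here. Given the later check that gr:hasCurrencyValue values of price blank nodes show up in query results ("label properties work"), a plausible set (an assumption, not confirmed by this scenario) would be:
  http://www.w3.org/2000/01/rdf-schema#label
  http://purl.org/goodrelations/v1#legalName
  http://purl.org/goodrelations/v1#hasCurrencyValue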
Pipelines
- create pipeline labeled "test"
- assign DN transformer to pipeline "test"
- WD: transformers-working-dir/dn
- Allow on clean DB: ?
- priority: 1
DN rules
- create a DN rule group "dn-test"
- add a new rule
- Description: Convert gr:hasCurrencyValue to xsd:float
- add rule component
- Type: INSERT
- Description: insert typed literal
- Modification:
  { ?s ?p xsd:float(bif:replace(bif:replace(?o, ' ', ''), ',', '.')) }
  WHERE {
    ?s ?p ?o.
    FILTER (?p = <http://purl.org/goodrelations/v1#hasCurrencyValue>)
    FILTER REGEX(?o, '^[0-9][0-9 ]*[.,]?[0-9]*$')
  }
- add rule component (a possible body for this second component is sketched after this list)
- add a new rule
- Description: Convert br:officialNumber to xsd:integer
- add rule component
- add rule component
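The bodies of the remaining rule components are not spelled out above. As a hedged sketch only (assuming a DELETE component type exists alongside INSERT; the Description and Modification below are not taken from the scenario), the second component of the gr:hasCurrencyValue rule could remove the original untyped value:
  - Type: DELETE
  - Description: delete original untyped literal
  - Modification:
    { ?s ?p ?o }
    WHERE {
      ?s ?p ?o.
      FILTER (?p = <http://purl.org/goodrelations/v1#hasCurrencyValue>)
      FILTER REGEX(?o, '^[0-9][0-9 ]*[.,]?[0-9]*$')
      FILTER (datatype(?o) != xsd:float)
    }
The last filter is an assumed guard so that the typed literal inserted by the first component is not deleted. The two components of the br:officialNumber rule would presumably follow the same INSERT/DELETE pattern with xsd:integer and a digits-only regex.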
Pipelines
- assign "dn-test" group to the DN transformer instance in "test" pipeline
- assign QA transformer to pipeline "test"
- WD: transformers-working-dir/qa
- Allow on clean DB: yes
- priority: 2
QA rules
- create a QA rule group "qa-test"
- add a new rule
- Coefficient: 0.9
- Description: Invalid email address.
- Filter:
{ ?s vcard2006:email ?mail. FILTER(!regex(?mail, "^[A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}$", "i")) }
- add a new rule
- Coefficient: 0.9
- Description: Publication date after tender deadline.
- Filter:
{?s <http://purl.org/procurement/public-contracts#publicationDate> ?p.
?s <http://purl.org/procurement/public-contracts#tenderDeadline> ?d.
FILTER (bif:datediff('day', xsd:date(?p), xsd:date(?d)) < 1)}
- add a new rule (the expected third rule is sketched after this list)
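The third rule is left unspecified here, but the verification section below expects a rule named "Invalid gr:hasCurrencyValue price" to match. A hedged sketch of such a rule (the pattern and coefficient are assumptions; if rule coefficients multiply into the score, a coefficient of 0.5 together with the 0.9 rule above yields the expected 0.45):
  - Coefficient: 0.5
  - Description: Invalid gr:hasCurrencyValue price.
  - Filter:
    { ?s <http://purl.org/goodrelations/v1#hasCurrencyValue> ?price. FILTER(!regex(str(?price), '^[0-9][0-9 ]*[.,]?[0-9]*$')) }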
Pipelines
- assign "qa-test" group to the QA transformer instance in "test" pipeline
- assign Linker to pipeline "test"
- WD: transformers-working-dir/oi
- Allow on clean DB: yes
- priority: 3
OI rules
- create an OI rule group "oi-test"
- add a new rule
- go to the rule detail
- add database output to the rule
- Min confidence: 0.95
- Max confidence:
- add a new rule
- go to the rule detail
- add database output to the rule
- Min confidence: 1
- Max confidence:
Pipelines
- assign "oi-test" group to the Linker transformer instance in "test" pipeline
Send data
- start Engine
- send data from example-data-isvzus.ttl to pipeline "test" (option pipelineName in example-metadata.properties)
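A minimal sketch of the relevant line in example-metadata.properties (only the pipelineName option mentioned above is shown; the remaining keys shipped with the example are left unchanged):
  pipelineName=test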
Verify results
- check in the log that the pipeline ran successfully with all the assigned transformers [pipeline works]
- copy UUID of the inserted graph to clipboard
- make URI query for http://ld.opendata.cz/resource/isvzus.cz/public-contract/216050 [URI query works]
- verify that all the data about the public contract from example-data-isvzus.ttl are returned
- check that the newly inserted graph is listed as the source [data correctly stored]
- check that the newly inserted graph has correct metadata [metadata correctly stored]
- (verify that diacritics are displayed properly)
- check that the result contains gr:hasCurrencyValue properties of price blank nodes; check that the values of gr:hasCurrencyValue are typed literals (should be xsd:float) [label properties work, DN rule works] (a SPARQL spot-check is sketched at the end of this section)
- make metadata query for the newly inserted graph
- check that the graph has correct metadata
- check that the score is 0.45 and that the "Publication date after tender deadline" and "Invalid gr:hasCurrencyValue price" rules matched [QA rules work]
- check that the provenance metadata are listed [provenance metadata correctly stored]
- make URI query for http://ld.opendata.cz/resource/business-entity/6f7f8340-7364-4e5e-a2d3-bd4fc26eb724
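For the gr:hasCurrencyValue check above, a SPARQL spot-check along these lines can be run directly against the clean database (a hedged sketch; the endpoint is not specified in this scenario, and a FROM clause with the newly inserted graph can be added to narrow the check):
  PREFIX gr:  <http://purl.org/goodrelations/v1#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  SELECT ?s ?price
  WHERE {
    ?s gr:hasCurrencyValue ?price .
    FILTER (datatype(?price) = xsd:float)
  }
  LIMIT 20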