The backend components and Engine are configured through relational database tables which are stored in the Virtuoso instance for the clean database. If Virtuoso is set up using the default settings, the database can be manipulated using the isql utility:
isql -S 1111
Note that some configuration options may also be contained in the global configuration file.
The conflict resolution is configured in the following tables:
CREATE TABLE DB.FRONTEND.CR_SETTINGS ( name NVARCHAR(255) NOT NULL PRIMARY KEY, value NVARCHAR(255) NOT NULL, description LONG NVARCHAR ); CREATE TABLE DB.FRONTEND.CR_AGGREGATION_TYPES ( id INTEGER NOT NULL IDENTITY PRIMARY KEY, label NVARCHAR(255) UNIQUE NOT NULL, description LONG NVARCHAR ); CREATE TABLE DB.FRONTEND.CR_PROPERTIES ( id INTEGER NOT NULL IDENTITY PRIMARY KEY, property NVARCHAR(1024) UNIQUE NOT NULL, multivalue SMALLINT NULL DEFAULT 1, aggregationTypeId INTEGER NULL, FOREIGN KEY (aggregationTypeId) REFERENCES DB.FRONTEND.CR_AGGREGATION_TYPES(id) );
table CR_SETTINGS contains global default Conflict Resolution settings it can be updated using the following queries:
/* Sets the default aggregation to ALL; possible values are listed in CR_AGGREGATION_TYPES */
UPDATE DB.FRONTEND.CR_SETTINGS SET value='ALL' WHERE name='DEFAULT_AGGREGATION';
/* Sets the default multivalue setting to false */
UPDATE DB.FRONTEND.CR_SETTINGS SET value='0' WHERE name='DEFAULT_MULTIVALUE';
/* Sets the default aggregation to RETURN_ALL; possible values are RETURN_ALL and IGNORE */
UPDATE DB.FRONTEND.CR_SETTINGS SET value='RETURN_ALL' WHERE name='ERROR_STRATEGY';
table CR_PROPERTIES contains default Conflict Resolution settings for individual properties. An example of how to update the settings:
/* Sets the aggregation to ALL and multivalue to true for property rdf:type */
INSERT INTO DB.FRONTEND.CR_PROPERTIES (property, multivalue, aggregationTypeId)
VALUES ('http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 1, (
SELECT id FROM DB.FRONTEND.CR_AGGREGATION_TYPES WHERE label='ALL'));
The Quality Assessment gathers rules from tables in relational database. Their scheme follows:
CREATE TABLE DB.FRONTEND.QA_RULES ( id INTEGER NOT NULL IDENTITY PRIMARY KEY, filter LONG NVARCHAR NOT NULL, coefficient REAL NOT NULL, description LONG NVARCHAR ); CREATE TABLE DB.FRONTEND.DATA_DOMAINS ( id INTEGER NOT NULL IDENTITY PRIMARY KEY, uri NVARCHAR(1024) NOT NULL ); CREATE TABLE DB.FRONTEND.QA_RULES_TO_DOMAINS_RESTRICTIONS ( ruleId INTEGER NOT NULL, domainId INTEGER NOT NULL, PRIMARY KEY (ruleId, domainId), FOREIGN KEY (ruleId) REFERENCES DB.FRONTEND.QA_RULES(id), FOREIGN KEY (domainId) REFERENCES DB.FRONTEND.DATA_DOMAINS(id) );
To demonstrate how to set up Quality Assessment we recommand issuing these SQL inserts:
insert into DB.FRONTEND.QA_RULES (filter, coeficient, description) values ('{{?s <http://purl.org/procurement#contactPerson> ?c}} GROUP BY ?g HAVING count(?c) < 1', 0.8, 'PROCUREMENT CONTACT PERSON MISSING'); insert into DB.FRONTEND.QA_RULES (filter, coeficient, description) values ('{{?s <http://purl.org/procurement#tenderDeadline> ?d; http://purl.org/procurement#endDate> ?e} FILTER (?e > ?d)}', 0.9, 'TENDER COMPLETION DATE EXCEEDED ITS DEADLINE');
then add a known domain, e.g.:
insert into DB.FRONTEND.DATA_DOMAINS (uri) VALUES ('http://opendata.cz')
and restrict one of the rules to that domain (use the correct ID's):
insert into DB.FRONTEND.QA_RULES_TO_DOMAINS_RESTRICTIONS (ruleId, domainId) values (0, 0)
The rules for Object Identification are stored in following tables:
CREATE TABLE DB.FRONTEND.OI_RULES ( id INTEGER NOT NULL IDENTITY PRIMARY KEY, groupId INTEGER NOT NULL, definition LONG NVARCHAR NOT NULL, FOREIGN KEY (groupId) REFERENCES DB.FRONTEND.OI_RULES_GROUPS(id) ); CREATE TABLE DB.FRONTEND.OI_RULES_GROUPS ( id INTEGER NOT NULL IDENTITY PRIMARY KEY, label NVARCHAR(255) NOT NULL, description LONG NVARCHAR );
A new rule can be simpy inserted:
INSERT INTO DB.FRONTEND.OI_RULES VALUES (x,y,'<Interlink>...</Interlink>')
where x is rule ID and y is group ID.
You can also update any rule:
UPDATE DB.FRONTEND.OI_RULES SET definition='<Interlink>...</Interlink>' where id=x
The engine pipeline is configured in the following table:
CREATE TABLE DB.FRONTEND.REGISTERED_TRANSFORMERS ( id INTEGER NOT NULL IDENTITY PRIMARY KEY, label NVARCHAR(255) NOT NULL, description LONG NVARCHAR, jarPath NVARCHAR(255) NOT NULL, fullClassName NVARCHAR(255) NOT NULL, workDirPath NVARCHAR(255) NOT NULL, configuration LONG NVARCHAR NOT NULL, active SMALLINT NOT NULL DEFAULT 0, priority INTEGER NOT NULL DEFAULT 0 );
A new tranformer execution can be simply inserted into the pipeline:
INSERT INTO DB.FRONTEND.REGISTERED_TRANSFORMERS (label,jarPath,fullClassName,workDirPath,configuration,active,priority) VALUES( 'QA', '.', 'cz.cuni.mff.odcleanstore.qualityassessment. impl.QualityAssessorImpl', 'transformers-working-dir/qa', '', 1, 1); INSERT INTO DB.FRONTEND.REGISTERED_TRANSFORMERS (label,jarPath,fullClassName,workDirPath,configuration,active,priority) VALUES( 'Linker', '.', 'cz.cuni.mff.odcleanstore.linker.impl.LinkerImpl', 'transformers-working-dir/linker', '1', 1, 2);
Label is a human readable label for the frontend.
JarPath is a full file system path to a jar file with a transformer implementation.
FullClassName is a java full class name path of a transformer class implementation.
WorkDirPath is a file system working directory path for the transformer.
Configuration is a string for a free use of the transformer.
Active value 1 starts the transformer execution in the pipeline.
Priority defines the execution order of the transformer in the pipeline.
Note: the prototype release supports only the above mentioned two transformers.