Menu

Relational database documentation

Mifeet Tomas Soukup Jakub Daniel Petr Jerman

Relational database documentation

The backend components and Engine are configured through relational database tables which are stored in the Virtuoso instance for the clean database. If Virtuoso is set up using the default settings, the database can be manipulated using the isql utility:

isql -S 1111

Note that some configuration options may also be contained in the global configuration file.

Conflict Resolution

The conflict resolution is configured in the following tables:

CREATE TABLE DB.FRONTEND.CR_SETTINGS (
    name NVARCHAR(255) NOT NULL PRIMARY KEY,
    value NVARCHAR(255) NOT NULL,
    description LONG NVARCHAR
);
CREATE TABLE DB.FRONTEND.CR_AGGREGATION_TYPES (
    id INTEGER NOT NULL IDENTITY PRIMARY KEY,
    label NVARCHAR(255) UNIQUE NOT NULL,
    description LONG NVARCHAR   
);
CREATE TABLE DB.FRONTEND.CR_PROPERTIES (
    id INTEGER NOT NULL IDENTITY PRIMARY KEY,
    property NVARCHAR(1024) UNIQUE NOT NULL,
    multivalue SMALLINT NULL DEFAULT 1,
    aggregationTypeId INTEGER NULL,

    FOREIGN KEY (aggregationTypeId) REFERENCES DB.FRONTEND.CR_AGGREGATION_TYPES(id)
);
  • table CR_AGGREGATION_TYPES contains the list of available aggregation methods and is for reading only
  • table CR_SETTINGS contains global default Conflict Resolution settings it can be updated using the following queries:


    /* Sets the default aggregation to ALL; possible values are listed in CR_AGGREGATION_TYPES */
    UPDATE DB.FRONTEND.CR_SETTINGS SET value='ALL' WHERE name='DEFAULT_AGGREGATION';
    /* Sets the default multivalue setting to false */
    UPDATE DB.FRONTEND.CR_SETTINGS SET value='0' WHERE name='DEFAULT_MULTIVALUE';
    /* Sets the default aggregation to RETURN_ALL; possible values are RETURN_ALL and IGNORE */
    UPDATE DB.FRONTEND.CR_SETTINGS SET value='RETURN_ALL' WHERE name='ERROR_STRATEGY';

  • table CR_PROPERTIES contains default Conflict Resolution settings for individual properties. An example of how to update the settings:


    /* Sets the aggregation to ALL and multivalue to true for property rdf:type */
    INSERT INTO DB.FRONTEND.CR_PROPERTIES (property, multivalue, aggregationTypeId)
    VALUES ('http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 1, (
    SELECT id FROM DB.FRONTEND.CR_AGGREGATION_TYPES WHERE label='ALL'));

Quality Assesment

The Quality Assessment gathers rules from tables in relational database. Their scheme follows:

CREATE TABLE DB.FRONTEND.QA_RULES
(
    id INTEGER NOT NULL IDENTITY PRIMARY KEY,
    filter LONG NVARCHAR NOT NULL,
    coefficient REAL NOT NULL,
    description LONG NVARCHAR
);
CREATE TABLE DB.FRONTEND.DATA_DOMAINS
(
    id INTEGER NOT NULL IDENTITY PRIMARY KEY,
    uri NVARCHAR(1024) NOT NULL
);

CREATE TABLE DB.FRONTEND.QA_RULES_TO_DOMAINS_RESTRICTIONS
(
    ruleId INTEGER NOT NULL,
    domainId INTEGER NOT NULL,

    PRIMARY KEY (ruleId, domainId),
    FOREIGN KEY (ruleId) REFERENCES DB.FRONTEND.QA_RULES(id),
    FOREIGN KEY (domainId) REFERENCES DB.FRONTEND.DATA_DOMAINS(id)
);
  • the table QA_RULES contains all the rules
  • the table DATA_DOMAINS contains domains (publishers) where the named graphs that we evaluate might come from
  • the table QA_RULES_TO_DOMAINS_RESTRICTIONS allows us to restrict individual rules to individual domains (publishers)

To demonstrate how to set up Quality Assessment we recommand issuing these SQL inserts:

insert
  into DB.FRONTEND.QA_RULES (filter, coeficient, description)
  values
    ('{{?s <http://purl.org/procurement#contactPerson> ?c}}
      GROUP BY ?g HAVING count(?c) < 1',
     0.8,
     'PROCUREMENT CONTACT PERSON MISSING');
insert
  into DB.FRONTEND.QA_RULES (filter, coeficient, description)
  values
    ('{{?s <http://purl.org/procurement#tenderDeadline> ?d;
      http://purl.org/procurement#endDate> ?e} FILTER (?e > ?d)}',
     0.9,
     'TENDER COMPLETION DATE EXCEEDED ITS DEADLINE');

then add a known domain, e.g.:

insert into DB.FRONTEND.DATA_DOMAINS (uri) VALUES ('http://opendata.cz')

and restrict one of the rules to that domain (use the correct ID's):

insert
  into DB.FRONTEND.QA_RULES_TO_DOMAINS_RESTRICTIONS (ruleId, domainId)
  values (0, 0)

Object Identification

The rules for Object Identification are stored in following tables:

CREATE TABLE DB.FRONTEND.OI_RULES
(
    id INTEGER NOT NULL IDENTITY PRIMARY KEY,
    groupId INTEGER NOT NULL,
    definition LONG NVARCHAR NOT NULL,

    FOREIGN KEY (groupId) REFERENCES DB.FRONTEND.OI_RULES_GROUPS(id)
);

CREATE TABLE DB.FRONTEND.OI_RULES_GROUPS
(
    id INTEGER NOT NULL IDENTITY PRIMARY KEY,
    label NVARCHAR(255) NOT NULL,
    description LONG NVARCHAR
);
  • OI_RULES table contains rules in Silk-LS - definition column contains Interlink XML element and its contents.
  • Rules can by written manualy or exported from the Silk Workbench - Export as Silk-LS in linkage rule editor.
  • Rules with same groupId will be used together, every linker gets a list of those IDs in its configuration
  • OI_RULES_GROUP table groups rules so they can be easily managed from the frontend. You can ignore it for now.

A new rule can be simpy inserted:

INSERT INTO DB.FRONTEND.OI_RULES VALUES (x,y,'<Interlink>...</Interlink>')

where x is rule ID and y is group ID.

You can also update any rule:

UPDATE DB.FRONTEND.OI_RULES SET definition='<Interlink>...</Interlink>' where id=x

Engine

The engine pipeline is configured in the following table:

CREATE TABLE DB.FRONTEND.REGISTERED_TRANSFORMERS
(
    id INTEGER NOT NULL IDENTITY PRIMARY KEY,
    label NVARCHAR(255) NOT NULL,
    description LONG NVARCHAR,
    jarPath NVARCHAR(255) NOT NULL,
    fullClassName NVARCHAR(255) NOT NULL,
    workDirPath NVARCHAR(255) NOT NULL,
    configuration LONG NVARCHAR NOT NULL,
    active SMALLINT NOT NULL DEFAULT 0,
    priority INTEGER NOT NULL DEFAULT 0
);

A new tranformer execution can be simply inserted into the pipeline:

INSERT INTO DB.FRONTEND.REGISTERED_TRANSFORMERS
(label,jarPath,fullClassName,workDirPath,configuration,active,priority)     
VALUES(
    'QA',
    '.',
    'cz.cuni.mff.odcleanstore.qualityassessment.
        impl.QualityAssessorImpl',
    'transformers-working-dir/qa',
    '',
    1,
    1);

INSERT INTO DB.FRONTEND.REGISTERED_TRANSFORMERS
(label,jarPath,fullClassName,workDirPath,configuration,active,priority)     
VALUES(
    'Linker',
    '.',
    'cz.cuni.mff.odcleanstore.linker.impl.LinkerImpl',
    'transformers-working-dir/linker',
     '1',
     1,
     2);

Label is a human readable label for the frontend.
JarPath is a full file system path to a jar file with a transformer implementation.
FullClassName is a java full class name path of a transformer class implementation.
WorkDirPath is a file system working directory path for the transformer.
Configuration is a string for a free use of the transformer.
Active value 1 starts the transformer execution in the pipeline.
Priority defines the execution order of the transformer in the pipeline.

Note: the prototype release supports only the above mentioned two transformers.