<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to Project description</title><link>https://sourceforge.net/p/odcleanstore/wiki/Project%2520description/</link><description>Recent changes to Project description</description><atom:link href="https://sourceforge.net/p/odcleanstore/wiki/Project%20description/feed" rel="self"/><language>en</language><lastBuildDate>Mon, 23 Apr 2012 21:24:57 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/odcleanstore/wiki/Project%20description/feed" rel="self" type="application/rss+xml"/><item><title>WikiPage Project description modified by Tomas Soukup</title><link>https://sourceforge.net/p/odcleanstore/wiki/Project%2520description/</link><description>&lt;pre&gt;--- v11 
+++ v12 
@@ -56,3 +56,6 @@
 Object identification
 =====================
 
+Object identification (or linking) is also a special implementation of a transformer.
+
+The main purpose of this process is to interlink URIs that represent the same real-world entity by generating owl:sameAs links. It can also be used to create other types of links between URIs that are related in other ways. The [Silk framework](http://www4.wiwiss.fu-berlin.de/bizer/silk/) is used as the linking engine. Sets of linkage rules for the engine are written in [Silk-LSL](http://www.assembla.com/wiki/show/silk/Link_Specification_Language), stored in the database, and can be managed through our web frontend.
&lt;/pre&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tomas Soukup</dc:creator><pubDate>Mon, 23 Apr 2012 21:24:57 -0000</pubDate><guid>https://sourceforge.nete206c67cb86e603864fea6335dfddbefef5c7b3d</guid></item><item><title>WikiPage Project description modified by Petr Jerman</title><link>https://sourceforge.net/p/odcleanstore/wiki/Project%2520description/</link><description>&lt;pre&gt;--- v10 
+++ v11 
@@ -38,15 +38,6 @@
 
 In order to supply a smooth user experience, some methods of sharing user accounts across all related projects (e.g. especially the storage and the scraper) are being considered. Target users could then administer all projects using individual websites without re-authorizations.
 
-Input and output Web services
-=============================
-
-Engine
-======
-
-Transformers
-============
-
 Data normalization and Quality Assessment
 =========================================
 
&lt;/pre&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Petr Jerman</dc:creator><pubDate>Mon, 16 Apr 2012 03:59:05 -0000</pubDate><guid>https://sourceforge.netc901b2caff3c67bec2bbb87504a34fab84567c31</guid></item><item><title>WikiPage Project description modified by Jan Michelfeit</title><link>https://sourceforge.net/p/odcleanstore/wiki/Project%2520description/</link><description>&lt;pre&gt;&lt;/pre&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jan Michelfeit</dc:creator><pubDate>Mon, 16 Apr 2012 01:40:05 -0000</pubDate><guid>https://sourceforge.net059400b35c3e284f401542a1b5db9f131261c8e6</guid></item><item><title>WikiPage Project description modified by Jan Michelfeit</title><link>https://sourceforge.net/p/odcleanstore/wiki/Project%2520description/</link><description>&lt;pre&gt;--- v8 
+++ v9 
@@ -1,40 +1,67 @@
 Introduction
 ============
 
+The goal of ODCleanStore is to build a server which will store, clean, link, and score incoming RDF data and provide aggregated and integrated views of the data to Linked Data consumers. The motivation behind the project is described in the [specification](https://sourceforge.net/p/odcleanstore/wiki/Vize/#motivation).
+
+Interfaces and related projects
+----------------------
+ODCleanStore accepts arbitrary RDF data, together with provenance metadata, through the *webservice for publishers*. The data are stored in the *dirty database*, where they are cleaned, scored, linked to other data, etc. Subsequently, the data are moved to the *clean database*, where they can be queried through the *webservice for consumers*. The response to a query consists of relevant RDF triples together with their provenance information and quality estimate. [OpenLink Virtuoso](http://virtuoso.openlinksw.com/) is used for storing the data.
+
+The webservices will communicate in standard formats ([RDF/XML](http://www.w3.org/TR/REC-rdf-syntax/), [TriG](http://www4.wiwiss.fu-berlin.de/bizer/TriG/)) in order to integrate with an arbitrary producer or consumer of data.
+A data acquisition module, [Strigil](https://sourceforge.net/p/strigil/home/Home/), which will obtain information from (X)HTML pages or Excel spreadsheets, convert it to RDF, and feed it to ODCleanStore, is currently under development. In the future, a data visualization and analysis module built on the webservice for consumers will be developed.
+
+Data processing
+---------------
+Data accepted through the webservice for publishers is stored as a [named graph](http://www.w3.org/2004/03/trix/) in the dirty database. The ODCleanStore Engine takes these named graphs and runs them through a pipeline of *transformers*. A transformer is a Java class implementing a [defined interface](http://sourceforge.net/p/odcleanstore/code/ci/0f877e663e68e60c187d77cb91584964fa23ae2f/tree/odcleanstore/backend/src/main/java/cz/cuni/mff/odcleanstore/transformer/Transformer.java). Each transformer may modify the processed named graph (e.g. normalize values, deal with blank nodes) or attach a new named graph (e.g. quality assessment results, links to the data in the clean database, or links to other datasets). Custom transformers can easily be plugged into any place in the processing pipeline.
+
+Several transformers are particularly important for integrating data from various sources and for quality assessment, and are integrated into the web user interface: Data Normalization, Quality Assessment and Object Identification. These are described in more detail below.
+
+When the named graph passes through all the transformers in the pipeline, it is moved to the clean database and made available for queries.
+
+Queries
+-------
+Stored data can be accessed through a RESTful webservice. Two types of queries are supported: URI queries and keyword queries. Relevant triples from the clean database are returned for each query. Because the triples may originate from various sources that use different ontologies to model the data and vary in quality, the data are aggregated according to aggregation settings, which may be supplied with the query.
+
+The returned RDF triples are accompanied by the sources they come from and by a quality estimate based on the quality of the sources and on conflicts encountered during the aggregation phase. More information about the provenance and quality score of each source named graph may be requested.
+
+In addition to URI and keyword queries, limited access to the clean database through a SPARQL endpoint will be provided.
+
+
 Web frontend
 ============
 
 Basic configuration of the whole application will be done through a simple website, based on the Apache Wicket framework.
 
 The website will allow managing user accounts (restricting permissions to use various parts of the website and to insert data through input services), ontologies, configuring (custom) transformers and the engine.
 
 The configurations are expected to be done through simple HTML forms. A more convenient (AJAX based) user interface might be implemented in future.
 
 In order to supply a smooth user experience, some methods of sharing user accounts across all related projects (e.g. especially the storage and the scraper) are being considered. Target users could then administer all projects using individual websites without re-authorizations.
 
 Input and output Web services
 =============================
 
 Engine
 ======
 
 Transformers
 ============
 
 Data normalization and Quality Assessment
 =========================================
 
 Data normalization and Quality Assessment are special implementation of transformers.
 
-Data normalization is aimed to be applied early in the whole data evaluation process to simplify work of other transformers. Its main goal is to remove inconsistencies in forms the data is provided in. This is achieved by rules that specify patterns (data that comply to certain conditions) that need to be transformed and the way to transform them. The pairs of pattens and transformations are stored in database as rules and the set of all rules can be modified through the frontend.
+Data normalization is intended to be applied early in the data evaluation process to simplify the work of other transformers. Its main goal is to remove inconsistencies in the forms in which the data is provided. This is achieved by rules that specify patterns (data that comply with certain conditions) that need to be transformed and the way to transform them. The pairs of patterns and transformations are stored in the database as rules, and the set of all rules can be modified through the web frontend.
 
 Quality Assessment assigns a score to each graph based on coefficients of different patterns present in the graph. Each time a score of a graph changes a total score of its publisher (domain of origin) is updated. Again the rules formed of patterns and coefficients are supplied as a special resource to this particular transformer.
 
 The patterns are described in SPARQL conditions.
 
 
 Query execution and Conflict resolution
 =======================================
 
 Object identification
 =====================
+
&lt;/pre&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jan Michelfeit</dc:creator><pubDate>Mon, 16 Apr 2012 01:38:20 -0000</pubDate><guid>https://sourceforge.net896807bc121be7adbe1ead03f4c29c9248ad19ff</guid></item><item><title>WikiPage Project description modified by Jakub Daniel</title><link>https://sourceforge.net/p/odcleanstore/wiki/Project%2520description/</link><description>&lt;pre&gt;--- v7 
+++ v8 
@@ -21,8 +21,17 @@
 Transformers
 ============
 
-Error locatization and Data normalization
+Data normalization and Quality Assessment
 =========================================
+
+Data normalization and Quality Assessment are special implementation of transformers.
+
+Data normalization is aimed to be applied early in the whole data evaluation process to simplify work of other transformers. Its main goal is to remove inconsistencies in forms the data is provided in. This is achieved by rules that specify patterns (data that comply to certain conditions) that need to be transformed and the way to transform them. The pairs of pattens and transformations are stored in database as rules and the set of all rules can be modified through the frontend.
+
+Quality Assessment assigns a score to each graph based on coefficients of different patterns present in the graph. Each time a score of a graph changes a total score of its publisher (domain of origin) is updated. Again the rules formed of patterns and coefficients are supplied as a special resource to this particular transformer.
+
+The patterns are described in SPARQL conditions.
+
 
 Query execution and Conflict resolution
 =======================================
&lt;/pre&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jakub Daniel</dc:creator><pubDate>Sun, 15 Apr 2012 14:40:01 -0000</pubDate><guid>https://sourceforge.net99de73a2838a73a7070a762dc7137fbef420057b</guid></item><item><title>WikiPage Project description modified by Dušan Rychnovský</title><link>https://sourceforge.net/p/odcleanstore/wiki/Project%2520description/</link><description>&lt;pre&gt;--- v6 
+++ v7 
@@ -10,7 +10,7 @@
 
 The configurations are expected to be done through simple HTML forms. A more convenient (AJAX based) user interface might be implemented in future.
 
-In order to allow a smooth user experience, some methods of sharing user accounts across all related projects (e.g. especially the storage and the scraper) are being considered. Target users could then administer all projects using individual websites without re-authorizations.
+In order to supply a smooth user experience, some methods of sharing user accounts across all related projects (e.g. especially the storage and the scraper) are being considered. Target users could then administer all projects using individual websites without re-authorizations.
 
 Input and output Web services
 =============================
&lt;/pre&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Dušan Rychnovský</dc:creator><pubDate>Sat, 14 Apr 2012 13:24:18 -0000</pubDate><guid>https://sourceforge.net75960922bc11c101975030c0790ff344cf310014</guid></item><item><title>WikiPage Project description modified by Dušan Rychnovský</title><link>https://sourceforge.net/p/odcleanstore/wiki/Project%2520description/</link><description>&lt;pre&gt;--- v5 
+++ v6 
@@ -10,6 +10,7 @@
 
 The configurations are expected to be done through simple HTML forms. A more convenient (AJAX based) user interface might be implemented in future.
 
+In order to allow a smooth user experience, some methods of sharing user accounts across all related projects (e.g. especially the storage and the scraper) are being considered. Target users could then administer all projects using individual websites without re-authorizations.
 
 Input and output Web services
 =============================
&lt;/pre&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Dušan Rychnovský</dc:creator><pubDate>Sat, 14 Apr 2012 13:08:30 -0000</pubDate><guid>https://sourceforge.net8256c73910d310723443cb122baec0a7106ab4fd</guid></item><item><title>WikiPage Project description modified by Dušan Rychnovský</title><link>https://sourceforge.net/p/odcleanstore/wiki/Project%2520description/</link><description>&lt;pre&gt;--- v4 
+++ v5 
@@ -8,6 +8,9 @@
 
 The website will allow managing user accounts (restricting permissions to use various parts of the website and to insert data through input services), ontologies, configuring (custom) transformers and the engine.
 
+The configurations are expected to be done through simple HTML forms. A more convenient (AJAX based) user interface might be implemented in future.
+
+
 Input and output Web services
 =============================
 
&lt;/pre&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Dušan Rychnovský</dc:creator><pubDate>Fri, 13 Apr 2012 17:21:59 -0000</pubDate><guid>https://sourceforge.net423abcd7ef5085f0c11b83cf6fce49f69d77f04d</guid></item><item><title>WikiPage Project description modified by Dušan Rychnovský</title><link>https://sourceforge.net/p/odcleanstore/wiki/Project%2520description/</link><description>&lt;pre&gt;--- v3 
+++ v4 
@@ -4,22 +4,24 @@
 Web frontend
 ============
 
-
+Basic configuration of the whole application will be done through a simple website, based on the Apache Wicket framework.
+
+The website will allow managing user accounts (restricting permissions to use various parts of the website and to insert data through input services), ontologies, configuring (custom) transformers and the engine.
+
 Input and output Web services
 =============================
 
 Engine
 ======
 
 Transformers
 ============
 
 Error locatization and Data normalization
 =========================================
 
 Query execution and Conflict resolution
 =======================================
 
 Object identification
 =====================
-
&lt;/pre&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Dušan Rychnovský</dc:creator><pubDate>Fri, 13 Apr 2012 17:19:40 -0000</pubDate><guid>https://sourceforge.netcab5439e3d9ccb3ecd2598552990fb75f1916664</guid></item><item><title>WikiPage Project description modified by Dušan Rychnovský</title><link>https://sourceforge.net/p/odcleanstore/wiki/Project%2520description/</link><description>&lt;pre&gt;--- v2 
+++ v3 
@@ -1,25 +1,25 @@
 Introduction
 ============
 
+Web frontend
+============
+
+
 Input and output Web services
 =============================
 
-Web frontend
-============
-
 Engine
 ======
 
 Transformers
 ============
 
 Error locatization and Data normalization
 =========================================
 
 Query execution and Conflict resolution
 =======================================
 
 Object identification
 =====================
-
 
&lt;/pre&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Dušan Rychnovský</dc:creator><pubDate>Fri, 13 Apr 2012 17:11:30 -0000</pubDate><guid>https://sourceforge.neta79575b61947fc2a512dd567f4baffbd655e75b6</guid></item></channel></rss>