Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.txt | 2016-07-25 | 7.5 kB | |
workspace-prov-dist-20160628.zip | 2016-06-28 | 27.4 MB | |
Totals: 2 Items | | 27.4 MB | 0 |

ThoughtFlow Server
====================

Version: 1.0 (Alpha)
Dated: 25/07/2016

Sources
========

Check out the Java code from this Git repository.
Download and unzip the file workspace-prov-dist-XXXXXXX.zip

Overview
=========

Installation and running Unit Tests.

The protocol to run the RDF-based provenance datastore is as follows.

1. Starting JENA and creating a data set.
2. Importing provenance data into JENA.
3. Confirming data import via the JENA query terminal.
4. Notes on JENA shutdown.

PROTOCOL
========

N.B. These instructions refer to installation of the provenance database
under Windows. The Linux installation is essentially the same.

====================================================================
Part 1: STARTING JENA
====================================================================

1. Create a root directory for the RDF triple store data files.
   For Windows the default path is 'C:\etc\fuseki'.
   For Linux the default path is '/etc/fuseki'.
   Make sure the user ID running the JENA server has 'write' permission
   to that directory.

2. In the Provenance Database distribution directory
   (~/workspace-prov-dist) start JENA.
   For Windows, click on the file
   '~/workspace-prov-dist/apache-tomcat-7.0.67/bin/startup.bat'.
   For Linux, run the shell script
   '~/workspace-prov-dist/apache-tomcat-7.0.67/bin/startup.sh'.
   A server console window appears displaying a set of 'boot' messages.
   The last server message should be something like
   'INFO: Server startup in XXXX ms' where XXXX is the boot time.
   Start-up takes 10-30 seconds depending on the spec of the host
   machine. If it takes longer, something went wrong.

3. Confirm Fuseki (the JENA RDF server) is running by going to the
   following web site.

       http://localhost:8080/fuseki/

   You should see a 'White' Fuseki RDF management page.

4. To save RDF/JSON in Fuseki, you must create a 'dataset' to hold the
   triples of a graph. From http://localhost:8080/fuseki/ , click on the
   link "There are no datasets on this server yet. Add one".
   That will navigate you to the Fuseki Dataset Manager:-

       http://localhost:8080/fuseki/manage.html?tab=new-dataset

5. From http://localhost:8080/fuseki/manage.html?tab=new-dataset , click
   on the 'add new dataset' tab.
   Click on the 'Dataset name' text box and enter 'ds' as the dataset
   name.
   Select 'Persistent' as the Dataset type option.
   Then click 'create dataset'.

6. A dataset called 'ds' appears in the Fuseki dataset list:-

       http://localhost:8080/fuseki/manage.html?tab=datasets

====================================================================
Part 2: Importing provenance
====================================================================

Importing provenance JSON into JENA requires a Java 7 installation,
Eclipse and Maven. Check out the Java project code from the
"ThoughtFlow" server repository. Contact your system administrator
about the installation details.

The Open Provenance Toolbox used in this codebase is version 0.7.2. The
toolbox is available from Maven Central but we have had problems with
Maven hosting of those libraries in the recent past. The relevant Open
Provenance Toolbox modules are therefore included with this
distribution. The Maven POMs are modified to make the build simpler but
the Java code is unmodified.

The source code for Open Provenance Toolbox (OPT) 0.7.3 is available
from GitHub but that project is broken. OPT 0.7.2 is stable, OPT 0.7.3
is not stable.

To import data into the JENA server:-

1. While the JENA server is running, import the following projects into
   your Eclipse workspace. The recommendation is to create a 'fresh'
   Eclipse workspace pre-configured with a Java 7 run-time. The Maven
   modules are located in the distribution directory created when the
   code was checked out from the ThoughtFlow Git repository.
   Import the following projects as 'Existing Maven projects' via the
   Eclipse Import Wizard.

       prov-dot
       prov-generator
       prov-interop
       prov-json
       prov-model
       prov-n
       prov-rdf
       prov-template
       prov-xml
       thought-flow-server

2. The project Java dependencies should be picked up automatically by
   Maven. If you are not connected to the Internet, the project import
   will fail: Maven needs to download JARs from a central repository for
   the project to build. The import can take a minute or two, depending
   on the power of the host computer.

3. Start the provenance JSON import by starting the following Java
   Application in the "thought-flow-server" project:-

       eu.ddmore.provn.store.Application

   This runs an embedded HTTP server, which interfaces to the JENA
   server for JSON Document import.

4. To import provenance data into JENA, click on the project
   'thought-flow-server'. Use the Eclipse Project Manager to open the
   Java Class:-

       eu.ddmore.provn.client.jsontests20160725.AllTests

   This Java class is a JUnit test suite. Run the class 'AllTests' to
   save the current set of reference JSON documents to JENA.
   Java may pause for 10 seconds as it loads but each update takes a
   fraction of a second. The time lag occurs while the JENA client is
   connecting to the RDF servers.

5. If the JUnit terminal is displaying green bars, the import was
   successful. Currently, a lot of "Spring Status" messages are written
   to the Eclipse console as the update progresses.

6. Import only happens once for a graph. Repeated imports do not result
   in duplicated graphs. Existing graphs are only modified by
   UPDATE/DELETE SPARQL calls.

====================================================================
Part 3: Confirming data import
====================================================================

If parts 1 and 2 were successful, the JENA server now has ~400 triples
of Provenance data. To confirm that, you must query the JENA server to
check for the presence of triples.

1. Go to this web page:-

       http://localhost:8080/fuseki/dataset.html

2. The query terminal displays a default SPARQL query:-

       SELECT ?subject ?predicate ?object
       WHERE {
         ?subject ?predicate ?object
       }
       LIMIT 25

3. Click on the black arrow in the top right quadrant of the terminal
   to launch the query.

4. The query terminal will display a table of raw, unstructured triple
   data. The structure of the returned data is determined by the SPARQL
   query. Modify the LIMIT number to increase the number of rows
   returned. If you specify too large a number, you will kill your web
   browser as it tries to display thousands of data items.

====================================================================
Part 4: JENA shutdown
====================================================================

To shut down JENA, click on the following file in the distribution
directory.

    ~/workspace-prov-dist/apache-tomcat-7.0.67/bin/shutdown.bat

Sometimes the server does not stop cleanly at shutdown. To force a
shutdown, close the Server Window as you would any other application
window. This can leave open JENA lock files in the RDF data store
directory (C:\etc\fuseki). If those files are present after a JENA
shutdown, delete them in order to re-start JENA. If lock files are
present in the JENA data directory prior to server start-up, the JENA
server will not start correctly.
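
The lock file check described in Part 4 can be scripted before each
server start. The following is a minimal sketch, not part of the
distribution. It assumes that the lock files can be recognised by a
name ending in '.lock'; this README does not name the files, so adjust
the test to whatever is actually left in your data directory. The class
name 'LockFileCheck' is hypothetical. The sketch only lists candidate
files and deletes nothing. It uses only Java 7 APIs.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists leftover lock files in the RDF data directory.
public class LockFileCheck {

    // Walks dataDir recursively and collects every file whose name ends
    // with ".lock". The ".lock" suffix is an assumption, not taken from
    // the README.
    static List<Path> findLockFiles(Path dataDir) throws IOException {
        final List<Path> found = new ArrayList<Path>();
        Files.walkFileTree(dataDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (file.getFileName().toString().endsWith(".lock")) {
                    found.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the Windows data directory given in Part 1; pass
        // another path (for example /etc/fuseki) as the first argument.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "C:\\etc\\fuseki");
        if (!Files.isDirectory(dataDir)) {
            System.out.println("Data directory not found: " + dataDir);
            return;
        }
        List<Path> locks = findLockFiles(dataDir);
        if (locks.isEmpty()) {
            System.out.println("No lock files found in " + dataDir);
        } else {
            System.out.println("Lock files to remove before starting JENA:");
            for (Path lock : locks) {
                System.out.println("  " + lock);
            }
        }
    }
}
```

Run it only while the JENA server is stopped. A lock file found while
the server is running is expected and must not be removed.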