NanoSparqlServer
NanoSparqlServer provides a lightweight REST API for RDF. It is implemented using the Servlet API. You can run NanoSparqlServer from the command line or embed it within your application using the bundled jetty dependencies. You can also deploy the REST API Servlets into a standard servlet engine.
Deploying NanoSparqlServer
You DO NOT need to deploy the Sesame WAR to run NanoSparqlServer. NanoSparqlServer can be run from the command line (using jetty), embedded in an application (using jetty), or deployed in a servlet container such as Tomcat. By far the easiest way to deploy it is in a servlet container.
Command line (using jetty)
To run the server from the command line (using jetty), you first need to know how your classpath should be set. The bundleJar target of the top-level build.xml file can be invoked to generate a bundle-<version>.jar file to simplify classpath definition. Look in the bigdata-perf directories for examples of ant scripts which do this.
Once you know how to set your classpath you can run the NanoSparqlServer from the command line by executing the class com.bigdata.rdf.sail.webapp.NanoSparqlServer providing the connection port, the namespace and a property file:
java -cp ... -server com.bigdata.rdf.sail.webapp.NanoSparqlServer <port> <namespace> <propertiesFile>
The ... should be your classpath.
The port is just whatever http port you want to run on.
The namespace is the namespace of the triple or quads store instance within bigdata to which you want to connect. If no such namespace exists, a default kb instance is created.
The propertiesFile is where you configure bigdata. You can start with [1] and then edit it to match your requirements. There are a variety of example property files in [2] for quads, triples, inference, provenance, and other interesting variations.
Embedded (using jetty)
The following code example starts a server from code:
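import java.util.Properties;

import com.bigdata.journal.ITx;
import com.bigdata.journal.Journal;
import com.bigdata.rdf.sail.webapp.*;
import com.bigdata.rdf.store.LocalTripleStore;

void startServer(final int port, final String namespace, final Properties properties) {
    // Open the backing journal and create the KB instance for the namespace.
    final Journal jnl = new Journal(properties);
    new LocalTripleStore(jnl, namespace, ITx.UNISOLATED, properties).create();

    // Configure the server for that namespace and port.
    final BigdataContext.Config config = new BigdataContext.Config();
    config.namespace = namespace;
    config.port = port;
    config.timestamp = ITx.READ_COMMITTED; // queries read from the last commit point

    // Start the embedded jetty server.
    final JettySparqlServer server = new JettySparqlServer(config.port);
    server.startup(config, jnl);
}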
Servlet Container (Tomcat, etc)
Download WAR
Download, install, and configure a servlet container. See the documentation for your servlet container, as they are all different.
Download the latest bigdata.war file. Alternatively, you can build the bigdata.war file:
ant clean bundleJar war
This generates ant-build/bigdata.war.
Drop the WAR into the webapps directory of your servlet container and unpack it.
Configuration
Note: It is strongly advised that you unpack the WAR before you start it and edit the RWStore.properties and/or the web.xml deployment descriptor. The web.xml file controls the location of the RWStore.properties file. The RWStore.properties file controls the behavior of the bigdata database instance, the location of the database instance on your disk, and the configuration for the default triple and/or quad store instance that will be created when the webapp starts for the first time. Take a moment to review and edit web.xml and RWStore.properties before you go any further. See GettingStarted if you need help to set up the KB for triples versus quads, enable inference, etc.
Note: As of r6797 and releases after 1.2.2, you can specify the following property to override the location of the bigdata property file:
-Dcom.bigdata.rdf.sail.webapp.ConfigParams.propertyFile=FILE
where FILE is the fully qualified path of the bigdata property file (e.g., RWStore.properties).
Common Startup Problems
The default web.xml and RWStore.properties files use path names which are relative to the directory in which you start the servlet engine. To use the defaults for those files with Tomcat you must start Tomcat from the 'bin' directory. For example:
cd bin; ./startup.sh
If you have any problems getting the bigdata WAR to start, please consult the servlet log files for detailed information which can help you to localize a configuration error. For Tomcat6 on Ubuntu 10.04 the servlet log is called /var/lib/tomcat6/logs/catalina.out. It may have another name or location in another environment. If you see a permissions error on attempting to open the file rules.log, then your servlet engine may have been started from the wrong directory.
If you cannot start Tomcat from the 'bin' directory as described above, then you can instead change bigdata's file paths from relative to absolute:
- In webapps/bigdata/RWStore.properties change this line:
  com.bigdata.journal.AbstractJournal.file=bigdata.jnl
- In webapps/bigdata/WEB-INF/classes/log4j.properties change these three lines:
  log4j.appender.ruleLog.File=rules.log
  log4j.appender.queryLog.File=queryLog.csv
  log4j.appender.queryRunStateLog.File=queryRunState.log
- In webapps/bigdata/WEB-INF/web.xml change this line:
  <param-value>../bigdata/RWStore.properties</param-value>
Active URLs
When deployed normally, the following URLs should be active (make sure you use the correct port# for your servlet engine):
- http://localhost:8080/bigdata - help page / console.
- http://localhost:8080/bigdata/sparql - REST API
- http://localhost:8080/bigdata/status - Status page
- http://localhost:8080/bigdata/counters - Performance counters
For example, you can select a single statement from the database using the following query (the result set will be empty for a new quad store):
http://localhost:8080/bigdata/sparql?query=select * where { ?s ?p ?o } limit 1
With the spaces URL encoded this would be:
http://localhost:8080/bigdata/sparql?query=select%20*%20where%20{%20?s%20?p%20?o%20}%20limit%201
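The URL above only escapes the spaces. A minimal sketch of encoding the whole query programmatically, assuming Java 10 or later and using only the JDK, is shown here. The class and method names are hypothetical and are not part of bigdata. Note that URLEncoder produces form encoding, so spaces become "+" rather than "%20"; servlet containers normally decode both forms when reading query parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the bigdata code base.
final class SparqlQueryUrl {

    /** Appends the SPARQL query to the end point as an encoded "query" parameter. */
    static String build(String endpoint, String sparql) {
        // Form encoding: ' ' becomes '+', while '{', '}' and '?' are percent-encoded.
        return endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080/bigdata/sparql",
                "select * where { ?s ?p ?o } limit 1"));
        // prints http://localhost:8080/bigdata/sparql?query=select+*+where+%7B+%3Fs+%3Fp+%3Fo+%7D+limit+1
    }
}
```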
Logging
A log4j.properties file is deployed to the WEB-INF/classes directory in the WAR. This will be located automatically during startup. Releases through 1.0.2 will log a warning indicating that the log4j configuration could not be located, but the log4j.properties file is still in effect.
By default, the log4j.properties file will log on the ConsoleAppender. You can edit the log4j.properties file to specify a different appender, e.g., a FileAppender and log file.
Scale-out (cluster / federation)
The NanoSparqlServer will automatically create a KB instance for the given namespace if none exists. However, the default KB configuration is not appropriate for scale-out. In order to create a KB instance which is appropriate for scale-out you need to override the properties object which will be seen by the NanoSparqlServer (actually, by the BigdataRDFServletContext). You can do this by editing the "com.bigdata.service.jini.JiniClient" component block in the configuration file. The line that you want to change is:
old:
// properties = new NV[] {};
new:
properties = lubm.properties;
This will direct the NanoSparqlServer to use the configuration for the KB instance described by the "lubm" component in the file, which gives a KB configuration which is appropriate for the LUBM benchmark. You can then modify the "lubm" component to reflect your use case, e.g., triples versus quads, etc.
To set up for quads, change the following lines in the "lubm" configuration block:
old:
static private namespace = "U"+univNum+"";
new:
static private namespace = "PUT-YOUR_NAMESPACE_HERE"; // Note: This MUST be the same value you will specify to the NanoSparqlServer.
old:
//new NV(BigdataSail.Options.AXIOMS_CLASS, "com.bigdata.rdf.axioms.RdfsAxioms"),
new:
new NV(BigdataSail.Options.AXIOMS_CLASS,"com.bigdata.rdf.axioms.NoAxioms"),
new:
new NV(BigdataSail.Options.QUADS_MODE,"true"),
old:
new NV(BigdataSail.Options.FORWARD_CHAIN_OWL_INVERSE_OF, "true"),
new NV(BigdataSail.Options.FORWARD_CHAIN_OWL_TRANSITIVE_PROPERTY, "true"),
new:
// new NV(BigdataSail.Options.FORWARD_CHAIN_OWL_INVERSE_OF, "true"),
// new NV(BigdataSail.Options.FORWARD_CHAIN_OWL_TRANSITIVE_PROPERTY, "true"),
Note that you have to specify the namespace both in the configuration file and on the command line to the NanoSparqlServer, since the configuration file is parametrized to override various indices based on the namespace.
Start the NanoSparqlServer using nanoSparqlServer.sh. You need to specify the port and the default KB namespace on the command line:
nanoSparqlServer.sh port namespace
The NanoSparqlServer will echo the serviceURL to the console. The actual URL depends on your installation, but it will be something like this:
serviceURL: http://192.168.1.10:8090
The "serviceURL" is actually the URI of the NanoSparqlServer web application. You can interact directly with the web application. If you want to use the SPARQL end point, you need to append "/sparql" to that URL. For example:
serviceURL: http://192.168.1.10:8090/sparql
Note: By default, the nanoSparqlServer.sh script will assert a read lock for the lastCommitTime on the federation. This removes the need to obtain a transaction per query on a cluster. See the script file for more information.
Issues:
- log4j configuration complaints.
- reload of the webapp causes complaints.
- refer people to JVM settings for decent performance.
REST API
SPARQL End Point
By default, the NanoSparqlServer will respond at the following URL when run in its embedded mode:
http://localhost:80/sparql
It will respond at this URL when deployed in a servlet container (where bigdata is the name of the deployed webapp):
http://localhost:80/bigdata/sparql
The baseURI for the NanoSparqlServer is the effective service end point URL.
MIME Types
In general, requests may use any of the known MIME types. Likewise, you can CONNEG for any of these MIME types. However, CONNEG may not be very robust. Therefore, when seeking a specific MIME type for a response, it is best to specify an Accept header which specifies just the desired MIME type.
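As a sketch of the advice above, the following builds a request that names exactly one MIME type in its Accept header. It assumes Java 11 or later (the java.net.http client); the class name is hypothetical, and the request is only constructed here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the bigdata code base.
final class SingleAcceptRequest {

    /** Builds a GET request whose Accept header lists only the desired MIME type. */
    static HttpRequest build(String url, String mimeType) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", mimeType) // a single type, so no content negotiation is needed
                .GET()
                .build();
    }
}
```

For example, passing "application/sparql-results+json" as the MIME type asks for the JSON result format. The request can then be sent with java.net.http.HttpClient.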
RDF data
These data are based on the org.openrdf.rio.RDFFormat declarations. The set of understood formats is extensible. Additional declarations MAY be registered with the openrdf platform and associated with parsers and writers for that RDFFormat. The recommended charset, file name extension, etc. are always as declared by the IANA MIME type registration. Note that a potential for confusion exists with the ".xml" file extension and its use with this API is not recommended.
| MIME Type | File extension | Charset | Name | URL | Comments |
|---|---|---|---|---|---|
| application/rdf+xml | .rdf, .rdfs, .owl, .xml | UTF-8 | RDF/XML | http://www.w3.org/TR/REC-rdf-syntax/ | |
| text/plain | .nt | US-ASCII | N-Triples | http://www.w3.org/TR/rdf-testcases/#ntriples | N-Triples defines an escape encoding for non-ASCII characters. |
| application/x-turtle | .ttl | UTF-8 | Turtle | http://www.w3.org/TeamSubmission/turtle/ | |
| text/rdf+n3 | .n3 | UTF-8 | N3 | http://www.w3.org/TeamSubmission/n3/ | |
| application/trix | .trix | UTF-8 | TriX | http://www.hpl.hp.com/techreports/2003/HPL-2003-268.html | |
| application/x-trig | .trig | UTF-8 | TRIG | http://www.wiwiss.fu-berlin.de/suhl/bizer/TriG/Spec | |
| text/x-nquads | .nq | US-ASCII | NQUADS | http://sw.deri.org/2008/07/n-quads/ | While the REST API can accept NQuads data, it can not generate it yet. |
SPARQL Result Sets
| MIME Type | Name | URL | Comments |
|---|---|---|---|
| application/sparql-results+xml | SPARQL Query Results XML Format | http://www.w3.org/TR/rdf-sparql-XMLres/ | |
| application/sparql-results+json | SPARQL Query Results JSON Format | http://www.w3.org/TR/rdf-sparql-json-res/ | |
| application/x-binary-rdf-results-table | Binary Query Results Format | http://www.openrdf.org/doc/sesame2/api/org/openrdf/query/resultio/binary/BinaryQueryResultConstants.html | This is a format defined by the openrdf platform. |
| text/tab-separated-values | Tab Separated Values (TSV) | http://www.w3.org/TR/sparql11-results-csv-tsv/ | |
| text/csv | Comma Separated Values (CSV) | http://www.w3.org/TR/sparql11-results-csv-tsv/ | |
Property set data
The Multi-Tenancy API interchanges property set data. The MIME types understood by the API are:
| MIME Type | File extension | Charset | Name | URL | Comments |
|---|---|---|---|---|---|
| application/xml | .xml | UTF-8 | | | |
| text/plain | .properties | UTF-8 | | | |
Mutation Result
Operations which cause a mutation will report an XML document having the general structure:
<data modified="5" milliseconds="112"/>
Where modified is the mutation count.
Where milliseconds is the elapsed time for the operation.
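A client might read the two attributes from such a response as follows. This is a minimal sketch using the JDK's DOM parser; the class name is hypothetical, and it assumes the response has exactly the structure shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical value class, not part of the bigdata client API.
final class MutationResult {
    final long modified;      // the mutation count
    final long milliseconds;  // elapsed time of the operation

    MutationResult(long modified, long milliseconds) {
        this.modified = modified;
        this.milliseconds = milliseconds;
    }

    /** Parses a document such as {@code <data modified="5" milliseconds="112"/>}. */
    static MutationResult parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return new MutationResult(
                    Long.parseLong(root.getAttribute("modified")),
                    Long.parseLong(root.getAttribute("milliseconds")));
        } catch (Exception e) {
            // Malformed XML or a missing attribute both end up here.
            throw new IllegalArgumentException("Not a mutation result: " + xml, e);
        }
    }
}
```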
API Atomicity
Queries use snapshot isolation.
Mutation operations are ACID against a standalone database and shard-wise ACID against a bigdata federation.
QUERY
GET or POST
GET Request-URI ?query=...
-OR-
POST Request-URI ?query=...
The response body is the result of the query. The following query parameters are understood:
| parameter | definition |
|---|---|
| timestamp | A timestamp corresponding to a commit time against which the query will read. |
| explain | The query will be run, but the response will be an HTML document containing an "explanation" of the query. The response currently includes the original SPARQL query, the operator tree obtained by parsing that query, and detailed metrics from the evaluation of the query. This information may be used to examine opportunities for query optimization. |
| analytic | This enables the AnalyticQuery mode. |
| default-graph-uri | Specify zero or more graphs whose RDF merge is the default graph for this query (protocol option with the same semantics as FROM). |
| named-graph-uri | Specify zero or more named graphs for this query (protocol option with the same semantics as FROM NAMED). |
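Because default-graph-uri and named-graph-uri may each be given more than once, a client has to repeat the parameter rather than join the values. A minimal sketch, assuming Java 10 or later and using a hypothetical helper class:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of the bigdata code base.
final class QueryWithDefaultGraphs {

    /** Builds a query URL with one default-graph-uri parameter per graph. */
    static String build(String endpoint, String sparql, List<String> defaultGraphs) {
        StringBuilder sb = new StringBuilder(endpoint)
                .append("?query=").append(encode(sparql));
        for (String graph : defaultGraphs) {
            // The parameter name is repeated once for each graph.
            sb.append("&default-graph-uri=").append(encode(graph));
        }
        return sb.toString();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```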
FAST RANGE COUNTS
Bigdata uses fast range counts internally for its query optimizer. Fast range counts on an access path are computed with two key probes against the appropriate index. Fast range counts are appropriate for federated query engines where they provide more information than an "ASK" query for a triple pattern. Fast range counts are also exact range counts under some common deployment configurations.
Fast range counts are fast. They use two key probes to find the ordinal index of the from and to key for the access path and then report (toIndex-fromIndex). This is orders of magnitude faster than you can achieve in SPARQL using a construction like "SELECT (COUNT(*) AS ?count) { ?s ?p ?o }" because the corresponding SPARQL query must actually visit each tuple in that key range, rather than just reporting how many tuples there are.
Fast range counts are exact when running against a BigdataSail on a local journal which has been provisioned without full read/write transactions. When full read/write transactions are enabled, the fast range counts will also count the "delete markers" in the index and are therefore approximate. In scale-out, the fast range counts are also approximate if the key range spans more than one shard (in which case you are talking about a lot of data).
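The idea of computing (toIndex-fromIndex) from two key probes can be illustrated on a sorted array. This is only a sketch of the technique: the array stands in for an index, the names are hypothetical, and it is not bigdata's B+Tree implementation.

```java
// Illustration only: a sorted array stands in for an ordered index.
final class FastRangeCountSketch {

    /** One "key probe": the position of the first key that is >= the given key. */
    static int lowerBound(long[] sortedKeys, long key) {
        int lo = 0;
        int hi = sortedKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Counts the keys in [fromKey, toKey) with two probes and no scan of the range. */
    static int rangeCount(long[] sortedKeys, long fromKey, long toKey) {
        int fromIndex = lowerBound(sortedKeys, fromKey);
        int toIndex = lowerBound(sortedKeys, toKey);
        return toIndex - fromIndex;
    }
}
```

Each probe costs time logarithmic in the size of the index, whatever the number of keys in the range, whereas counting by iteration costs time proportional to the size of the range.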
Note: This method is available in releases after version 1.0.2.
GET Request-URI ?ESTCARD&([s|p|o|c]=(uri|literal))+
The response is an XML document having the general structure:
<data rangeCount="5" milliseconds="12"/>
Where rangeCount is the fast range count for the access path.
Where milliseconds is the elapsed time for the operation.
INSERT
INSERT RDF (POST with Body)
POST Request-URI ... Content-Type: ... BODY
Perform an HTTP-POST, which corresponds to the basic CRUD operation "create" according to the generic interaction semantics of HTTP REST.
Where BODY is the new RDF content using the representation indicated by the Content-Type.
You can also specify a context-uri request parameter which sets the default context when triples data are loaded into a quads store (available in releases after 1.0.2).
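A minimal sketch of constructing such a POST with the Java 11 HTTP client follows. The class name and the end point URL in the usage note are assumptions for illustration; the request is built but not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the bigdata client API.
final class InsertRdfRequest {

    /**
     * Builds a POST whose body is the RDF content to insert.
     *
     * @param contextUri optional default context for triples loaded into a
     *                   quads store; pass null to omit the parameter
     */
    static HttpRequest build(String endpoint, String contentType, String rdf, String contextUri) {
        String url = endpoint;
        if (contextUri != null) {
            url += "?context-uri=" + URLEncoder.encode(contextUri, StandardCharsets.UTF_8);
        }
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", contentType) // tells the server how to parse the body
                .POST(HttpRequest.BodyPublishers.ofString(rdf, StandardCharsets.UTF_8))
                .build();
    }
}
```

For Turtle data the content type would be application/x-turtle, as listed in the RDF data table. The server is expected to answer with the mutation result document described earlier.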
INSERT RDF (POST with URLs)
POST Request-URI ?uri=URI
Where URI identifies a resource whose RDF content will be inserted into the database. The uri query parameter may occur multiple times. All identified resources will be loaded in a single operation. See [3] for the MIME types understood by this operation.
You can also specify a context-uri request parameter which sets the default context when triples data are loaded into a quads store (available in releases after 1.0.2).
DELETE
DELETE with Query
DELETE Request-URI ?query=...
Where query is a CONSTRUCT or DESCRIBE query.
Note: To avoid materializing the statements, this runs the query against the last commit time. This is done while it is holding the unisolated connection which prevents concurrent modifications. Therefore the entire QUERY + DELETE operation is ACID.
DELETE with Body (using POST)
POST Request-URI ?delete ... Content-Type ... BODY
This is a POST because many APIs do not allow a BODY with a DELETE verb. The BODY contains RDF statements according to the specified Content-Type. Statements parsed from the BODY are deleted.
DELETE with Access Path
Note: This method is available in releases after version 1.0.2.
DELETE Request-URI ?([s|p|o|c]=(uri|literal))+
All statements matching the bound values of the subject (s), predicate (p), object (o), and/or context (c) position will be deleted from the database. Each position may be specified at most once, but more than one position may be specified.
For example, a DELETE of everything for a given context would be:
DELETE Request-URI ?c=<http://example.org/foo>
And a DELETE of everything for some subject and predicate would be:
DELETE Request-URI ?s=<http://example.org/s1>&p=<http://www.example.org/p1>
And to DELETE everything having some object value:
DELETE Request-URI ?o="abc"
or
DELETE Request-URI ?o="5"^^<datatypeUri>
And to delete everything at that end point:
DELETE Request-URI
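The examples above show the parameter values unencoded. A sketch of building such a request in Java 11 or later follows, with each value URL encoded. The class name is hypothetical. It deliberately refuses an empty set of positions, since a DELETE with no parameters removes everything at the end point.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the bigdata client API.
final class DeleteByAccessPath {

    /**
     * Builds a DELETE for the bound positions. Keys must be "s", "p", "o" or "c";
     * using a Map ensures each position is given at most once.
     */
    static HttpRequest build(String endpoint, Map<String, String> positions) {
        if (positions.isEmpty()) {
            throw new IllegalArgumentException("refusing to build a DELETE of everything");
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : positions.entrySet()) {
            String key = e.getKey();
            if (key.length() != 1 || "spoc".indexOf(key.charAt(0)) < 0) {
                throw new IllegalArgumentException("unknown position: " + key);
            }
            // Values such as <http://example.org/foo> or "abc" must be encoded.
            query.add(key + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return HttpRequest.newBuilder(URI.create(endpoint + "?" + query)).DELETE().build();
    }
}
```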
UPDATE (SPARQL 1.1 UPDATE)
POST Request-URI ?update=...
| parameter | definition |
|---|---|
| using-graph-uri | Specify zero or more graphs whose RDF merge is the default graph for the update request (protocol option with the same semantics as USING). |
| using-named-graph-uri | Specify zero or more named graphs for the update request (protocol option with the same semantics as USING NAMED). |
See SPARQL 1.1 Protocol.
Note: This method is available in releases after version 1.1.0.
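A sketch of building a SPARQL 1.1 UPDATE request follows. It sends the update parameter as a form-encoded POST body, which is one of the forms defined by the SPARQL 1.1 Protocol; that choice, and the class name, are assumptions of this example rather than something stated above. It requires Java 11 or later.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the bigdata client API.
final class SparqlUpdateRequest {

    /** Encodes the update string as the "update" form parameter. */
    static String formBody(String update) {
        return "update=" + URLEncoder.encode(update, StandardCharsets.UTF_8);
    }

    /** Builds a POST carrying the update as a form-encoded body. */
    static HttpRequest build(String endpoint, String update) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(update)))
                .build();
    }
}
```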
UPDATE (DELETE + INSERT)
UPDATE (DELETE statements selected by a QUERY plus INSERT statements from Request Body using PUT)
PUT Request-URI ?query=... ... Content-Type ... BODY
Where query is a CONSTRUCT or DESCRIBE query.
Note: To avoid materializing the statements, this runs the query against the last commit time. This is done while it is holding the unisolated connection which prevents concurrent modifications. Therefore the entire QUERY + DELETE operation is ACID.
Note: You MAY specify a CONSTRUCT query with an empty WHERE clause in order to specify a set of statements to be removed without reference to statements already existing in the database. For example:
CONSTRUCT { bd:Bryan bd:likes bd:RDFS } { }
Note the trailing "{ }" which is the empty WHERE clause. This makes it possible to delete arbitrary statements followed by the insert of arbitrary statements.
| parameter | definition |
|---|---|
| context-uri | Request parameter which sets the default context when triples data are loaded into a quads store (available in releases after 1.0.2). |
UPDATE (POST with Multi-Part Request Body)
POST Request-URI ?updatePost
...
Content-Type: multipart/form-data; boundary=...
...
form-data; name="remove"; type="Content-Type"
Content-Body
...
form-data; name="add"; type="Content-Type"
Content-Body
...
BODY
You can specify two sets of serialized statements - one to be removed and one to be added. This operation will be ACID on the server.
| parameter | definition |
|---|---|
| context-uri | Request parameter which sets the default context when triples data are loaded into a quads store (available in releases after 1.0.2). |
STATUS
GET /status
Reports various information about the SPARQL end point. URL query parameters include:
| parameter | definition |
|---|---|
| showQueries(=details) | Show information on all queries currently executing on the NanoSparqlServer. The queries will be arranged in descending order by their elapsed evaluation time. When the value of this query parameter is "details", the response will include the query evaluation metrics for each bop (bigdata operator) in the query. Otherwise only the query evaluation metrics for the top-level query bop in the query plan will be included. In either case, the reported metrics are updated each time the page is refreshed so it is possible to track the progress of a long running query in this manner. |
| queryId=UUID | Request information only for the specified query(s). This parameter may appear zero or more times. (Since bigdata 1.1). |
CANCEL
POST /?cancel&queryId=....
Cancel one or more running queries. Queries which are still running when the request is processed will be cancelled. (Since bigdata 1.1. Prior to bigdata 1.2, this method was available at /status. The preferred URI for this method is now /, which is the URI of the SPARQL end point. The other URI is deprecated for this method.)
| parameter | definition |
|---|---|
| queryId=UUID | The UUID of a running query. |
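A sketch of building a cancel request for several queries follows. The class name is hypothetical, and the caller is assumed to pass the URL of the SPARQL end point of their own deployment. It requires Java 11 or later, and the request is only constructed, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.UUID;

// Hypothetical helper, not part of the bigdata client API.
final class CancelQueriesRequest {

    /** Builds a POST with the cancel flag and one queryId parameter per query. */
    static HttpRequest build(String endpoint, List<UUID> queryIds) {
        StringBuilder url = new StringBuilder(endpoint).append("?cancel");
        for (UUID id : queryIds) {
            // A UUID's string form contains only hex digits and '-', so no encoding is needed.
            url.append("&queryId=").append(id);
        }
        return HttpRequest.newBuilder(URI.create(url.toString()))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

The UUIDs of running queries can be obtained from the status page using showQueries.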
Multi-Tenancy API
The Multi-Tenancy API allows you to administer and access multiple triple or quad store instances in a single backing Journal or Federation. Each triple or quad store instance has a unique namespace and corresponds to the concept of a VoID Dataset. A brief VoID description is used to describe the known data sets. A detailed VoID description is included in the Service Description of a data set. The default data set is associated with the namespace "kb" (unless you override that on the NanoSparqlServer command line). The SPARQL end point for a data set may be used to obtain a detailed Service Description of that data set (including VoID metadata and statistics), to issue SPARQL 1.1 Query and Update requests, etc. That end point is:
/namespace/NAMESPACE/sparql
where NAMESPACE is the namespace of the desired data set.
This feature is available in bigdata releases after 1.2.2.
DESCRIBE DATA SETS
GET /namespace
Obtain a brief VoID description of the known data sets. The description includes the namespace of the data set and its sparql end point. A more detailed service description is available from the sparql end point. The response to this request MAY be cached.
CREATE DATA SET
POST /namespace ... Content-Type ... BODY
Create a new data set. The data set is configured based on the inherited configuration properties as overridden by the properties specified in the request entity (aka the BODY). The Content-Type must be one of those recognized for Java properties (the supported MIME Types are specified at NanoSparqlServer#Property_set_data).
You MUST specify at least the following property in order to create a non-default data set:
com.bigdata.rdf.sail.namespace=NAMESPACE
where NAMESPACE is the name of the new data set.
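The request body can be produced with java.util.Properties, as in this minimal sketch. The class name is hypothetical; only the com.bigdata.rdf.sail.namespace property shown above is set, and any further properties would be added in the same way. The resulting text would be sent with the text/plain content type listed under Property set data.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the bigdata client API.
final class CreateDataSetBody {

    /** Serializes the properties for a new data set in Java properties file format. */
    static String build(String namespace) {
        Properties properties = new Properties();
        properties.setProperty("com.bigdata.rdf.sail.namespace", namespace);
        StringWriter out = new StringWriter();
        try {
            // store() also writes a leading comment line containing the current date.
            properties.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```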
See the javadoc for the BigdataSail and AbstractTripleStore for other configuration options. Also see the sample property files in bigdata-sails/src/samples.
Note: You can not reconfigure the Journal or Federation using this method. The properties will only be applied to the newly created data set. This method does NOT create a new backing Journal, it just creates a new data set on the same Journal (or on the same Federation when running on a cluster).
LIST PROPERTIES
GET /namespace/NAMESPACE/properties
Obtain a list of the effective configuration properties for the data set named NAMESPACE.
DESTROY DATA SET
DELETE /namespace/NAMESPACE
Destroy the data set identified by NAMESPACE.
Java Client API
We have added a Java API for clients to the NanoSparqlServer. The main REST API is contained in the class:
com.bigdata.rdf.sail.webapp.client.RemoteRepository
And the test case "TestNanoSparqlClient" demonstrates how to use the API.
The Multi-Tenancy API is contained in the class:
com.bigdata.rdf.sail.webapp.client.RemoteRepositoryManager
Query Optimization
There are several ways to get information about running query evaluation plans.
- The #STATUS page has a showQueries(=details) option which provides in-depth information about the SPARQL query, Abstract Syntax Tree, bigdata operators (bops) and running statistics on current queries.
- The #QUERY ?explain parameter may be used with a query to report essentially the same information as the #STATUS page in an HTML response.
Performance Optimization resources
- There is also a good write-up on query performance optimization on the blog [4].
- There is a section on performance optimization for bigdata on the wiki PerformanceOptimization.
- Bigdata supports a variety of query hints through both the SAIL and the NanoSparqlServer interfaces. See [5] for more details.
- Bigdata supports query hints using magic triples (since 1.1.0). See QueryHints.
