SPARQL Query Benchmarker Wiki

A simple tool for testing SPARQL system performance and reliability

Brought to you by: gailalverson, rvesse

In-Memory Testing

As of 2.x the API supports in-memory testing as well as testing of remote SPARQL services. In-memory testing is useful when you want to eliminate the HTTP communication overhead from your testing.

To use in-memory testing via the API, you first need to specify a dataset via the Options class. For example, we can configure a TDB dataset on an existing Options instance (options below):

Dataset ds = TDBFactory.createDataset("/path/to/db");
options.setDataset(ds);

We then need to ensure our operation mix uses the in-memory variant of each relevant operation. For example, we should use InMemoryFixedQueryOperation rather than FixedQueryOperation. If you are loading an operation mix from a file, use the corresponding in-memory operation names, e.g. mem-query instead of query.

Sometimes it is useful to use the same operation mix file for both remote and in-memory testing. In this case we can call the InMemoryOperations.useInMemoryOperations() method before we load our operation mix file. This reconfigures the OperationLoaderRegistry so that the standard remote query and update operations are remapped to their in-memory equivalents.
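The idea behind this kind of remapping can be sketched in plain Java. This is a minimal illustration only: the LoaderRegistry and Operation types, the useInMemory method and the q1.rq file name are hypothetical stand-ins invented for this sketch, not the benchmarker's actual OperationLoaderRegistry API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class RemapSketch {

    /** Hypothetical stand-in for an operation loaded from a mix file. */
    interface Operation {
        String describe();
    }

    /** Hypothetical stand-in for a registry mapping operation names to loaders. */
    static final class LoaderRegistry {
        private final Map<String, Function<String, Operation>> loaders = new HashMap<>();

        void register(String name, Function<String, Operation> loader) {
            loaders.put(name, loader);
        }

        Operation load(String name, String argument) {
            Function<String, Operation> loader = loaders.get(name);
            if (loader == null) {
                throw new IllegalArgumentException("Unknown operation name: " + name);
            }
            return loader.apply(argument);
        }
    }

    /** Builds a registry in which "query" initially loads a remote operation. */
    static LoaderRegistry defaultRegistry() {
        LoaderRegistry registry = new LoaderRegistry();
        registry.register("query", file -> () -> "remote query from " + file);
        return registry;
    }

    /** Re-registers "query" so that it now loads the in-memory variant. */
    static void useInMemory(LoaderRegistry registry) {
        registry.register("query", file -> () -> "in-memory query from " + file);
    }

    public static void main(String[] args) {
        LoaderRegistry registry = defaultRegistry();
        // Before remapping: prints "remote query from q1.rq"
        System.out.println(registry.load("query", "q1.rq").describe());

        // Remap first, then load the same mix entry again
        useInMemory(registry);
        // After remapping: prints "in-memory query from q1.rq"
        System.out.println(registry.load("query", "q1.rq").describe());
    }
}
```

The ordering matters in the sketch for the same reason it matters in the real tool: the name is resolved at load time, so the remapping has to happen before the operation mix file is loaded.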

Once your Options are set up appropriately, you can run any of the normal test runners and you'll be testing in-memory rather than remote performance.

Testing at the Command Line

We also support in-memory testing when running from the command line. In-memory datasets are specified via a Jena Assembler file provided using the --dataset option.

This assembler file must contain the definition of a dataset. Your assembler file may construct a complex dataset that wraps or combines other datasets, provided there is only one top-level dataset. If there are multiple top-level datasets, which one is used is undefined, so you may not be testing the dataset you intended. You may also need to add dependencies to the CLASSPATH environment variable manually, because the JVM needs any classes that your assembler file requires but that the CLI itself does not use.

Disadvantages

The downside of in-memory testing is that performance figures gathered this way may not be representative of real-world usage of a system. In any serious enterprise deployment of an RDF store, the store is unlikely to run on the same hardware as the application or to accept a direct in-memory connection from it.

Native APIs may still allow faster communication than HTTP, but we've typically found that once messages become large enough the HTTP overhead becomes insignificant.
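One simple way to see why is to model each request as a fixed per-request overhead plus a transfer time proportional to message size. The sketch below is a deliberately simplified model, not a measurement: the 2 ms overhead and the throughput figure are made-up illustrative assumptions, and real HTTP costs are not purely fixed.

```java
public class OverheadSketch {

    /**
     * Fraction of total request time spent on the fixed per-request overhead,
     * under the assumption that transfer time grows linearly with payload size.
     */
    static double overheadFraction(double fixedOverheadMs, long payloadBytes, double bytesPerMs) {
        double transferMs = payloadBytes / bytesPerMs;
        return fixedOverheadMs / (fixedOverheadMs + transferMs);
    }

    public static void main(String[] args) {
        double fixedOverheadMs = 2.0;   // assumed value, for illustration only
        double bytesPerMs = 100_000.0;  // assumed throughput, for illustration only

        // The overhead share shrinks as the message size grows
        for (long bytes : new long[] {1_000L, 200_000L, 10_000_000L}) {
            double share = overheadFraction(fixedOverheadMs, bytes, bytesPerMs);
            System.out.printf("%d bytes: %.1f%% of request time is overhead%n", bytes, 100 * share);
        }
    }
}
```

Under these assumptions the overhead share is exactly half at 200,000 bytes and keeps falling as messages get larger, which matches the qualitative observation above.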