Re: [Rdfapi-php-interest] Scalability and performance

Hi Markus,

>
> We consider using RAP as a quadstore for Semantic MediaWiki (see
> http://wiki.ontoworld.org).

Interesting.

> In the long run, we are interested in
> inferencing, but for now Wikipedia-size scalability is most important.

Sorry, to my knowledge there are no systematic comparisons of the
performance of RAP with other RDF toolkits.

We did some relatively unsystematic performance testing when we
implemented different features, but the results are outdated by now.

Sören Auer and Bart Pieterse (both cc'ed) have used RAP in bigger
projects, and I guess they are the best sources for practical experience
with the performance of RAP on bigger real-world datasets.

My general impression is that, as PHP itself is still slower than languages
like Java or C, RAP is also slow, and its performance cannot be compared
with that of toolkits like Jena or Sesame. Sören might disagree with me on
this point.

> Are there recent evaluations concerning the performance of the different
> storage models? In particular, we are interested in scalability of the
> following functions:
>
> 1 SPARQL queries:
>  1.1 general performance

Around one second for a medium-complexity query against a data set with
100 000 triples in memory; much slower if the data set is in a database.
Tobias Gauss can give you details.

A PHP alternative for SPARQL queries against data sets stored in a
database is Benjamin's appmosphere toolkit:
http://www.appmosphere.com/pages/en-arc. He does smarter SPARQL-to-SQL
rewriting than RAP, so it should theoretically be faster.

>  1.2 performance of "join-intensive" queries (involving long chains of
>      triples)
>  1.3 performance of datatype queries (e.g. selecting/sorting results by
>      some xsd:int or xsd:decimal)
>  1.4 performance for partial result lists (e.g. getting only the first 20)
> 2 simple read access (e.g. getting all triples of a certain pattern or RDF
>   dataset)

OK with models up to 100 000 triples. Don't know about bigger models.
Sören?

> 3 write access
>  3.1 adding triples to an existing store
>  3.2 deleting selected triples from the store

Should be OK. I think Sören implemented some workarounds for bulk updates.

> 4 impact of RDF dataset features/named graph functionality

About 5% slower than operations on classic RDF models.

> For inclusion in Wikipedia, dealing with about 10 million triples split
> into 1 million RDF datasets is probably necessary.

Too much for RAP, too much for appmosphere (Benjamin?), and I guess hard
even for Jena, Redland and co. if the queries become more complicated.

> We are working on useful update and caching strategies to reduce access
> to the RDF store, but a rather high number of parallel requests still is
> to be expected (though normal reading of articles will not touch the
> store). It would also be possible to restrict to certain types of queries
> if this leads to improved performance.
>
> We currently use RAP as an RDF parser for importing ontologies into
> Semantic MediaWiki. For querying our RDF data, we consider reusing
> existing triplestores such as Redland or RAP, but also using SQL queries
> directly. Java toolkits are not an option since Wikipedia requires the
> use of free software (and free Java implementations probably don't
> support current RDF stores).

If by "current RDF stores" you mean Named Graph stores, then you could use
a combination of Jena and NG4J. Jena is BSD-licensed and supports SPARQL.
NG4J adds an API for manipulating Named Graph sets. See:
http://www.wiwiss.fu-berlin.de/suhl/bizer/ng4j/

>
> I can imagine that one can already find performance measures for RAP
> somewhere on the web -- sorry if I missed this.

Not that I know of. But all efforts in that direction are very welcome.

Cheers

Chris

> Best regards,
>
> Markus
>
> --
> Markus Krötzsch
> Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
> ma...@ai...        phone +49 (0)721 608 7362
> www.aifb.uni-karlsruhe.de/WBS/     fax +49 (0)721 693  717

--
Chris Bizer
Freie Universität Berlin
Phone: +49 30 838 54057
Mail: ch...@bi...
Web: www.bizer.de