RE: [Rdfapi-php-interest] Scalability and performance


-------- Original Message --------
> From: Chris Bizer <>
> Date: 12 April 2006 16:50
>
> Hi Markus,
>
> >
> > We consider using RAP as a quadstore for Semantic MediaWiki (see
> > http://wiki.ontoworld.org).
>
> Interesting.
>
> > In the long run, we are interested in
> > inferencing, but for now Wikipedia-size scalability is most important.
>
> Hmm, sorry, to my knowledge there are no systematic comparisons of
> the performance of RAP with other RDF toolkits.
>
> We did some relatively unsystematic performance testing when we
> implemented different features, but the results are outdated by now.
>
> Sören Auer and Bart Pieterse (both cc'ed) have used RAP in bigger
> projects and I guess they are the best sources for practical
> experiences with the performance of RAP with bigger real world
> datasets.
>
> My general impression is that, as PHP itself is still slower than
> languages like Java or C, RAP is also slow, and its performance cannot
> be compared with toolkits like Jena or Sesame. Sören might disagree with
> me on this point.
>
> > Are
> > there recent evaluations concerning the performance of the different
> > storage models? In particular, we are interested in scalability of
> > the following functions:
> >
> > 1 SPARQL queries:
> >  1.1 general performance
>
> Around one second for a medium-complex query against a data set with
> 100 000 triples in memory; much slower if the data set is in a
> database. Tobias Gauss can give you details.
>
> A PHP alternative for SPARQL queries against data sets that are
> stored in a database is Benjamin's appmosphere toolkit
> http://www.appmosphere.com/pages/en-arc. He does smarter SPARQL-to-SQL
> rewriting than RAP, so it should theoretically be faster.
>
> >  1.2 performance of "join-intensive" queries (involving long chains
> >      of triples)
> >  1.3 performance of datatype queries (e.g. selecting/sorting results
> >      by some xsd:int or xsd:decimal)
> >  1.4 performance for partial result lists (e.g. getting only the
> >      first 20)
> > 2 simple read access (e.g. getting all triples of a certain
> >   pattern or RDF dataset)
>
> OK with models up to 100 000 triples. Don't know about bigger models.
> Sören?
>
> > 3 write access
> >  3.1 adding triples to an existing store
> >  3.2 deleting selected triples from the store
>
> Should be OK. I think Sören implemented some workarounds for bulk
> updates.
>
> > 4 impact of RDF dataset features/named graph functionality
>
> About 5% slower than operations on classic RDF models.
>
> > For inclusion in Wikipedia, dealing with about 10 Mio triples split
> > into 1 Mio RDF datasets is probably necessary.
>
> Too much for RAP, too much for appmosphere (Benjamin?), and I guess even
> hard for Jena, Redland and co. if the queries become more complicated.

That would be surprising. Although it depends greatly on the query, graphs
of up to 100e6 triples should be no problem. They take a while to load,
though :-)

http://esw.w3.org/topic/LargeTripleStores
http://esw.w3.org/topic/LargeQuadStores

We're currently using a 10e6-triple graph for regular testing with a new
database store for Jena, and we use the same data with the existing Jena
database solution. (We chose 10e6 because it is big enough to show the
effects of scale but small enough to stay manageable, since we keep
reloading it while experimenting with the schema.) Real testing is on
100e6 triples.

Support for interactive use is trickier, especially if the queries are
arbitrary, as it's possible to write queries that are always going to have
large intermediate results. If the app does not allow the user to
(indirectly) write an arbitrary query, then a little care with queries
should make interactive use possible on data of up to 100e6 triples and
beyond. Steve Harris (3Store) has a lot of experience with this.

At 1e6 triples, it is more a matter of running in memory (if you can
afford the system resources).

Is there a SPARQL protocol driver for RAP? If so, the database can be
any RDF system you want, and the RAP application can issue requests over
the SPARQL protocol.
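To illustrate the idea: under the SPARQL Protocol, a client sends the query text URL-encoded in a `query` parameter of an HTTP GET to the store's endpoint, so any store that speaks the protocol can sit behind the application. A minimal sketch in Python rather than PHP, assuming a hypothetical endpoint URL with no existing query string; it only builds the request and does not send it. The `Accept` value is the media type of the W3C SPARQL XML results format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "http://store.example.org/sparql"  # hypothetical endpoint URL


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol query request via HTTP GET.

    Assumes the endpoint URL has no query string of its own. The query text
    travels URL-encoded in the 'query' parameter; the Accept header asks for
    the SPARQL XML results format."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+xml"}, method="GET")


if __name__ == "__main__":
    req = sparql_get_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 20")
    print(req.full_url)
    # The store decodes the parameter back to the original query text.
    print(parse_qs(urlsplit(req.full_url).query)["query"][0])
    # Sending it would be: urllib.request.urlopen(req).read()
```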

>
> > We are working on useful update and
> > caching strategies to reduce access to the RDF store, but a rather
> > high number of parallel requests is still to be expected (though
> > normal reading of articles will not touch the store). It would also be
> > possible to restrict ourselves to certain types of queries if this leads
> > to improved performance.
> >
> > We currently use RAP as an RDF parser for importing ontologies into
> > Semantic MediaWiki. For querying our RDF data, we are considering
> > reusing an existing triplestore such as Redland or RAP, but also using
> > SQL queries directly.
> > Java toolkits are not an option since Wikipedia requires the use of
> > free software (and free Java implementations probably don't support
> > current RDF stores).

Jena runs with IKVM and GNU Classpath. It also runs on .NET and Mono via
IKVM.

My experience with the current IKVM has been very good and suggests that
most RDF/Java toolkits will run quite adequately these days. That wasn't
true a while ago, but things have moved on rapidly recently.

	Hope that helps,
	Andy

>
> If current RDF stores means Named Graph stores, then you could use a
> combination of Jena and NG4J. Jena is BSD-licensed and supports SPARQL.
> NG4J adds an API for manipulating Named Graph sets. See:
> http://www.wiwiss.fu-berlin.de/suhl/bizer/ng4j/
>
> >
> > I can imagine that one can already find performance measures for RAP
> > somewhere on the web -- sorry if I missed this.
>=20
> Not that I know of. But all efforts in that direction are very
> welcome.
>
> Cheers
>
> Chris
>
>
> > Best regards,
> >
> > Markus
> >
> > --
> > Markus Krötzsch
> > Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
> > ma...@ai...        phone +49 (0)721 608 7362
> > www.aifb.uni-karlsruhe.de/WBS/     fax +49 (0)721 693  717
>
>
> --
> Chris Bizer
> Freie Universität Berlin
> Phone: +49 30 838 54057
> Mail: ch...@bi...
> Web: www.bizer.de