Re: [Rdfapi-php-interest] Scalability and performance


Hi Markus,

Great work you guys are doing with Semantic MediaWiki.

Just my two cents: RAP's SPARQL engine is not optimized for accessing
database models. It does much of the heavy lifting in PHP code and I
guess it will be rather slow in such a setup. (Disclaimer: I've never
actually used it with a DBModel. Tobias, please correct me if I'm
getting something wrong.)
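
To illustrate the general trade-off (this is not RAP's actual implementation or schema), here is a minimal sketch of a two-step "chain" pattern evaluated once by joining in application code and once as a single SQL self-join with the limit pushed into the database. The `triples` table layout, the sample data, and the function names are all hypothetical, and Python with SQLite stands in for PHP with a real database.

```python
import sqlite3

# Hypothetical single-table layout for illustration; not RAP's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Berlin", "locatedIn", "Germany"),
        ("Germany", "partOf", "Europe"),
        ("Paris", "locatedIn", "France"),
        ("France", "partOf", "Europe"),
        ("Tokyo", "locatedIn", "Japan"),
        ("Japan", "partOf", "Asia"),
    ],
)


def chain_in_app(conn, p1, p2, limit):
    """Evaluate ?x p1 ?y . ?y p2 ?z by joining in application code.

    Issues one extra query per intermediate binding and only truncates
    after the full result has been built.
    """
    results = []
    first = conn.execute(
        "SELECT s, o FROM triples WHERE p = ?", (p1,)
    ).fetchall()
    for x, y in first:
        second = conn.execute(
            "SELECT o FROM triples WHERE s = ? AND p = ?", (y, p2)
        )
        for (z,) in second:
            results.append((x, z))
    results.sort()
    return results[:limit]


def chain_in_sql(conn, p1, p2, limit):
    """Evaluate the same pattern as one self-join; the database applies
    the join, the ordering, and the LIMIT."""
    return conn.execute(
        "SELECT a.s, b.o FROM triples a JOIN triples b ON a.o = b.s "
        "WHERE a.p = ? AND b.p = ? ORDER BY a.s, b.o LIMIT ?",
        (p1, p2, limit),
    ).fetchall()


if __name__ == "__main__":
    print(chain_in_app(conn, "locatedIn", "partOf", 2))
    print(chain_in_sql(conn, "locatedIn", "partOf", 2))
```

Both functions return the same rows; the difference is where the join work happens and how many round trips to the database are needed, which is what matters for long chains and "first 20 results" style queries.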

If you need a high-performance triple store for a PHP app, I think
you should evaluate Benjamin Nowack's ARC (there was some talk about
integrating this into RAP -- is this still being considered?).
There's not much else in native PHP. For really good performance you
want an external triple store. If Java is forbidden, that leaves
pretty much only 3Store which reputedly is very fast, does cool stuff
with SPARQL, and AFAIK has some kind of PHP interface.

(This is all just my personal opinion and not backed by actual
experience, and I'm not a core RAP developer.)

Best,
Richard

On 11 Apr 2006, at 16:40, Markus Krötzsch wrote:

> Hi.
>
> We consider using RAP as a quadstore for Semantic MediaWiki (see
> http://wiki.ontoworld.org). In the long run, we are interested in
> inferencing, but for now Wikipedia-size scalability is most important.
> Are there recent evaluations concerning the performance of the
> different storage models? In particular, we are interested in
> scalability of the following functions:
>
> 1 SPARQL queries:
>  1.1 general performance
>  1.2 performance of "join-intensive" queries (involving long chains of
>      triples)
>  1.3 performance of datatype queries (e.g. selecting/sorting results by
>      some xsd:int or xsd:decimal)
>  1.4 performance for partial result lists (e.g. getting only the first 20)
> 2 simple read access (e.g. getting all triples of a certain pattern or RDF
>   dataset)
> 3 write access
>  3.1 adding triples to an existing store
>  3.2 deleting selected triples from the store
> 4 impact of RDF dataset features/named graph functionality
>
> For inclusion in Wikipedia, dealing with about 10 million triples split
> into 1 million RDF datasets is probably necessary. We are working on
> useful update and caching strategies to reduce access to the RDF store,
> but a rather high number of parallel requests is still to be expected
> (though normal reading of articles will not touch the store). It would
> also be possible to restrict to certain types of queries if this leads
> to improved performance.
>
> We currently use RAP as an RDF parser for importing ontologies into
> Semantic MediaWiki. For querying our RDF data, we consider reusing an
> existing triple store such as Redland or RAP, but also using SQL queries
> directly. Java toolkits are not an option since Wikipedia requires the
> use of free software (and free Java implementations probably don't
> support current RDF stores).
>
> I can imagine that one can already find performance measures for RAP
> somewhere on the web -- sorry if I missed this.
>
> Best regards,
>
> Markus
>
> -- 
> Markus Krötzsch
> Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
> ma...@ai...        phone +49 (0)721 608 7362
> www.aifb.uni-karlsruhe.de/WBS/     fax +49 (0)721 693  717