From: Richard C. <ri...@cy...> - 2006-04-12 15:55:26
Hi Markus,

Great work you guys are doing with Semantic MediaWiki. Just my two cents:

RAP's SPARQL engine is not optimized for accessing database models. It
does a lot of heavy lifting in PHP code and I guess it will be rather slow
in such a setup. (Disclaimer: I've never actually used it with a DBModel.
Tobias, please correct me if I'm getting something wrong.)

If you need a high-performance triple store for a PHP app, I think you
should evaluate Benjamin Nowack's ARC (there was some talk about
integrating this into RAP -- is this still being considered?). There's
not much else in native PHP.

For really good performance you want an external triple store. If Java is
forbidden, that leaves pretty much only 3Store, which reputedly is very
fast, does cool stuff with SPARQL, and AFAIK has some kind of PHP
interface.

(This is all just my personal opinion and not backed by actual experience,
and I'm not a core RAP developer.)

Best,
Richard

On 11 Apr 2006, at 16:40, Markus Krötzsch wrote:

> Hi.
>
> We consider using RAP as a quadstore for Semantic MediaWiki (see
> http://wiki.ontoworld.org). In the long run, we are interested in
> inferencing, but for now Wikipedia-size scalability is most important. Are
> there recent evaluations concerning the performance of the different storage
> models? In particular, we are interested in scalability of the following
> functions:
>
> 1 SPARQL queries:
> 1.1 general performance
> 1.2 performance of "join-intensive" queries (involving long chains of
>     triples)
> 1.3 performance of datatype queries (e.g. selecting/sorting results by some
>     xsd:int or xsd:decimal)
> 1.4 performance for partial result lists (e.g. getting only the first 20)
> 2 simple read access (e.g. getting all triples of a certain pattern or RDF
>   dataset)
> 3 write access
> 3.1 adding triples to an existing store
> 3.2 deleting selected triples from the store
> 4 impact of RDF dataset features/named graph functionality
>
> For inclusion in Wikipedia, dealing with about 10 Mio triples split into 1 Mio
> RDF datasets is probably necessary. We are working on useful update and
> caching strategies to reduce access to the RDF store, but a rather high
> number of parallel requests still is to be expected (though normal reading of
> articles will not touch the store). It would also be possible to restrict to
> certain types of queries if this leads to improved performance.
>
> We currently use RAP as an RDF parser for importing ontologies into Semantic
> MediaWiki. For querying our RDF data, we consider reusing an existing
> triple store such as Redland or RAP, but also using SQL queries directly.
> Java toolkits are not an option since Wikipedia requires the use of free
> software (and free Java implementations probably don't support current RDF
> stores).
>
> I can imagine that one can already find performance measures for RAP somewhere
> on the web -- sorry if I missed this.
>
> Best regards,
>
> Markus
>
> --
> Markus Krötzsch
> Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
> ma...@ai... phone +49 (0)721 608 7362
> www.aifb.uni-karlsruhe.de/WBS/ fax +49 (0)721 693 717