From: Markus <ma...@ai...> - 2006-04-11 14:41:05
Hi.

We are considering using RAP as a quadstore for Semantic MediaWiki (see http://wiki.ontoworld.org). In the long run, we are interested in inferencing, but for now Wikipedia-size scalability is most important. Are there recent evaluations concerning the performance of the different storage models? In particular, we are interested in the scalability of the following functions:

1 SPARQL queries:
  1.1 general performance
  1.2 performance of "join-intensive" queries (involving long chains of triples)
  1.3 performance of datatype queries (e.g. selecting/sorting results by some xsd:int or xsd:decimal)
  1.4 performance for partial result lists (e.g. getting only the first 20)
2 simple read access (e.g. getting all triples of a certain pattern or RDF dataset)
3 write access
  3.1 adding triples to an existing store
  3.2 deleting selected triples from the store
4 impact of RDF dataset features/named graph functionality

For inclusion in Wikipedia, dealing with about 10 million triples split into 1 million RDF datasets is probably necessary. We are working on useful update and caching strategies to reduce access to the RDF store, but a rather high number of parallel requests is still to be expected (though normal reading of articles will not touch the store). It would also be possible to restrict ourselves to certain types of queries if this leads to improved performance.

We currently use RAP as an RDF parser for importing ontologies into Semantic MediaWiki. For querying our RDF data, we are considering reusing an existing triplestore such as Redland or RAP, but also using SQL queries directly. Java toolkits are not an option since Wikipedia requires the use of free software (and free Java implementations probably don't support current RDF stores).

I can imagine that one can already find performance measures for RAP somewhere on the web -- sorry if I missed this.

Best regards,

Markus

--
Markus Krötzsch
Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
ma...@ai...    phone +49 (0)721 608 7362
www.aifb.uni-karlsruhe.de/WBS/    fax +49 (0)721 693 717
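
To make the operations in items 2 to 4 of the list above concrete, here is a minimal in-memory sketch of a quadstore in Python. It is only an illustration: the class `QuadStore` and its method names are hypothetical and are not the API of RAP, Redland or any other toolkit, and using one named graph per wiki page is an assumption, not something stated in the message.

```python
from typing import Iterator, Optional, Set, Tuple

# A quad is a triple plus the name of the graph (RDF dataset) it belongs to.
Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object)


class QuadStore:
    """Toy in-memory quad store (hypothetical; not RAP's or Redland's API)."""

    def __init__(self) -> None:
        self._quads: Set[Quad] = set()

    def add(self, graph: str, s: str, p: str, o: str) -> None:
        # Item 3.1: add a triple to a named graph.
        self._quads.add((graph, s, p, o))

    def find(self, graph: Optional[str] = None, s: Optional[str] = None,
             p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Quad]:
        # Item 2: pattern access; None acts as a wildcard in any position.
        pattern = (graph, s, p, o)
        for quad in self._quads:
            if all(want is None or want == have
                   for want, have in zip(pattern, quad)):
                yield quad

    def delete(self, graph: Optional[str] = None, s: Optional[str] = None,
               p: Optional[str] = None, o: Optional[str] = None) -> int:
        # Item 3.2: remove every quad matching the pattern, return the count.
        doomed = list(self.find(graph, s, p, o))
        self._quads.difference_update(doomed)
        return len(doomed)


store = QuadStore()
store.add("page:Berlin", "Berlin", "population", "3400000")
store.add("page:Berlin", "Berlin", "capitalOf", "Germany")
store.add("page:Paris", "Paris", "capitalOf", "France")

capitals = list(store.find(p="capitalOf"))        # pattern across all graphs
removed = store.delete(graph="page:Berlin")       # drop one page's graph
```

The `find` method scans every quad, so this sketch says nothing about scalability; a real store would keep indexes per position, which is exactly where the storage models asked about would differ.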
From: Richard C. <ri...@cy...> - 2006-04-12 15:55:26
Hi Markus,

Great work you guys are doing with Semantic MediaWiki.

Just my two cents: RAP's SPARQL engine is not optimized for accessing database models. It does much of the heavy lifting in PHP code, and I guess it will be rather slow in such a setup. (Disclaimer: I've never actually used it with a DBModel. Tobias, please correct me if I'm getting something wrong.)

If you need a high-performance triple store for a PHP app, I think you should evaluate Benjamin Nowack's ARC (there was some talk about integrating this into RAP -- is this still being considered?). There's not much else in native PHP. For really good performance you want an external triple store. If Java is forbidden, that leaves pretty much only 3Store, which reputedly is very fast, does cool stuff with SPARQL, and AFAIK has some kind of PHP interface.

(This is all just my personal opinion, not backed by actual experience, and I'm not a core RAP developer.)

Best,
Richard
From: <au...@in...> - 2006-04-12 16:25:19
We have used Powl, with RAP as a basis, on quite large knowledge bases (0.5M triples). My experience is that performance is determined largely by the underlying database if you try to encode as much as possible in SQL queries. In Powl we have gone this way and enhanced RAP with quite a lot of API functions triggering quite complex SQL queries.

For sure it would be better to use SPARQL instead, but at the time we started to work on Powl, SPARQL was not yet available, and I still have the impression that lots of crucial features are missing (e.g. aggregations). From my point of view it would be good if someone could integrate SPARQL-to-SQL query rewriting into RAP (e.g. port Benjamin's work).

We are actually working on an intelligent caching strategy allowing for selective cache object invalidation on updates, and on its implementation for RAP.

Cheers,

Sören
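
As an illustration of "encoding as much as possible in SQL", the following Python sketch turns a chain-shaped graph pattern (?x0 -p0-> ?x1 -p1-> ?x2 ...) into a single SQL statement with self-joins, so the database does the joining instead of application code. This is not Powl's, RAP's or ARC's actual rewriting; the single `statements` table, its column names and the function `chain_query_sql` are assumptions made for the example, and SQLite stands in for whatever database is really used.

```python
import sqlite3


def chain_query_sql(predicates, limit=None):
    """Build one SQL query for a chain of triples over a `statements` table.

    Hypothetical helper: each predicate in the chain becomes one alias of the
    table, joined on "object of the previous step = subject of this step".
    """
    if not predicates:
        raise ValueError("need at least one predicate")
    n = len(predicates)
    select = ["t0.subject"] + [f"t{i}.object" for i in range(n)]
    joins = ["statements AS t0"]
    for i in range(1, n):
        joins.append(
            f"JOIN statements AS t{i} ON t{i}.subject = t{i - 1}.object")
    where = " AND ".join(f"t{i}.predicate = ?" for i in range(n))
    sql = f"SELECT {', '.join(select)} FROM {' '.join(joins)} WHERE {where}"
    params = list(predicates)
    if limit is not None:
        # Partial result lists map directly onto SQL LIMIT.
        sql += " LIMIT ?"
        params.append(limit)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", [
    ("Karlsruhe", "locatedIn", "Baden-Wuerttemberg"),
    ("Baden-Wuerttemberg", "locatedIn", "Germany"),
    ("Berlin", "locatedIn", "Germany"),
])

# Two-step chain: ?place locatedIn ?region . ?region locatedIn ?country
sql, params = chain_query_sql(["locatedIn", "locatedIn"], limit=20)
rows = conn.execute(sql, params).fetchall()
```

Values are passed as bound parameters rather than pasted into the SQL string; only the table aliases, which are generated from integers, are interpolated.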
From: Chris B. <ch...@bi...> - 2006-04-12 15:51:10
Hi Markus,

> We are considering using RAP as a quadstore for Semantic MediaWiki (see
> http://wiki.ontoworld.org).

Interesting.

> In the long run, we are interested in inferencing, but for now
> Wikipedia-size scalability is most important.

Hmm, sorry, to my knowledge there are no systematic comparisons of the performance of RAP with other RDF toolkits.

We did some relatively unsystematic performance testing when we implemented different features, but the results are outdated by now.

Sören Auer and Bart Pieterse (both cc'ed) have used RAP in bigger projects, and I guess they are the best sources for practical experience with the performance of RAP on bigger real-world datasets.

My general impression is that, as PHP itself is still slower than languages like Java or C, RAP is also slow and its performance cannot be compared with toolkits like Jena or Sesame. Sören might disagree with me on this point.

> Are there recent evaluations concerning the performance of the different
> storage models? In particular, we are interested in the scalability of the
> following functions:
>
> 1 SPARQL queries:
>   1.1 general performance

Around one second for a medium-complex query against a data set with 100 000 triples in memory, much slower if the data set is in a database. Tobias Gauss can give you details.

A PHP alternative for SPARQL queries against data sets stored in a database is Benjamin's appmosphere toolkit: http://www.appmosphere.com/pages/en-arc
He does smarter SPARQL-to-SQL rewriting than RAP and should theoretically be faster.

>   1.2 performance of "join-intensive" queries (involving long chains of
>       triples)
>   1.3 performance of datatype queries (e.g. selecting/sorting results by
>       some xsd:int or xsd:decimal)
>   1.4 performance for partial result lists (e.g. getting only the first 20)
> 2 simple read access (e.g. getting all triples of a certain pattern or RDF
>   dataset)

OK with models up to 100 000 triples. Don't know about bigger models. Sören?

> 3 write access
>   3.1 adding triples to an existing store
>   3.2 deleting selected triples from the store

Should be OK. I think Sören implemented some workarounds for bulk updates.

> 4 impact of RDF dataset features/named graph functionality

About 5% slower than operations on classic RDF models.

> For inclusion in Wikipedia, dealing with about 10 million triples split
> into 1 million RDF datasets is probably necessary.

Too much for RAP, too much for appmosphere (Benjamin?), and I guess even hard for Jena, Redland and Co. if the queries become more complicated.

> We are working on useful update and caching strategies to reduce access to
> the RDF store, but a rather high number of parallel requests is still to be
> expected (though normal reading of articles will not touch the store). It
> would also be possible to restrict ourselves to certain types of queries if
> this leads to improved performance.
>
> We currently use RAP as an RDF parser for importing ontologies into
> Semantic MediaWiki. For querying our RDF data, we are considering reusing
> an existing triplestore such as Redland or RAP, but also using SQL queries
> directly. Java toolkits are not an option since Wikipedia requires the use
> of free software (and free Java implementations probably don't support
> current RDF stores).

If "current RDF stores" means Named Graph stores, then you could use a combination of Jena and NG4J. Jena is BSD-licensed and supports SPARQL. NG4J adds an API for manipulating Named Graph sets. See: http://www.wiwiss.fu-berlin.de/suhl/bizer/ng4j/

> I can imagine that one can already find performance measures for RAP
> somewhere on the web -- sorry if I missed this.

Not that I know of. But all efforts in that direction are very welcome.

Cheers,

Chris

--
Chris Bizer
Freie Universität Berlin
Phone: +49 30 838 54057
Mail: ch...@bi...
Web: www.bizer.de
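
Since the thread reports that no systematic measurements exist, a small timing harness along the following lines could be a starting point. It is a sketch under loud assumptions: SQLite stands in for the real backend, the `quads` table and the function names are hypothetical, and the timings it produces say nothing about RAP itself. The default of 10 triples per graph mirrors the ratio in the question (about 10 million triples in 1 million RDF datasets). It times a bulk insert, a sorted datatype query restricted to the first 20 results, and the deletion of one named graph.

```python
import sqlite3
import time


def time_call(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def run_benchmark(n_triples=10_000, triples_per_graph=10):
    # Hypothetical schema: numeric literals are stored in a typed column so
    # that sorting by value can be done by the database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quads "
        "(graph TEXT, subject TEXT, predicate TEXT, object_num REAL)")
    conn.execute(
        "CREATE INDEX quads_pred_num ON quads (predicate, object_num)")

    rows = [(f"g{i // triples_per_graph}", f"s{i}", "population", float(i))
            for i in range(n_triples)]

    timings = {}
    # Write access: bulk insert.
    _, timings["bulk_insert"] = time_call(
        lambda: conn.executemany(
            "INSERT INTO quads VALUES (?, ?, ?, ?)", rows))
    # Datatype query with a partial result list: top 20 by numeric value.
    top, timings["sorted_first_20"] = time_call(
        lambda: conn.execute(
            "SELECT subject, object_num FROM quads WHERE predicate = ? "
            "ORDER BY object_num DESC LIMIT 20",
            ("population",)).fetchall())
    # Write access: delete all triples of one named graph.
    _, timings["delete_graph"] = time_call(
        lambda: conn.execute("DELETE FROM quads WHERE graph = ?", ("g0",)))

    remaining = conn.execute("SELECT COUNT(*) FROM quads").fetchone()[0]
    return timings, top, remaining


timings, top, remaining = run_benchmark(n_triples=1000)
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

There is deliberately no index on `graph`, so the delete step performs a full scan; adding one and re-running is the kind of comparison between storage layouts that the original question asks about.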