From: Seaborne, A. <and...@hp...> - 2006-04-13 13:04:31
-------- Original Message --------
> From: Chris Bizer <>
> Date: 12 April 2006 16:50
>
> Hi Markus,
>
> > We consider using RAP as a quadstore for Semantic MediaWiki (see
> > http://wiki.ontoworld.org).
>
> Interesting.
>
> > In the long run, we are interested in
> > inferencing, but for now Wikipedia-size scalability is most important.
>
> Hmm, sorry, to my knowledge there are no systematic comparisons of
> the performance of RAP with other RDF toolkits.
>
> We did some relatively unsystematic performance testing when we
> implemented different features, but the results are outdated by now.
>
> Sören Auer and Bart Pieterse (both cc'ed) have used RAP in bigger
> projects and I guess they are the best sources for practical
> experience with the performance of RAP on bigger real-world
> datasets.
>
> My general impression is that as PHP itself is still slower than
> languages like Java or C, RAP is also slow and its performance cannot
> be compared with toolkits like Jena or Sesame. Sören might disagree
> with me on this point.
>
> > Are
> > there recent evaluations concerning the performance of the different
> > storage models? In particular, we are interested in scalability of
> > the following functions:
> >
> > 1 SPARQL queries:
> > 1.1 general performance
>
> Around one second for a medium complex query against a data set with
> 100 000 triples in memory, much slower if the data set is in a
> database. Tobias Gauss can give you details.
>
> A PHP alternative for SPARQL queries against data sets which are
> stored in a database is Benjamin's appmosphere toolkit
> http://www.appmosphere.com/pages/en-arc. He does smarter SPARQL to SQL
> rewriting than RAP and should theoretically be faster.
>
> > 1.2 performance of "join-intensive" queries (involving long chains
> > of triples)
> > 1.3 performance of datatype queries (e.g.
> > selecting/sorting results by some xsd:int or xsd:decimal)
> > 1.4 performance for partial result lists (e.g. getting only the
> > first 20)
> > 2 simple read access (e.g. getting all triples of a certain
> > pattern or RDF dataset)
>
> OK with models up to 100 000 triples. Don't know about bigger models.
> Sören?
>
> > 3 write access
> > 3.1 adding triples to an existing store
> > 3.2 deleting selected triples from the store
>
> Should be OK. I think Sören implemented some workarounds for bulk
> updates.
>
> > 4 impact of RDF dataset features/named graph functionality
>
> About 5% slower than operations on classic RDF models.
>
> > For inclusion in Wikipedia, dealing with about 10 million triples split
> > into 1 million RDF datasets is probably necessary.
>
> Too much for RAP, too much for appmosphere (Benjamin?), and I guess even
> hard for Jena, Redland and Co if the queries become more complicated.

That would be surprising. Although it depends greatly on the query,
graphs up to 100e6 triples should be no problem. They take a while to load
though :-)

http://esw.w3.org/topic/LargeTripleStores
http://esw.w3.org/topic/LargeQuadStores

We're using a 10e6 triple graph for regular testing at the moment with a
new database store for Jena, and we use the same data with the existing
Jena database solution. (The choice of 10e6 is based on being big enough to
show effects of scale but small enough to be manageable, as we keep
reloading due to schema experimentation.) Real testing is on 100e6
triples.

Support for interactive use is trickier, especially if the queries are
arbitrary, as it's possible to write queries that are always going to have
large intermediate results. If the app does not allow the user to
(indirectly) write an arbitrary query, then a little care with queries
should make interactive use possible on data up to 100e6 triples and beyond.
Steve Harris (3Store) has a lot of experience with this.

At 1e6 triples, it is more a matter of running in memory (if you can
afford the system resources).

Is there a SPARQL protocol driver for RAP? If so, the database can be
any RDF system you want, and the RAP application can issue requests over
the SPARQL protocol.

> > We are working on useful update and
> > caching strategies to reduce access to the RDF store, but a rather
> > high number of parallel requests still is to be expected (though
> > normal reading of articles will not touch the store). It would also be
> > possible to restrict to certain types of queries if this leads to
> > improved performance.
> >
> > We currently use RAP as an RDF parser for importing ontologies into
> > Semantic MediaWiki. For querying our RDF data, we consider reusing an
> > existing triplestore such as Redland or RAP, but also using SQL
> > queries directly.
> > Java toolkits are not an option since Wikipedia requires the use of
> > free software (and free Java implementations probably don't support
> > current RDF stores).

Jena runs with IKVM and GNU Classpath. It also runs on .NET and Mono via
IKVM.

My experiences with the current IKVM have been very good and they would
suggest that most RDF/Java toolkits will run quite adequately these
days. It wasn't true a while ago but things have moved on rapidly
recently.

Hope that helps,
Andy

> If current RDF stores means Named Graph stores then you could use a
> combination of Jena and NG4J. Jena is BSD and supports SPARQL. NG4J
> adds an API for manipulating Named Graph sets. See:
> http://www.wiwiss.fu-berlin.de/suhl/bizer/ng4j/
>
> > I can imagine that one can already find performance measures for RAP
> > somewhere on the web -- sorry if I missed this.
>
> Not that I know.
> But all efforts in that direction are very welcome.
>
> Cheers
>
> Chris
>
> > Best regards,
> >
> > Markus
> >
> > --
> > Markus Krötzsch
> > Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
> > ma...@ai...          phone +49 (0)721 608 7362
> > www.aifb.uni-karlsruhe.de/WBS/          fax +49 (0)721 693 717
>
> --
> Chris Bizer
> Freie Universität Berlin
> Phone: +49 30 838 54057
> Mail: ch...@bi...
> Web: www.bizer.de
>
> _______________________________________________
> Rdfapi-php-interest mailing list
> Rdf...@li...
> https://lists.sourceforge.net/lists/listinfo/rdfapi-php-interest
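
A minimal sketch of the suggestion in the reply that the application could issue requests over the SPARQL protocol, so that the store behind the endpoint can be any RDF system. It is written in Python purely for illustration and does not use RAP. The endpoint URL and all function names are hypothetical. The query asks only for the first 20 results, as in item 1.4 of the original question, which is one simple way of keeping result sizes bounded for interactive use.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint URL, for illustration only.
ENDPOINT = "http://example.org/sparql"


def build_query(limit=20):
    """Return a simple SPARQL SELECT query capped at `limit` results."""
    return (
        "SELECT ?s ?p ?o\n"
        "WHERE { ?s ?p ?o }\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query as the `query` parameter of an HTTP GET request,
    which is how the SPARQL protocol passes a query string."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(endpoint, query, timeout=10):
    """Send the query and return the raw response body as text.

    Assumes the endpoint can answer with SPARQL XML results.
    """
    request = Request(
        build_request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+xml"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `run_query(ENDPOINT, build_query(20))` would send the request; it is left out here because the endpoint above is only a placeholder.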