From: Markus <ma...@ai...> - 2006-04-12 18:09:54
Hi Chris,

thanks for the quick answer. I take from your words that RAP might not yet be quick enough *in general*. On the other hand, no single tool really meets all our needs (especially since the Java stores are out), and RAP at least appears to be well maintained and evolving. I also like that RAP can be configured for various settings (e.g. with various levels of inferencing), so we could allow people to switch on complex features if they have smaller wikis.

My inquiry was rather general. We have a lot of data, but we do not need all of these functions to be very fast. What we really do is:

== Standard wiki usage ==

* On normal article *views* (by far the most common operation), at most some simple reads are needed (if the article is not in cache and certain annotations are used). The same is true for *previews* during editing.
* On every article *write*, the store has to be updated (delete + write). This could be optimized by checking for actual changes in the RDF (a sketch of this follows below).

== Semantic features ==

* Further simple reads occur for exporting RDF. This could be optimized by caching.
* Complex queries shall be supported in a simplified inline syntax: users add queries to the article source, and the article then shows the result lists (a sample query is sketched after this message). These lists need to be updated regularly, but not on every change. So, if live updates of the query results are not affordable, updating the result lists included in articles once a day might also be acceptable. This is quite an extreme case (and might not be motivating for contributors who want to see their changes take immediate effect), but it illustrates that we are somewhat flexible.

What we really need is to guarantee that the standard usage is hardly slowed down at all. The added semantic features are somewhat optional: we need a certain amount to convince anyone to use the extension, but we can be restrictive to ensure acceptable performance. It would also be OK to restrict queries wrt. complexity or size of the result set. Our problem with evaluation is that we do not have real testing data until the extension is active in some major wiki, yet we need to ensure some amount of scalability before that.

I would also like to learn more about the current capabilities of Appmosphere. My impression was that its RDF store and query features are rather new -- is it currently recommended for major productive use? Having an integrated API of RAP and Appmosphere would clearly be great for our setting.

Redland is the third store that we really consider. Since it seems to be a one-man project, I wonder whether its future development is secured (e.g. the demos on the site were all disabled when Dave Beckett switched to Yahoo!).

Concerning 3Store, I thought that they have a document-centric approach where you first load a large RDF document and then ask queries. Whatever the performance of the querying is, we could not afford to reload the whole data every time someone makes a change. Also, the PHP binding of 3Store is realized by making calls to shell commands from PHP.
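Coming back to the write path above (the delete + write on article save): a rough sketch of how the change check could look with RAP. This is only an illustration -- the class and method names (ModelFactory, DbStore, getModel(), containsAll(), ...) are given as I understand the API and should be treated as assumptions, not as tested code.

<?php
// Sketch only: replace an article's triples on save, but skip the costly
// delete + write when the annotations did not actually change.
// RAP class/method names below are assumptions, not verified against the API.
define('RDFAPI_INCLUDE_DIR', '/path/to/rdfapi-php/api/');
require_once RDFAPI_INCLUDE_DIR . 'RdfAPI.php';

function updateArticleModel($dbStore, $articleUri, $newModel) {
    if ($dbStore->modelExists($articleUri)) {
        $old = $dbStore->getModel($articleUri);
        // Cheap change check: same size and every new statement already stored.
        if ($old->size() == $newModel->size() && $old->containsAll($newModel)) {
            return; // annotations unchanged -- no delete + write needed
        }
        $old->delete(); // drop the article's old triples
    }
    // Store the freshly parsed annotations as the article's model.
    $dbStore->getNewModel($articleUri)->addModel($newModel);
}

// Usage (connection parameters and URIs invented for the example):
// $dbStore  = ModelFactory::getDbStore('MySQL', 'localhost', 'wikidb', 'user', 'pw');
// $newModel = ...; // a MemModel built from the article's annotations
// updateArticleModel($dbStore, 'http://example.org/wiki/Berlin', $newModel);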
On Wednesday 12 April 2006 17:50, Chris Bizer wrote:
> Hi Markus,
>
> > We consider using RAP as a quadstore for Semantic MediaWiki (see
> > http://wiki.ontoworld.org).
>
> Interesting.
>
> > In the long run, we are interested in inferencing, but for now
> > Wikipedia-size scalability is most important.
>
> Hmm, sorry -- to my knowledge there are no systematic comparisons of the
> performance of RAP with other RDF toolkits.
>
> We did some relatively unsystematic performance testing when we implemented
> different features, but the results are outdated by now.
>
> Sören Auer and Bart Pieterse (both cc'ed) have used RAP in bigger projects,
> and I guess they are the best sources for practical experience with the
> performance of RAP on bigger real-world datasets.
>
> My general impression is that, as PHP itself is still slower than languages
> like Java or C, RAP is also slow and its performance cannot be compared
> with toolkits like Jena or Sesame. Sören might disagree with me on this
> point.
>
> > Are there recent evaluations concerning the performance of the different
> > storage models? In particular, we are interested in scalability of the
> > following functions:
> >
> > 1 SPARQL queries:
> > 1.1 general performance
>
> Around one second for a medium-complex query against a data set with
> 100 000 triples in memory, much slower if the data set is in a database.
> Tobias Gauss can give you details.
>
> A PHP alternative for SPARQL queries against data sets which are stored in
> a database is Benjamin's appmosphere toolkit,
> http://www.appmosphere.com/pages/en-arc. He does smarter SPARQL-to-SQL
> rewriting than RAP and should theoretically be faster.
>
> > 1.2 performance of "join-intensive" queries (involving long chains of
> >     triples)
> > 1.3 performance of datatype queries (e.g. selecting/sorting results by
> >     some xsd:int or xsd:decimal)
> > 1.4 performance for partial result lists (e.g. getting only the first 20)
> > 2 simple read access (e.g. getting all triples of a certain pattern or
> >   RDF dataset)
>
> OK with models up to 100 000 triples. Don't know about bigger models.
> Sören?
>
> > 3 write access
> > 3.1 adding triples to an existing store
> > 3.2 deleting selected triples from the store
>
> Should be OK. I think Sören implemented some workarounds for bulk updates.
>
> > 4 impact of RDF dataset features/named graph functionality
>
> About 5% slower than operations on classic RDF models.
>
> > For inclusion in Wikipedia, dealing with about 10 Mio triples split into
> > 1 Mio RDF datasets is probably necessary.
>
> Too much for RAP, too much for appmosphere (Benjamin?), and I guess even
> hard for Jena, Redland and Co. if the queries become more complicated.
>
> > We are working on useful update and caching strategies to reduce access
> > to the RDF store, but a rather high number of parallel requests is still
> > to be expected (though normal reading of articles will not touch the
> > store). It would also be possible to restrict to certain types of queries
> > if this leads to improved performance.
> >
> > We currently use RAP as an RDF parser for importing ontologies into
> > Semantic MediaWiki. For querying our RDF data, we consider reusing an
> > existing triplestore such as Redland or RAP, but also using SQL queries
> > directly. Java toolkits are not an option since Wikipedia requires the
> > use of free software (and free Java implementations probably don't
> > support current RDF stores).
>
> If "current RDF stores" means Named Graph stores, then you could use a
> combination of Jena and NG4J. Jena is BSD-licensed and supports SPARQL.
> NG4J adds an API for manipulating Named Graph sets. See:
> http://www.wiwiss.fu-berlin.de/suhl/bizer/ng4j/
>
> > I can imagine that one can already find performance measures for RAP
> > somewhere on the web -- sorry if I missed this.
>
> Not that I know of. But all efforts in that direction are highly welcome.
>
> Cheers,
>
> Chris
>
> > Best regards,
> >
> > Markus
> >
> > --
> > Markus Krötzsch
> > Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
> > ma...@ai...  phone +49 (0)721 608 7362
> > www.aifb.uni-karlsruhe.de/WBS/  fax +49 (0)721 693 717

--
Markus Krötzsch
Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
ma...@ai...  phone +49 (0)721 608 7362
www.aifb.uni-karlsruhe.de/WBS/  fax +49 (0)721 693 717
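To illustrate the query shape asked about in points 1.2-1.4 above, an inline query embedded in an article could look roughly as follows. All URIs are invented for the example, and the sparqlQuery() call at the end is only an assumption about RAP's SPARQL entry point.

<?php
// Illustration only: the kind of query an article might embed via the
// inline syntax. The pattern chains several triples ("join-intensive"),
// filters and sorts on an xsd:integer value, and requests only the first
// 20 rows (a partial result list). All URIs are made up for the example.
$query = <<<SPARQL
PREFIX ex:  <http://example.org/property/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?city ?party ?population
WHERE {
  ?city  ex:located_in  <http://example.org/id/Germany> .
  ?city  ex:mayor       ?mayor .
  ?mayor ex:member_of   ?party .
  ?city  ex:population  ?population .
  FILTER ( datatype(?population) = xsd:integer )
}
ORDER BY DESC(?population)
LIMIT 20
SPARQL;

// With RAP this would presumably be handed to the model's SPARQL engine,
// e.g. $result = $model->sparqlQuery($query);  -- method name is an assumption.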
From: Seaborne, A. <and...@hp...> - 2006-04-13 13:04:31
-------- Original Message --------
> From: Chris Bizer <>
> Date: 12 April 2006 16:50
>
> Hi Markus,
>
> [...]
>
> > For inclusion in Wikipedia, dealing with about 10 Mio triples split into
> > 1 Mio RDF datasets is probably necessary.
>
> Too much for RAP, too much for appmosphere (Benjamin?), and I guess even
> hard for Jena, Redland and Co. if the queries become more complicated.

That would be surprising. Although it depends greatly on the query, graphs up to 100e6 triples should be no problem. They take a while to load, though :-)

http://esw.w3.org/topic/LargeTripleStores
http://esw.w3.org/topic/LargeQuadStores

We're using a 10e6-triple graph for regular testing at the moment with a new database store for Jena, and we use the same data with the existing Jena database solution. (The choice of 10e6 is based on it being big enough to show effects of scale but small enough to be manageable, as we keep reloading due to schema experimentation.) Real testing is on 100e6 triples.

Support for interactive use is trickier, especially if the queries are arbitrary, as it is possible to write queries that will always have large intermediate results. If the app does not allow the user to (indirectly) write an arbitrary query, then a little care with queries should make interactive use possible on data up to 100e6 triples and beyond. Steve Harris (3Store) has a lot of experience with this.

At 1e6 triples, it is more a matter of running in memory (if you can afford the system resources).

Is there a SPARQL protocol driver for RAP? If so, the database can be any RDF system you want, and the RAP application can issue requests over the SPARQL protocol (a rough sketch of such a request follows at the end of this message).

> > We are working on useful update and caching strategies to reduce access
> > to the RDF store, but a rather high number of parallel requests is still
> > to be expected (though normal reading of articles will not touch the
> > store). It would also be possible to restrict to certain types of queries
> > if this leads to improved performance.
> >
> > We currently use RAP as an RDF parser for importing ontologies into
> > Semantic MediaWiki. For querying our RDF data, we consider reusing an
> > existing triplestore such as Redland or RAP, but also using SQL queries
> > directly. Java toolkits are not an option since Wikipedia requires the
> > use of free software (and free Java implementations probably don't
> > support current RDF stores).

Jena runs with IKVM and GNU Classpath. It also runs on .NET and Mono via IKVM. My experiences with the current IKVM have been very good, and they suggest that most RDF/Java toolkits will run quite adequately these days. That wasn't true a while ago, but things have moved on rapidly recently.

Hope that helps,

Andy

> [...]
>
> --
> Chris Bizer
> Freie Universität Berlin
> Phone: +49 30 838 54057
> Mail: ch...@bi...
> Web: www.bizer.de
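Regarding the SPARQL protocol idea above: a minimal sketch of what such a driver would need to do -- send the query to an HTTP endpoint and read the variable bindings from the XML result format -- using only standard PHP. The endpoint URL and query are invented for the example.

<?php
// Rough sketch of a SPARQL-protocol client in plain PHP: GET the query to
// any SPARQL endpoint and pull the variable bindings out of the XML results.
// The endpoint URL below is made up for the example.
function sparqlSelect($endpoint, $query) {
    $url = $endpoint . '?' . http_build_query(array('query' => $query));
    $ctx = stream_context_create(array('http' => array(
        'header' => "Accept: application/sparql-results+xml\r\n",
    )));

    $dom = new DOMDocument();
    $dom->loadXML(file_get_contents($url, false, $ctx));

    $xp = new DOMXPath($dom);
    $xp->registerNamespace('s', 'http://www.w3.org/2005/sparql-results#');

    $rows = array();
    foreach ($xp->query('//s:result') as $result) {
        $row = array();
        foreach ($xp->query('s:binding', $result) as $binding) {
            // Each binding element wraps a <uri>, <literal> or <bnode> value.
            $row[$binding->getAttribute('name')] = trim($binding->textContent);
        }
        $rows[] = $row;
    }
    return $rows;
}

// Example call (capped result list, as discussed earlier in the thread):
// $rows = sparqlSelect('http://example.org/sparql',
//     'SELECT ?s ?o WHERE { ?s <http://example.org/population> ?o } LIMIT 20');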
From: Markus <ma...@ai...> - 2006-04-12 18:13:07
On Wednesday 12 April 2006 20:08, Markus Krötzsch wrote:
> [...]

Oops -- accidentally pressed a key and the email was gone :-) ... Anyway, that was basically it. I am still eager to hear about more practical experiences with RAP, Appmosphere, and Redland.

Regards,

Markus

--
Markus Krötzsch
Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
ma...@ai...  phone +49 (0)721 608 7362
www.aifb.uni-karlsruhe.de/WBS/  fax +49 (0)721 693 717