From: Gunnar A. G. <ggr...@cs...> - 2005-02-01 14:33:28
Attachments:
RAPDBInsert.java

Hi all,

Good idea to set up a mailing list, Chris! This might be a bit early, as there may not be anyone other than Chris and me subscribed yet, but others can always check the archive.

The other day I needed to put a 120 MB RDF/XML file into a RAP DB. I tried uploading it with the db-tool, but on my P4 with 1 GB of memory it used 1.8 GB of memory, froze the whole machine, and was still showing no sign of connecting to the DB after 30 minutes, so I stopped it.

Instead I admitted that maybe PHP isn't right for everything and wrote a Java program that inserts into a MySQL DB in RAPDB format. The program is attached; it currently requires Jena 2, Java getopt (http://www.urbanophile.com/arenn/hacking/download.html) and probably Java 1.5.

With this you can, for example, do:

    java RAPDBInsert -m "http://example.org" -h trogon -u ggrimnes -p **** -d rdf ~/public_html/foaf.rdf

This worked like a dream, and I've now got a RAP DB with 1579160 (1.5M!) triples. Normal operations like $model->find($res,$DC_title,null) are quick, but anything that requires getting the whole model into memory (e.g. regex finds) breaks everything. Even getting the size of the model takes about 15 s, but with some care it's fully usable!

Anyway, I hope this is useful to someone else.

- Gunnar

--
Gunnar AAstrand Grimnes
-----------------------
Room 312, Computing Science Dept.
University of Aberdeen
Aberdeen AB24 3UE
Mobile: (+44) (0) 7950 251379
Email: ggr...@cs...
WWW: http://www.csd.abdn.ac.uk/~ggrimnes
From: Chris B. <bi...@ze...> - 2005-02-01 14:46:42

Wow, 1.5 million triples in RAP, great :-)

I guess Sören will be interested. He was playing around with N3 files of similar size for his POWL project, which is based on RAP.

Chris

--
Chris Bizer
Freie Universität Berlin
Phone: +49 30 838 54057
Mail: ch...@bi...
Web: www.bizer.de

-----Original Message-----
From: rdf...@li... [mailto:rdf...@li...] On behalf of Gunnar AAstrand Grimnes
Sent: Tuesday, 1 February 2005 15:33
To: rdf...@li...
Subject: [Rdfapi-php-interest] RAP DB
From: <au...@in...> - 2005-02-01 15:30:11

Hi all,

> Here the other day I needed to put a 120mb rdf/xml file into a RAP DB,
> I tried using the db-tool and uploading it, but on my P4 with 1gb of
> memory it used 1.8gb of memory, froze the whole machine and was still
> showing no sign of connecting to the DB after 30 minutes, so I stopped
> it.

I have already loaded quite large models using PHP and RAP, but you have to set the $stream parameter of model::load to true to prevent RAP from first creating an in-memory model. As the signature shows, it defaults to false:

    model::load($filename, $type = NULL, $stream = false)

Hope that helps,

Sören
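The difference that a streaming load makes can be sketched outside RAP. The Java below is only an illustration under stated assumptions, not RAP's code: the names `StreamingLoadSketch`, `loadBuffered` and `loadStreaming` are hypothetical, the input is simplified to one whitespace-separated triple per line instead of RDF/XML, and it assumes Java 8 or later. It contrasts holding every statement in memory before storing with handing each statement to the store as soon as it is parsed.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Illustrative only: not RAP's implementation. Input is simplified to one
// "subject predicate object" triple per line instead of RDF/XML.
class StreamingLoadSketch {

    // Streaming load: each statement is passed to the store as soon as it is
    // parsed, so only the current line is held by the loader.
    static int loadStreaming(Reader in, Consumer<String[]> store) {
        Iterator<String> lines = new BufferedReader(in).lines().iterator();
        int count = 0;
        while (lines.hasNext()) {
            String line = lines.next().trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // Limit of 3 keeps spaces inside the object part intact.
            String[] triple = line.split("\\s+", 3);
            if (triple.length != 3) {
                throw new IllegalArgumentException("Malformed triple: " + line);
            }
            store.accept(triple);
            count++;
        }
        return count;
    }

    // Buffered load: all statements are collected in a list first, so memory
    // use grows with the size of the input before anything is stored.
    static int loadBuffered(Reader in, Consumer<String[]> store) {
        List<String[]> all = new ArrayList<>();
        loadStreaming(in, all::add); // parse everything into memory
        all.forEach(store);          // only then write it out
        return all.size();
    }

    public static void main(String[] args) {
        String data = "ex:a dc:title \"First\"\nex:b dc:title \"Second\"\n";
        int n = loadStreaming(new StringReader(data),
                t -> System.out.println(String.join(" | ", t)));
        System.out.println(n + " statements stored");
    }
}
```

Both methods store the same statements; the only difference is how many are held in memory at once, which is the point of setting $stream to true.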
From: Gunnar A. G. <ggr...@cs...> - 2005-02-01 15:32:15

Ah, bummer :) All that Java work wasted.

How large are we talking?

- Gunnar

Sören Auer wrote:
> Hi All,
>
>> Here the other day I needed to put a 120mb rdf/xml file into a RAP DB,
>> I tried using the db-tool and uploading it, but on my P4 with 1gb of
>> memory it used 1.8gb of memory, froze the whole machine and was still
>> showing no sign of connecting to the DB after 30 minutes, so I stopped
>> it.
>
> I already loaded quite large models using PHP and RAP, but you have to
> set the $stream parameter of model::load to true to prevent RAP from
> creating an in memory model first:
>
> model::load($filename, $type = NULL, $stream=false)
>
> Hope that helps
>
> Sören
From: <au...@in...> - 2005-02-01 15:56:23

Gunnar AAstrand Grimnes wrote:
> Ah, bummer :) all that java work wasted.
>
> How large are we talking?

I loaded the UNSPSC and NCI Cancer ontologies (0.5 million triples), but with $stream set to true, loading arbitrarily large models is only a matter of time. According to my measurements, Powl/RAP was not significantly slower than Jena.

RAP may also need to be modified in one more place, to disable the check for duplicate entries when adding triples. In model/DBModel.php, line 139:

    if (!empty($this->dontCheckForDuplicatesOnAdd) || !$this->contains($statement)) {

$model->dontCheckForDuplicatesOnAdd then has to be set to true before loading the data. One further remark: you should probably use the command-line version of PHP to avoid timeouts :-)

Happy data mining,

Sören
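The effect of that flag can be sketched as follows. This is a Java illustration of the quoted condition, not RAP's DBModel: the `TripleStoreSketch` class, its list-backed storage and the `lookups` counter are hypothetical. With the check enabled, every add first searches for an existing copy (in RAP that would be a database query); with the flag set, the statement is appended directly, which is cheaper but only safe when the input is known to contain no duplicates.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a list-backed stand-in for a database-backed model.
class TripleStoreSketch {
    // Mirrors the flag named in the message; false means "check on every add".
    boolean dontCheckForDuplicatesOnAdd = false;

    // Counts duplicate lookups so their cost is visible.
    int lookups = 0;

    private final List<String> statements = new ArrayList<>();

    // Stand-in for the per-statement existence check.
    boolean contains(String statement) {
        lookups++;
        return statements.contains(statement);
    }

    // Same shape as the quoted condition: skip the lookup when the flag is
    // set, otherwise add only if the statement is not already present.
    boolean add(String statement) {
        if (dontCheckForDuplicatesOnAdd || !contains(statement)) {
            statements.add(statement);
            return true;
        }
        return false;
    }

    int size() {
        return statements.size();
    }

    public static void main(String[] args) {
        TripleStoreSketch store = new TripleStoreSketch();
        store.dontCheckForDuplicatesOnAdd = true; // set before loading
        store.add("ex:a dc:title \"First\"");
        store.add("ex:a dc:title \"First\"");     // stored again, no lookup
        System.out.println(store.size() + " statements, " + store.lookups + " lookups");
    }
}
```

The trade-off is that a duplicate in the input ends up stored twice when the check is off, so the flag suits bulk loads of data that is already deduplicated.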