From: Y. D. <sys...@ya...> - 2012-11-28 17:03:45
Hi Hugh,

After much trial and error, I find that when loading the 15 ttl files, the load always gets stuck at one particular file, yagoTypes.ttl, which contains only rdf:type triples. This file is around 800 MB; other files over 2 GB are loaded successfully, so size shouldn't be the problem.

What I don't understand is this: isn't loading just dumping triples from ttl files into the Virtuoso database, which should be really straightforward? It seems that Virtuoso is trying to be clever about the actual contents of a ttl file, and it's really puzzling to me why it gets stuck on a file with particular contents. This never happened with the other SQL-based databases I have used before.

That said, I doubt it is a memory issue caused by database size: I use the same settings to host another 500 GB database without much trouble.

Anyway, I have posted the output of status() below. The first was taken just as the load began; the second about 10 minutes before Virtuoso stopped responding.

OpenLink Virtuoso Server
Version 06.01.3127-pthreads for Linux as of Sep 14 2012
Started on: 2012/11/28 14:07 GMT+00

Database Status:
  File size 7698644992, 939776 pages, 361393 free.
  680000 buffers, 349688 used, 0 dirty 0 wired down, repl age 0 0 w. io 0 w/crsr.
  Disk Usage: 312538 reads avg 0 msec, 0% r 0% w last 0 s, 257747 writes,
    2092 read ahead, batch = 140. Autocompact 202995 in 137881 out, 32% saved.
Gate: 416 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
Log = /home/ubuntu/LARGEDATA/yago2sOnt/virtuoso.trx, 155 bytes
396346 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 1 connects, max 1 concurrent
RPC: 4 calls, 1 pending, 1 max until now, 0 queued, 0 burst reads (0%), 0 second brk=5948985344
Checkpoint Remap 181811 pages, 0 mapped back. 43 s atomic time.
    DB master 939776 total 361393 free 181811 remap 0 mapped back
    temp 256 total 251 free

Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
   Currently 1 threads running 0 threads waiting 0 threads in vdb.
Pending:

Client 1111:1: Account: dba, 203 bytes in, 256 bytes out, 1 stmts.
PID: 1081, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks:

Running Statements:
 Time (msec) Text
         176 status()

Hash indexes


OpenLink Virtuoso Server
Version 06.01.3127-pthreads for Linux as of Sep 14 2012
Started on: 2012/11/28 14:07 GMT+00

Database Status:
  File size 11312037888, 1380864 pages, 515539 free.
  680000 buffers, 510574 used, 7906 dirty 0 wired down, repl age 3248261 0 w. io 0 w/crsr.
  Disk Usage: 315439 reads avg 0 msec, 0% r 0% w last 6 s, 705064 writes,
    2110 read ahead, batch = 139. Autocompact 597047 in 412146 out, 30% saved.
Gate: 419 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
Log = /home/ubuntu/LARGEDATA/yago2sOnt/virtuoso.trx, 499575222 bytes
546254 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 14 connects, max 2 concurrent
RPC: 140 calls, -9 pending, 1 max until now, 0 queued, 38 burst reads (27%), 0 second brk=5974286336
Checkpoint Remap 148694 pages, 0 mapped back. 63 s atomic time.
    DB master 1380864 total 515539 free 148694 remap 8326 mapped back
    temp 256 total 251 free

Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
   Currently 2 threads running 0 threads waiting 0 threads in vdb.
Pending:

Client 1111:3: Account: dba, 358 bytes in, 352 bytes out, 1 stmts.
PID: 1090, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks:

Client 1111:14: Account: dba, 320 bytes in, 2812 bytes out, 1 stmts.
PID: 2009, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks:

Running Statements:
 Time (msec) Text
         208 status()
      938348 rdf_loader_run(log_enable=>3)

Hash indexes


On 27 Nov 2012, at 21:24, Hugh Williams wrote:

> Hi Dong,
>
> Ah, OK. Then run status() while the load is in progress, up to the point where it hangs, and provide the output of the first and last runs.
>
> rdf_loader_run() sets log_enable(2) in the background, so you should not have to set this yourself, which is why I still think the server is running out of buffers during this load.
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc. // http://www.openlinksw.com/
> Weblog -- http://www.openlinksw.com/blogs/
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter -- http://twitter.com/OpenLink
> Google+ -- http://plus.google.com/100570109519069333827/
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
>
> On 27 Nov 2012, at 18:40, Y. Dong wrote:
>
>> Hi Hugh,
>>
>> As I mentioned in my first post, the server stalls to the point that it cannot execute any command, including "status()", so I have no way of knowing what is happening with the server. But I can confirm the buffer settings are in place. The Virtuoso server version is Version 6.1.6.3127-pthreads as of Sep 14 2012.
>>
>> Just curious: according to the documentation for log_enable (http://docs.openlinksw.com/virtuoso/fn_log_enable.html), the difference between log_enable(2) and log_enable(3) is without/with regular transaction logging. So how can log_enable(3), which is with logging, be faster than running without logging?
>>
>> Thanks.
>>
>>
>> On 27 Nov 2012, at 18:22, Hugh Williams wrote:
>>
>>> Hi Dong,
>>>
>>> Also, run the "status('');" command from isql-vt to see the status of the machine at this point, which will confirm whether the buffers set in the INI have taken effect and whether any locks/deadlocks etc. might be occurring. The output of the top command, showing the overall memory usage on the system, would also be useful. Finally, what is the actual full Virtuoso version being used? It can be obtained by typing the command:
>>>
>>> virtuoso-t -?
>>>
>>> Best Regards
>>> Hugh Williams
>>>
>>> On 27 Nov 2012, at 17:55, Thomas Michaux wrote:
>>>
>>>> The same trick applies when deleting a graph: http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtTipsAndTricksGuideDeleteLargeGraphs
>>>>
>>>> Such large graphs can be cleared by changing the transaction log mode to autocommit on each operation, deleting the graph(s) or triples, and then setting the log mode back to its original setting. This is easily done using the Virtuoso log_enable function, with the settings log_enable(3,1).
>>>>
>>>> This function may be called on its own, prior to the delete operation, via iSQL (either command-line or the Conductor variant), as shown:
>>>>
>>>> log_enable(3,1);
>>>> SPARQL CLEAR GRAPH <graph-name>;
>>>>
>>>>
>>>>
>>>> On 27/11/2012 18:52, Thomas Michaux wrote:
>>>>> Hi,
>>>>>
>>>>> Maybe it's related to the logging system accumulating information in RAM. Take a look at this:
>>>>>
>>>>> "One of the side effects of the default log_enable = 2 setting is that triggers are not enabled to speed the loading of data. If triggers are required for RDF Graph replication between nodes etc. then the log_enable mode should be set to 3 when calling the rdf_loader_run() function as follows:
>>>>> rdf_loader_run (log_enable=>3);
>>>>>
>>>>> "
>>>>> in http://www.openlinksw.com/dataspace/dav/wiki/Main/VirtBulkRDFLoader
>>>>>
>>>>> On 27/11/2012 18:16, Y. Dong wrote:
>>>>>> Hi there,
>>>>>>
>>>>>>
>>>>>> I am currently trying to load yago2s into the Virtuoso database. There are 15 ttl (Turtle) files, totalling 8.5GB in size. The server is running on an Amazon instance with 64-bit Ubuntu 12.04.01 LTS, 8GB of memory, and more than enough disk space. The Virtuoso Server version is 06.01.3127. I use ld_dir() and rdf_loader_run() to load the files. virtuoso.ini is set to utilise the 8GB of memory.
>>>>>>
>>>>>> At first the loading process ran smoothly, but about halfway through, the Virtuoso server stopped responding. I can see 100% CPU usage by virtuoso, and I can run isql-v to start the Virtuoso command line, but the server doesn't respond to any command I type in isql-v; it just stalls and I have to use ctrl+c to exit. Now I can't even use status() to see basic stats of the server. The Virtuoso HTTP server has also stopped responding, so I believe the whole Virtuoso process has stalled somehow. There isn't anything unusual in the log file virtuoso.log (attached below).
>>>>>>
>>>>>> Has anyone encountered similar problems before?
>>>>>>
>>>>>> Tue Nov 27 2012
>>>>>> 15:20:34 INFO: { Loading plugin 1: Type `plain', file `wikiv' in `/usr/local//lib/virtuoso/hosting'
>>>>>> 15:20:34 INFO: WikiV version 0.6 from OpenLink Software
>>>>>> 15:20:34 INFO: Support functions for WikiV collaboration tool
>>>>>> 15:20:34 INFO: SUCCESS plugin 1: loaded from /usr/local//lib/virtuoso/hosting/wikiv.so }
>>>>>> 15:20:34 INFO: { Loading plugin 2: Type `plain', file `mediawiki' in `/usr/local//lib/virtuoso/hosting'
>>>>>> 15:20:34 INFO: MediaWiki version 0.1 from OpenLink Software
>>>>>> 15:20:34 INFO: Support functions for MediaWiki collaboration tool
>>>>>> 15:20:34 INFO: SUCCESS plugin 2: loaded from /usr/local//lib/virtuoso/hosting/mediawiki.so }
>>>>>> 15:20:34 INFO: { Loading plugin 3: Type `plain', file `creolewiki' in `/usr/local//lib/virtuoso/hosting'
>>>>>> 15:20:34 INFO: CreoleWiki version 0.1 from OpenLink Software
>>>>>> 15:20:34 INFO: Support functions for CreoleWiki collaboration tool
>>>>>> 15:20:34 INFO: SUCCESS plugin 3: loaded from /usr/local//lib/virtuoso/hosting/creolewiki.so }
>>>>>> 15:20:34 INFO: { Loading plugin 4: Type `plain', file `im' in `/usr/local//lib/virtuoso/hosting'
>>>>>> 15:20:35 INFO: IM version 0.6 from OpenLink Software
>>>>>> 15:20:35 INFO: Support functions for Image Magick 6.6.9
>>>>>> 15:20:35 INFO: SUCCESS plugin 4: loaded from /usr/local//lib/virtuoso/hosting/im.so }
>>>>>> 15:20:35 INFO: OpenLink Virtuoso Universal Server
>>>>>> 15:20:35 INFO: Version 06.01.3127-pthreads for Linux as of Sep 14 2012
>>>>>> 15:20:35 INFO: uses parts of OpenSSL, PCRE, Html Tidy
>>>>>> 15:20:41 INFO: Database version 3126
>>>>>> 15:20:41 INFO: SQL Optimizer enabled (max 1000 layouts)
>>>>>> 15:20:43 INFO: Compiler unit is timed at 0.001027 msec
>>>>>> 15:21:03 DEBUG: built-in procedure "repl_undot_name" overruled by the RDBMS
>>>>>> 15:21:03 DEBUG: built-in procedure "REPL_FQNAME" overruled by the RDBMS
>>>>>> 15:21:03 DEBUG: built-in procedure "REPL_COLTYPE_PS" overruled by the RDBMS
>>>>>> 15:21:03 DEBUG: built-in procedure "REPL_COLTYPE" overruled by the RDBMS
>>>>>> 15:21:04 INFO: Roll forward started
>>>>>> 15:21:42 INFO: 1000 transactions, 7274496 bytes replayed (13 %)
>>>>>> 15:22:13 INFO: 2000 transactions, 14548992 bytes replayed (27 %)
>>>>>> 15:22:42 INFO: 3000 transactions, 21856256 bytes replayed (40 %)
>>>>>> 15:23:11 INFO: 4000 transactions, 29163520 bytes replayed (54 %)
>>>>>> 15:23:38 INFO: 5000 transactions, 36405248 bytes replayed (68 %)
>>>>>> 15:23:54 INFO: 6000 transactions, 42500096 bytes replayed (79 %)
>>>>>> 15:24:00 INFO: 7000 transactions, 47677440 bytes replayed (89 %)
>>>>>> 15:24:06 INFO: 8000 transactions, 52396032 bytes replayed (97 %)
>>>>>> 15:24:08 INFO: 8250 transactions, 53526661 bytes replayed (100 %)
>>>>>> 15:24:08 INFO: Roll forward complete
>>>>>> 15:24:12 INFO: Checkpoint started
>>>>>> 15:24:22 INFO: Checkpoint finished, log reused
>>>>>> 15:24:22 INFO: HTTP/WebDAV server online at 8890
>>>>>> 15:24:22 INFO: Server online at 1111 (pid 28660)
>>>>>> 15:28:59 INFO: PL LOG: Loader started
>>>>>> 15:29:20 INFO: PL LOG: Loader started
>>>>>> 15:30:16 INFO: PL LOG: Loader started
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Monitor your physical, virtual and cloud infrastructure from a single
>>>>>> web console. Get in-depth insight into apps, servers, databases, vmware,
>>>>>> SAP, cloud infrastructure, etc. Download 30-day Free Trial.
>>>>>> Pricing starts from $795 for 25 servers or applications!
>>>>>> http://p.sf.net/sfu/zoho_dev2dev_nov
>>>>>> _______________________________________________
>>>>>> Virtuoso-users mailing list
>>>>>> Vir...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
>>>>>
>>>>
>>>
>>
>
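
As a rough sanity check on the buffer question, the figures in the two status() outputs at the top of this message can be run through some simple arithmetic. The sketch below is only illustrative: the numbers are copied from those outputs, the page size is inferred by dividing the reported file size by the reported page count, and it assumes that one buffer caches exactly one database page. The dictionary keys and function names are made up for this example and are not part of any Virtuoso API.

```python
# Figures copied from the "Database Status" section of the two status() outputs.
first = {"file_size": 7698644992, "pages": 939776,
         "buffers": 680000, "used": 349688, "dirty": 0}
last = {"file_size": 11312037888, "pages": 1380864,
        "buffers": 680000, "used": 510574, "dirty": 7906}


def page_size(snapshot):
    """Infer the page size in bytes as file size divided by page count."""
    return snapshot["file_size"] // snapshot["pages"]


def buffer_pool_bytes(snapshot):
    """Approximate buffer pool size.

    Assumption: each buffer holds exactly one database page, and any
    per-buffer bookkeeping overhead is ignored.
    """
    return snapshot["buffers"] * page_size(snapshot)


def used_fraction(snapshot):
    """Share of the configured buffers reported as used."""
    return snapshot["used"] / snapshot["buffers"]


for label, snap in (("first", first), ("last", last)):
    pool_gib = buffer_pool_bytes(snap) / 2**30
    print(f"{label}: page size {page_size(snap)} bytes, "
          f"pool ~{pool_gib:.2f} GiB, "
          f"{used_fraction(snap):.0%} of buffers used, "
          f"{snap['dirty']} dirty")
```

Under these assumptions both snapshots work out to 8192-byte pages and a pool of roughly 5.2 GiB, with about 51% of the buffers used in the first snapshot and about 75% in the last. This describes only the moment each snapshot was taken and says nothing about the ten minutes between the last snapshot and the stall.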