From: Bryan T. <br...@sy...> - 2016-02-22 20:20:42
|
Do not run Blazegraph with logging at INFO. It will kill performance. Set the log level to WARN.

There is a bug in the DataLoaderServlet. If you have to abort a load, make sure that you terminate the Blazegraph process, since that servlet does not correctly unwind a partial commit.

Bryan

----
Bryan Thompson
Chief Scientist & Founder
Blazegraph
e: br...@bl...
w: http://blazegraph.com

Blazegraph products help to solve the Graph Cache Thrash to achieve large scale processing for graph and predictive analytics. Blazegraph is the creator of the industry’s first GPU-accelerated high-performance database for large graphs, has been named as one of the “10 Companies and Technologies to Watch in 2016” <http://insideanalysis.com/2016/01/20535/>.

Blazegraph Database <https://www.blazegraph.com/> is our ultra-high performance graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints APIs. Blazegraph GPU <https://www.blazegraph.com/product/gpu-accelerated/> and Blazegraph DAS <https://www.blazegraph.com/product/gpu-accelerated/> are disruptive new technologies that use GPUs to enable extreme scaling that is thousands of times faster and 40 times more affordable than CPU-based solutions.

CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP, LLC DBA Blazegraph. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.
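As an illustration of the logging advice above, here is a minimal Python sketch that lowers the verbosity in a log4j 1.x-style properties text. It assumes the logging configuration is a `log4j.properties` file with lines such as `log4j.logger.com.bigdata=INFO`; that file layout, the sample logger names, and the helper name `cap_verbosity` are assumptions for illustration and do not come from this thread, so check them against your own installation.

```python
import re

# Matches log4j 1.x-style level assignments, for example:
#   log4j.rootCategory=INFO, dest1
#   log4j.logger.com.bigdata=DEBUG
# Comment lines and lines already at WARN/ERROR/FATAL/OFF do not match.
_LEVEL_LINE = re.compile(
    r"^(\s*log4j\.(?:rootCategory|rootLogger|logger\.[\w.$]+)\s*=\s*)"
    r"(ALL|TRACE|DEBUG|INFO)\b",
    re.IGNORECASE,
)


def cap_verbosity(properties_text, level="WARN"):
    """Return the properties text with chatty levels replaced by `level`.

    Hypothetical helper: only the level token is rewritten; appender
    names and every other line are left untouched.
    """
    rewritten = []
    for line in properties_text.splitlines():
        rewritten.append(_LEVEL_LINE.sub(lambda m: m.group(1) + level, line))
    return "\n".join(rewritten)


if __name__ == "__main__":
    sample = "log4j.rootCategory=INFO, dest1\nlog4j.logger.com.bigdata=INFO"
    print(cap_verbosity(sample))
```

The function only transforms text, so the result can be reviewed before it is written back to the configuration file. The server would presumably need a restart to pick up the change.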
On Mon, Feb 22, 2016 at 3:16 PM, Joakim Soderberg <joa...@bl...> wrote:

> Brad,
> That’s right, in my log I get a steady stream of this:
>
> -02-22 20:11:11,639) INFO : StatementBuffer.java:1773: term: http://pl.dbpedia.org/resource/Melbourne_Zoo, iv: null
> (2016-02-22 20:11:11,640) INFO : StatementBuffer.java:1773: term: http://pt.dbpedia.org/resource/Zoológico_de_Melbourne, iv: null
> (2016-02-22 20:11:11,640) INFO : StatementBuffer.java:1773: term: http://ru.dbpedia.org/resource/Мельбурнский_зоопарк, iv: null
> (2016-02-22 20:11:11,640) INFO : StatementBuffer.java:1773: term: http://uk.dbpedia.org/resource/Мельбурнський_зоопарк, iv: null
> (2016-02-22 20:11:11,640) INFO : StatementBuffer.java:1773: term: http://vi.dbpedia.org/resource/Sở_thú_Melbourne, iv: null
> (2016-02-22 20:11:11,640) INFO : StatementBuffer.java:1773: term: http://dbpedia.org/resource/Nova_Air, iv: null
> (2016-02-22 20:11:11,640) INFO : StatementBuffer.java:1773: term: http://wikidata.org/entity/Q578032, iv: null
> (2016-02-22 20:11:11,640) INFO : StatementBuffer.java:1773: term: http://wikidata.dbpedia.org/resource/Q578032, iv: null
> (2016-02-22 20:11:11,640) INFO : StatementBuffer.java:1773: term: http://es.dbpedia.org/resource/Nova_Air, iv: null
> (2016-02-22 20:11:11,641) INFO : StatementBuffer.java:1773: term: http://pl.dbpedia.org/resource/Nova_Air, iv: null
> (2016-02-22 20:11:11,641) INFO : StatementBuffer.java:1773: term: http://dbpedia.org/resource/Milton_Work, iv: null
> (2016-02-22 20:11:11,641) INFO : StatementBuffer.java:1773: term: http://wikidata.org/entity/Q578085, iv: null
> (2016-02-22 20:11:11,641) INFO : StatementBuffer.java:1773: term: http://wikidata.dbpedia.org/resource/Q578085, iv: null
> (2016-02-22 20:11:11,641) INFO : StatementBuffer.java:1773: term: http://fr.dbpedia.org/resource/Milton_Work, iv: null
> (2016-02-22 20:11:11,641) INFO : StatementBuffer.java:1773: term: http://pl.dbpedia.org/resource/Milton_Work, iv: null
> (2016-02-22 20:11:11,641) INFO : StatementBuffer.java:1773: term: http://dbpedia.org/resource/Lisa_Nandy, iv: null
> (2016-02-22 20:11:11,641) INFO : StatementBuffer.java:1773: term: http://wikidata.org/entity/Q578037, iv: null
> (2016-02-22 20:11:11,642) INFO : StatementBuffer.java:1773: term: http://wikidata.dbpedia.org/resource/Q578037, iv: null
>
> Is “iv:null” bad?
>
> I am loading 53 ttl-files of 150G
>
> /Joakim
>
> On Feb 22, 2016, at 12:06 PM, Brad Bebee <be...@bl...> wrote:
>
> Joakim,
>
> You should see log output as the statements are loaded. How much data are you loading at once?
>
> Thanks, --Brad
>
> On Mon, Feb 22, 2016 at 2:59 PM, Joakim Soderberg <joa...@bl...> wrote:
>
>> Thanks for the advice. Now it has been indexing for several days and I have no idea what it’s doing.
>>
>> On Feb 22, 2016, at 9:04 AM, Jeremy J Carroll <jj...@sy...> wrote:
>>
>> Try looking on the status tab of the blazegraph UI in the browser. In the detail view of your particular task, there might be a counter showing how many triples have been updated.
>>
>> (I am unsure as to which tasks support this under which versions …)
>>
>> Jeremy
>>
>> On Feb 17, 2016, at 12:26 PM, Brad Bebee <be...@bl...> wrote:
>>
>> Joakim,
>>
>> With the DataLoader, the commit is after all of the data is loaded. Once the load is complete, all of the statements will be visible.
>>
>> Thanks, --Brad
>>
>> On Wed, Feb 17, 2016 at 3:21 PM, Joakim Soderberg <joa...@bl...> wrote:
>>
>>> I am calling:
>>>
>>> curl -X POST --data-binary @dataloader.xml --header 'Content-Type:application/xml' http:/__.__.__:9999/blazegraph/dataloader
>>>
>>> I can see the size of the JNL-file is increasing, but when I query number of statements in the dashboard the data doesn’t show up.
>>>
>>> select (count(*) as ?num) { ?s ?p ?o }
>>>
>>> Do I need to Flush the StatementBuffer to the backing store after the curl?
>>>
>>> This is my config file:
>>>
>>> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
>>> <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
>>> <properties>
>>>   <!-- RDF Format (Default is rdf/xml) -->
>>>   <entry key="format">N-Triples</entry>
>>>   <!-- Base URI (Optional) -->
>>>   <entry key="baseURI"></entry>
>>>   <!-- Default Graph URI (Optional - Required for quads mode namespace) -->
>>>   <entry key="defaultGraph"></entry>
>>>   <!-- Suppress all stdout messages (Optional) -->
>>>   <entry key="quiet">false</entry>
>>>   <!-- Show additional messages detailing the load performance. (Optional) -->
>>>   <entry key="verbose">3</entry>
>>>   <!-- Compute the RDF(S)+ closure. (Optional) -->
>>>   <entry key="closure">false</entry>
>>>   <!-- Files will be renamed to either .good or .fail as they are processed. The files will remain in the same directory. -->
>>>   <entry key="durableQueues">true</entry>
>>>   <!-- The namespace of the KB instance. Defaults to kb. -->
>>>   <entry key="namespace">kb</entry>
>>>   <!-- The configuration file for the database instance. It must be readable by the web application. -->
>>>   <entry key="propertyFile">RWStore.properties</entry>
>>>   <!-- Zero or more files or directories containing the data to be loaded. This should be a comma delimited list. The files must be readable by the web application. -->
>>>   <entry key="fileOrDirs">/mydata/dbpedia2015/core/</entry>
>>> </properties>
>>>
>>> On Feb 16, 2016, at 8:35 AM, Joakim Soderberg <joa...@bl...> wrote:
>>>
>>> I knew there is a DataLoader class, but I wasn’t aware it was available as a service in NanoSparql server.
>>> I will try it immediately
>>>
>>> Thanks
>>> Joakim
>>>
>>> On Feb 16, 2016, at 8:09 AM, Jeremy J Carroll <jj...@sy...> wrote:
>>>
>>> See https://wiki.blazegraph.com/wiki/index.php/REST_API#Bulk_Data_Load
>>>
>>> That looks very interesting:
>>>
>>> I read:
>>>
>>> “Parsing, insert, and removal on the database are now decoupled from the index writes”
>>>
>>> One behavior we have is that we have small inserts concurrent with other activity (typically but not exclusively read activity). Does the enhanced configurability in 2.0 give us options that may allow us to improve performance of these writes?
>>>
>>> E.g. this week we have many (millions? at least hundreds of thousands) of such small writes (10 - 100 quads) and we also are trying to delete 25 million quads using about 100 delete/insert requests (that I take to be not impacted by this change). I am currently suggesting we should do one or the other at any one time, and not try to mix: but frankly I am guessing, and guessing conservatively. We have to maintain an always-on read performance at the same time. Total store size approx 3 billion.
>>>
>>> [Unfortunately this machine is still a 1.5.3 machine, but for future reference I am trying to have a better sense of how to organize such activity]
>>>
>>> Jeremy
>>>
>>> On Feb 16, 2016, at 7:55 AM, Bryan Thompson <br...@sy...> wrote:
>>>
>>> 2.0 includes support for bulk data load with a number of interesting features, including durable queue patterns, folders, etc. See https://wiki.blazegraph.com/wiki/index.php/REST_API#Bulk_Data_Load
>>>
>>> On Tue, Feb 16, 2016 at 10:40 AM, Jeremy J Carroll <jj...@sy...> wrote:
>>>
>>>> On Feb 15, 2016, at 10:42 PM, Joakim Soderberg <joa...@bl...> wrote:
>>>>
>>>> Has anyone succeeded in loading a folder of .nt files? I can load them one by one:
>>>>
>>>> LOAD <file:///mydata/dbpedia2015/core/amsterdammuseum_links.nt> INTO GRAPH <http://dbpedia2015>
>>>>
>>>> But it doesn’t like a folder name:
>>>>
>>>> LOAD <file:///mydata/dbpedia2015/core/> INTO GRAPH <http://dbpedia2015>
>>>>
>>>> That is correct. If you look at the spec for LOAD:
>>>> https://www.w3.org/TR/sparql11-update/#load
>>>> then it takes an IRI as where you are loading from, and the concept of folder is simply not applicable.
>>>> A few schemes such as file: and ftp: may have such a notion, but the operation you are looking for is local to your machine on the client and you should probably implement it yourself.
>>>>
>>>> In particular, do you want each file loaded into a different graph or the same graph? Probably best for you to make up your own mind.
>>>>
>>>> I have had success loading trig files into multiple graphs, using a simple POST to the endpoint.
>>>>
>>>> Jeremy
>>>>
>>>> _______________________________________________
>>>> Bigdata-developers mailing list
>>>> Big...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
>>
>> --
>> _______________
>> Brad Bebee
>> CEO
>> Blazegraph
>> e: be...@bl...
>> m: 202.642.7961
>> w: www.blazegraph.com
|
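On the folder question at the bottom of the thread: since SPARQL LOAD takes a single IRI, one way to follow the "implement it yourself" suggestion is to generate one LOAD statement per file on the client. The Python sketch below does only that. The function name `build_load_statements` is hypothetical, and the choice of one shared target graph is an assumption (the thread leaves the choice between one graph and one graph per file to the user).

```python
from pathlib import Path


def build_load_statements(folder, graph_iri, pattern="*.nt"):
    """Return one SPARQL 1.1 LOAD statement per matching file in `folder`.

    Hypothetical helper: every file is loaded into the same graph.
    """
    statements = []
    for path in sorted(Path(folder).glob(pattern)):
        if path.is_file():
            # as_uri() needs an absolute path and produces a file:// IRI.
            file_iri = path.resolve().as_uri()
            statements.append(f"LOAD <{file_iri}> INTO GRAPH <{graph_iri}>")
    return statements


if __name__ == "__main__":
    # Paths and graph IRI taken from the thread; adjust to your own setup.
    for statement in build_load_statements(
        "/mydata/dbpedia2015/core", "http://dbpedia2015"
    ):
        print(statement)
```

Each generated statement has the same form as the single-file LOAD that worked in the thread, so the statements can be submitted one at a time in the same way. The file paths must be readable by the server process, not just by the client that generates them.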