Re: [Bigdata-developers] Problem while loading yago2s in Bigdata

Dear Bryan,

Just to confirm...
The RDR option in the update window is disabled when we load a large file
(i.e., specify its path). Is it OK to specify RDR only at the namespace
level when loading reified triples?

Also, can we query reified triples in Bigdata using SPARQL*, or will
plain old SPARQL with reification work?

Cheers,
Maria
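
The two styles asked about above can be put side by side. The following is a minimal sketch, not Bigdata's API: it writes one statement with metadata in the W3C reification vocabulary and in the `<< >>` notation used by RDF*/SPARQL*, together with a query in each style. The function names and the `http://example.org/source` predicate are hypothetical, and whether a given Bigdata release accepts either query form against an RDR namespace is not confirmed in this thread.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify_standard(stmt_id, s, p, o):
    """Return the four N-Triples lines of W3C reification for one triple.

    stmt_id, s, p and o are already-serialized terms, e.g. "_:st1" or
    "<http://example.org/a>" (hypothetical helper, for illustration only).
    """
    return [
        f"{stmt_id} <{RDF_NS}type> <{RDF_NS}Statement> .",
        f"{stmt_id} <{RDF_NS}subject> {s} .",
        f"{stmt_id} <{RDF_NS}predicate> {p} .",
        f"{stmt_id} <{RDF_NS}object> {o} .",
    ]


def reify_rdr(s, p, o, meta_p, meta_o):
    """RDF*-style notation: the embedded triple is the subject of the metadata."""
    return f"<< {s} {p} {o} >> {meta_p} {meta_o} ."


# Plain SPARQL over the reification vocabulary.
STANDARD_REIFICATION_QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?s ?p ?o ?source WHERE {
  ?stmt rdf:type rdf:Statement ;
        rdf:subject ?s ;
        rdf:predicate ?p ;
        rdf:object ?o ;
        <http://example.org/source> ?source .
}
"""

# SPARQL*-style query: the triple pattern in << >> stands for the statement.
SPARQL_STAR_QUERY = """\
SELECT ?s ?p ?o ?source WHERE {
  << ?s ?p ?o >> <http://example.org/source> ?source .
}
"""

if __name__ == "__main__":
    for line in reify_standard("_:st1", "<http://example.org/a>",
                               "<http://example.org/b>", "<http://example.org/c>"):
        print(line)
    print(reify_rdr("<http://example.org/a>", "<http://example.org/b>",
                    "<http://example.org/c>", "<http://example.org/source>",
                    '"yago2s"'))
```

The standard form needs four triples plus a statement node per reified statement, while the `<< >>` form refers to the triple directly.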
On Fri, Oct 17, 2014 at 4:04 AM, Bryan Thompson <br...@sy...> wrote:

> The database instance needs to be in the mode that supports RDR.  This is
> called the "statement identifiers" mode internally for historical reasons.
> Today the feature is implemented using inlining.  However, the details of
> the implementation mechanism should be invisible.  That is the whole point
> of the RDR model.
>
> The RDR mode will support efficient reified triples, inference (if
> enabled), and triples.  We will be introducing RDR support for quads but
> it is not yet in the released platform.  The workbench might label the
> option differently.  Choose to create a new namespace.  Then choose the
> mode that supports triples with statement level provenance / RDR /
> statement identifiers.  The default mode for the NSS in the WAR deployment
> is, I believe, quads.
>
> There are two new papers that might be of interest to you linked from
> http://blog.bigdata.com.  They are on RDF*/SPARQL* (a formal treatment of
> RDR) and on a reconciliation of RDF and property graphs based on the RDR
> model.  Many thanks to Olaf Hartig for his efforts on these papers.
>
> Thanks,
> Bryan
>
>
> On Thursday, October 16, 2014, Maria Jackson <mar...@gm...>
> wrote:
>
>> Dear Bryan,
>>
>> I need to load reified triples. The RDR option in the update window is
>> disabled when we specify the path. How do we tell Bigdata that we are
>> loading reified triples during an update? I am entering reified triples
>> in Bigdata according to the W3C standard (
>> http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#reification).  Also,
>> do we need to set the current namespace to RDR instead of triples?
>>
>> Another question that I have is how we can query reified triples using
>> SPARQL. Should I use the query reification vocabulary defined here
>> http://www.w3.org/2001/sw/DataAccess/rq23/#queryReification or do you
>> have your own vocabulary for querying RDR data (containing reified triples)?
>>
>>
>> Cheers,
>> Maria.
>>
>> On Thu, Oct 16, 2014 at 3:55 PM, Bryan Thompson <br...@sy...> wrote:
>>
>>> There is a distinction between the workbench (JavaScript in the browser)
>>> and the database process (Java running inside of a servlet container in
>>> this case).  In anomalous conditions the workbench might not correctly
>>> track what is happening on the database side.  I suggest that you check the
>>> database log output and see what messages were generated during that time.
>>> I suspect that you might have something like a "GC overhead limit
>>> exceeded", which is a type of out-of-memory error in Java where too
>>> much of the total time is spent in garbage collection.  Or perhaps some
>>> other root cause that abnormally terminated the update request in a manner
>>> that the workbench was unable to identify.
>>>
>>> If the update failed, then the database will not contain any triples.
>>> If you are trying to load a very large dataset it may make sense to upload
>>> the data in a series of smaller chunks.
>>>
>>> There is a "monitor" option that will show you the status of the update
>>> requests as they are being processed.  When loading large files it will
>>> echo back on the HTTP connection a summary of the number of statements
>>> loaded over time during the load.  This will provide you with better
>>> feedback.
>>>
>>> But I think that you have an error condition on the server that has
>>> halted the load.
>>>
>>> Thanks,
>>> Bryan
>>>
>>>
>>> On Wednesday, October 15, 2014, Maria Jackson <
>>> mar...@gm...> wrote:
>>>
>>>> Dear All,
>>>>
>>>> I am trying to load yago2s 18.5GB (
>>>> http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/)
>>>> in Bigdata. I downloaded bigdata from http://www.bigdata.com/download and
>>>> I am using Bigdata workbench via http://localhost:9999.
>>>>
>>>> I am loading yago2s into Bigdata's default namespace "kb", using the
>>>> update window and specifying the file path there. While Bigdata is
>>>> loading yago2s I notice that it consumes a significant amount of CPU and
>>>> RAM for 4-5 hours, but after that it stops using RAM. My problem is that
>>>> the Bigdata workbench keeps showing "Running update.." although Bigdata
>>>> does not consume any RAM or CPU for the next 48 hours or so (in fact it
>>>> keeps showing "Running update.." until I kill the process). Can you
>>>> please suggest where I am going wrong? After killing the process,
>>>> Bigdata is not able to retrieve any tuples (it shows 0 results even for
>>>> the query select ?a ?b ?c where {?a ?b ?c}).
>>>>
>>>>
>>>> Also, I am using Bigdata on a server with 16 cores and 64 GB RAM.
>>>>
>>>> Any help in this regard will be deeply appreciated.
>>>>
>>>> Cheers,
>>>> Maria
>>>>
>>>
>>>
>>> --
>>> ----
>>> Bryan Thompson
>>> Chief Scientist & Founder
>>> SYSTAP, LLC
>>> 4501 Tower Road
>>> Greensboro, NC 27410
>>> br...@sy...
>>> http://bigdata.com
>>> http://mapgraph.io
>>>
>>> CONFIDENTIALITY NOTICE:  This email and its contents and attachments
>>> are for the sole use of the intended recipient(s) and are confidential or
>>> proprietary to SYSTAP. Any unauthorized review, use, disclosure,
>>> dissemination or copying of this email or its contents or attachments is
>>> prohibited. If you have received this communication in error, please notify
>>> the sender by reply email and permanently delete all copies of the email
>>> and its contents and attachments.
>>>
>>>
>>
>
>
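
The suggestion in the thread to upload a very large dataset as a series of smaller chunks can be sketched as follows. This is an illustrative helper, not part of Bigdata: `split_lines` and its parameters are hypothetical names, and it assumes a line-oriented serialization such as N-Triples, where every line is a complete statement. A Turtle file with `@prefix` declarations or multi-line statements cannot be split this naively.

```python
from pathlib import Path


def split_lines(src, out_dir, lines_per_chunk):
    """Split a line-oriented RDF file into numbered chunk files.

    Returns the list of chunk paths in order. Assumes each line of `src`
    is a self-contained statement (e.g. N-Triples).
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    chunks = []
    out = None
    try:
        with src.open("r", encoding="utf-8") as fh:
            for i, line in enumerate(fh):
                # Start a new chunk file every `lines_per_chunk` lines.
                if i % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    name = f"{src.stem}.part{len(chunks):05d}{src.suffix}"
                    path = out_dir / name
                    chunks.append(path)
                    out = path.open("w", encoding="utf-8")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunks
```

Each resulting file can then be loaded as a separate update request, so a failure part-way through affects one chunk rather than the whole load.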
