From: Rose B. <ros...@gm...> - 2014-10-14 01:06:10
Dear Bryan,

Just to get an estimate, can you please help me understand how large a dataset needs to be before it requires better hardware? It would be great if you could help me quantify this, as I need to tell my manager how many resources I'll need for good performance (e.g., is 10GB large, or is 100GB large?). Also, I intend to store the unused databases on NAS/SAN. Is there some way by which I may keep only the databases in use on local disk and the unused ones on the SAN/NAS?

Also, I have loaded Uniprot in reified triples form, i.e.:

<A> rdf:type rdf:Statement .
<A> rdf:subject <B> .
<A> rdf:predicate <C> .
<A> rdf:object <D> .

Now I am not sure how I should query it: SPARQL queries of the form:

select ?a ?b ?c where { ?d rdf:subject ?a . ?d rdf:predicate ?b . ?d rdf:object ?c }

Thanks a lot for your help thus far.

Thanks & Regards,
Jyoti

On Tue, Oct 14, 2014 at 3:31 AM, Bryan Thompson <br...@sy...> wrote:
> We do not break out the query optimizer costs vs. the query execution costs.
> The NSS explain view provides a detailed breakdown of the query runtime
> costs, but does not show the optimizer cost component. The optimizer is
> quite fast and is an overhead mainly for low-latency queries. For complex
> or long-running queries the cost disappears into the cost of the query
> evaluation.
>
> These are large data sets. You need a machine with sufficient resources:
> SSD, 32G RAM+, 8 cores or more. If you do not have enough hardware you
> will not get a good result. Fast disk is essential for graph databases.
>
> Thanks,
> Bryan
>
>
> On Monday, October 13, 2014, Rose Beck <ros...@gm...> wrote:
>>
>> Dear Bryan,
>>
>> Thanks for the help thus far. I have a small question: I intend to
>> load different datasets, such as DBpedia and Uniprot, into Bigdata. I
>> don't have enough disk space, so while I am processing one dataset I
>> intend to keep the other dataset on disk, e.g.
>> when I am executing queries on DBpedia I intend to store the Uniprot
>> database on my hard drive, so that I don't have to load Uniprot into
>> Bigdata again and again. Is there some way I can achieve this? I am
>> using the Bigdata workbench in my browser at http://localhost:9999.
>>
>> Also, I need to report the following timings to my company:
>> 1. Query execution time excluding plan generation time.
>> 2. Query execution time plus plan generation time.
>> 3. Just query execution time, excluding dictionary lookup time.
>>
>> Can you please help me with where I can get these timings in Bigdata?
>>
>> Thanks & Regards,
>> Jyoti
>>
>>
>> On Mon, Oct 6, 2014 at 11:02 PM, Bryan Thompson <br...@sy...> wrote:
>> > Rose,
>> >
>> > The trunk is no longer used for bigdata.
>> >
>> > You can check out the 1.3.2 release from:
>> > https://svn.code.sf.net/p/bigdata/code/tags/BIGDATA_RELEASE_1_3_2
>> >
>> > The 1.3.x maintenance and development branch is:
>> > https://svn.code.sf.net/p/bigdata/code/branches/BIGDATA_RELEASE_1_3_0
>> >
>> > You can also download bigdata from our web site:
>> > http://www.bigdata.com/download.
>> >
>> > Be sure to give the Java platform sufficient resources if you are
>> > loading large files (-server, -Xmx16G or better, etc.). See
>> > wiki.bigdata.com for various information on how to obtain, build,
>> > configure, and use the bigdata platform.
>> >
>> > Thanks,
>> > Bryan
>> >
>> >
>> > ----
>> > Bryan Thompson
>> > Chief Scientist & Founder
>> > SYSTAP, LLC
>> > 4501 Tower Road
>> > Greensboro, NC 27410
>> > br...@sy...
>> > http://bigdata.com
>> > http://mapgraph.io
>> >
>> > CONFIDENTIALITY NOTICE: This email and its contents and attachments
>> > are for the sole use of the intended recipient(s) and are confidential
>> > or proprietary to SYSTAP. Any unauthorized review, use, disclosure,
>> > dissemination or copying of this email or its contents or attachments
>> > is prohibited.
>> > If you have received this communication in error, please notify
>> > the sender by reply email and permanently delete all copies of the
>> > email and its contents and attachments.
>> >
>> >
>> > On Mon, Oct 6, 2014 at 1:09 PM, Rose Beck <ros...@gm...> wrote:
>> >>
>> >> Dear Bryan,
>> >>
>> >> I have the Uniprot data in Turtle form (.ttl) and I want to load it
>> >> into Bigdata and query it using Bigdata. On the web page
>> >> http://wiki.bigdata.com/wiki/index.php/NanoSparqlServer it is
>> >> mentioned that I should look at the example NSSEmbeddedExample.java.
>> >> However, when I check out the code:
>> >>
>> >> svn checkout svn://svn.code.sf.net/p/bigdata/code/trunk bigdata-code
>> >>
>> >> I am unable to find this example.
>> >>
>> >> Can you please help me with this? I am a complete novice at Java
>> >> (although I work with C/C++ and Python extensively), so I am not
>> >> able to understand how I should load large datasets into Bigdata
>> >> and how I should query them. Perhaps a step-by-step guide for users
>> >> like me who are not familiar with Java would be of great help.
>> >>
>> >> Thanks & Regards,
>> >> Jyoti
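[Editor's note] Since Rose works in Python rather than Java, the de-reification query she asks about can also be issued against the workbench's HTTP SPARQL endpoint directly. A minimal sketch, assuming a default local install where the endpoint is http://localhost:9999/bigdata/sparql (the path may differ by release; check the workbench UI):

```python
import urllib.parse
import urllib.request

# De-reification query: recover (subject, predicate, object) bindings
# from statements that were loaded in reified form.
DEREIFY_QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?a ?b ?c
WHERE {
  ?d rdf:subject   ?a .
  ?d rdf:predicate ?b .
  ?d rdf:object    ?c .
}
"""

def run_sparql(query, endpoint="http://localhost:9999/bigdata/sparql"):
    """POST a SPARQL query to the endpoint and return the raw JSON text.

    The endpoint URL is an assumption for a default local install."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Usage (requires a running server):
#   print(run_sparql(DEREIFY_QUERY))
```

Posting the query as a form parameter with an Accept header for SPARQL JSON results follows the standard SPARQL 1.1 Protocol, so the same helper works against other endpoints as well.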
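[Editor's note] For Rose's timing item 2 (execution plus plan generation), the server-side breakdown comes from the NSS explain view Bryan mentions; a simple complement is to wall-clock the whole round trip from the client, which bounds plan generation plus execution (plus network overhead). A small, generic helper sketch:

```python
import time

def time_query(run):
    """Wall-clock a zero-argument callable that issues one query.

    Measures the full client-side round trip: plan generation plus
    execution plus any network overhead. Returns (result, seconds).
    """
    t0 = time.perf_counter()
    result = run()
    elapsed_s = time.perf_counter() - t0
    return result, elapsed_s

# Usage, with some query-issuing function of your own (hypothetical):
#   rows, secs = time_query(lambda: issue_sparql_over_http())
```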