
A few starter questions

Help
salex89
2014-08-06
2014-08-08
  • salex89

    salex89 - 2014-08-06

    Hello,

    I know there is a lot of documentation about the bigdata database, but I am struggling to catch up on some things.

    First, I've downloaded the bigdata.war file, and as I understood from the wiki, that is the easiest way to start. I've set up Tomcat, deployed the war, and it is working. However, I cannot use it with the OpenRDF Sesame API. I used this dependency:

            <dependency>
                <groupId>com.bigdata</groupId>
                <artifactId>bigdata</artifactId>
                <version>1.3.1</version>
            </dependency>
    

    When I try to call connection.setAutoCommit(false), I get this exception:

    java.lang.IllegalArgumentException: only auto-commit is currently supported
    

    When I comment it out, the line:
    ValueFactory f = con.getValueFactory();

    throws:
    java.lang.UnsupportedOperationException at com.bigdata.rdf.sail.remote.BigdataSailRemoteRepositoryConnection.getValueFactory(BigdataSailRemoteRepositoryConnection.java:1097)

    What am I doing wrong?

    Secondly, I'm not clear on the scale-out options. It looks like the wiki is talking about different things than the ones I am looking at. I completely do not understand this chapter: http://wiki.bigdata.com/wiki/index.php/NanoSparqlServer#Scale-out_.28cluster_.2F_federation.29

    Is it possible to scale out while still using OpenRDF Sesame? I see some load balancers here: http://wiki.bigdata.com/wiki/index.php/Main_Page, but I cannot put them into context.

    As far as I can see, the cluster implementation needs to be downloaded as source code, then compiled? http://wiki.bigdata.com/wiki/index.php/SingleMachineCluster

    Excuse me if my questions sound a bit chaotic, but I'm trying to wrap my head around the terminology and organization of the database.

    P.S the code can be viewed here: http://pastebin.com/ziAzM9gV

     

    Last edit: salex89 2014-08-06
    • Mike Personick

      Mike Personick - 2014-08-07

      Which version of Sesame are you using? Bigdata 1.3.1 is still on Sesame
      2.6. We are close to releasing Bigdata 1.3.2, which will support Sesame
      2.7.


      Mike Personick
      Managing Partner
      Systap, LLC
      www.systap.com
      801-243-3678
      skype: mike.personick

      • salex89

        salex89 - 2014-08-07

        Hi,

        I'm using 2.6.10. Here are the complete dependencies.

        <dependencies>
            <dependency>
                <groupId>com.bigdata</groupId>
                <artifactId>bigdata</artifactId>
                <version>1.3.1</version>
                <type>jar</type>
            </dependency>
            <dependency>
                <groupId>org.openrdf.sesame</groupId>
                <artifactId>sesame-runtime</artifactId>
                <version>2.6.10</version>
            </dependency>
        </dependencies>
        
         
        • Mike Personick

          Mike Personick - 2014-08-07

          Ok I think I see the problem. The BigdataSailRemoteRepositoryConnection is
          a client-side wrapper around our RemoteRepository class. All write
          operations against this class are immediately flushed to the server, hence
          there is no way to turn "auto-commit" off. This class is not suitable for
          bulk load operations. If you are trying to bulk load data it's always
          better to do it directly against the server. Otherwise you will be posting
          large files over HTTP.

          There is also no concept of a client-side ValueFactory. You can just use a
          simple ValueFactoryImpl (the default Sesame implementation).

          You can also use an embedded bigdata instance if you'd rather just run the
          database in the same JVM instead of client/server over HTTP. To get
          started with an embedded instance I'd recommend checking out the class
          com.bigdata.rdf.sail.BigdataSailFactory.

          For high-throughput data loading you should use the data loader:

          com.bigdata.rdf.store.DataLoader
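
          Putting that together, a minimal client-side sketch might look like the following. This is an illustrative assumption rather than code from the thread: the endpoint URL, the example triple, and the BigdataSailRemoteRepository constructor are guesses against the 1.3.x API, so check them against your jars.

              import org.openrdf.model.URI;
              import org.openrdf.model.ValueFactory;
              import org.openrdf.model.impl.ValueFactoryImpl;
              import org.openrdf.repository.RepositoryConnection;

              import com.bigdata.rdf.sail.remote.BigdataSailRemoteRepository;

              public class RemoteSketch {
                  public static void main(String[] args) throws Exception {
                      // Use the plain Sesame ValueFactory instead of
                      // con.getValueFactory(), which is unsupported remotely.
                      ValueFactory vf = ValueFactoryImpl.getInstance();
                      URI s = vf.createURI("http://example.org/s"); // example data
                      URI p = vf.createURI("http://example.org/p");
                      URI o = vf.createURI("http://example.org/o");

                      // Endpoint URL is an assumption; adjust to your deployment.
                      BigdataSailRemoteRepository repo = new BigdataSailRemoteRepository(
                              "http://localhost:8080/bigdata/sparql");
                      RepositoryConnection con = repo.getConnection();
                      try {
                          con.add(s, p, o); // flushed to the server immediately
                      } finally {
                          con.close();
                          repo.shutDown();
                      }
                  }
              }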


          • Bryan Thompson

            Bryan Thompson - 2014-08-07

            Actually, you can use either the SPARQL "LOAD" operator or the INSERT by
            URLs REST API method [1] for efficient bulk loading against the database
            server. The only difference between these options and the DataLoader is
            that the latter defaults to a larger statement buffer and hence does
            incremental commits in larger batches. You can override that behavior
            through BigdataSail.Options:

                /**
                 * The capacity of the statement buffer used to absorb writes.
                 * If this capacity is exceeded, then an incremental flush will
                 * push assertions and/or retractions to the statement indices.
                 *
                 * @see #DEFAULT_BUFFER_CAPACITY
                 */
                public static final String BUFFER_CAPACITY = BigdataSail.class
                        .getPackage().getName() + ".bufferCapacity";

                public static final String DEFAULT_BUFFER_CAPACITY = "10000";

            The corresponding default for the DataLoader is set by:

                /**
                 * Optional property specifying the capacity of the
                 * {@link StatementBuffer} (default is
                 * {@value #DEFAULT_BUFFER_CAPACITY} statements).
                 */
                String BUFFER_CAPACITY = DataLoader.class.getName()
                        + ".bufferCapacity";

                String DEFAULT_BUFFER_CAPACITY = "100000";
            

            Thanks,
            Bryan

            [1]
            http://wiki.bigdata.com/wiki/index.php/NanoSparqlServer#INSERT_RDF_.28POST_with_URLs.29
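
            For convenience, the fully-qualified keys implied by those two constants work out to the values below. A properties fragment overriding both defaults might look like this (the capacities shown are illustrative, not recommendations):

                com.bigdata.rdf.sail.bufferCapacity=100000
                com.bigdata.rdf.store.DataLoader.bufferCapacity=1000000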


            Bryan Thompson
            Chief Scientist & Founder
            SYSTAP, LLC
            4501 Tower Road
            Greensboro, NC 27410
            bryan@systap.com
            http://bigdata.com
            http://mapgraph.io

            CONFIDENTIALITY NOTICE: This email and its contents and attachments are
            for the sole use of the intended recipient(s) and are confidential or
            proprietary to SYSTAP. Any unauthorized review, use, disclosure,
            dissemination or copying of this email or its contents or attachments is
            prohibited. If you have received this communication in error, please notify
            the sender by reply email and permanently delete all copies of the email
            and its contents and attachments.

    • Bryan Thompson

      Bryan Thompson - 2014-08-07

      Bigdata has several modes:

      • Embedded (Journal + BigdataSail)
      • Server (NanoSparqlServer with Journal as the backend)
      • Highly Available Server (NanoSparqlServer with HAJournal as the backend
        providing a quorum based replication cluster with automatic failover,
        self-healing, online backups, etc.)
      • Bigdata Federation (horizontal scaling with dynamic sharding).
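
      As a sketch of the embedded mode, following Mike's pointer to BigdataSailFactory: the example below constructs the Sail directly instead, since the exact factory methods vary by version; the journal file path is made up and the FILE option name is an assumption from the 1.3.x API, so verify both against your jars.

          import java.util.Properties;

          import com.bigdata.rdf.sail.BigdataSail;
          import com.bigdata.rdf.sail.BigdataSailRepository;

          public class EmbeddedSketch {
              public static void main(String[] args) throws Exception {
                  // Journal file location (assumption; any writable path works).
                  Properties props = new Properties();
                  props.setProperty(BigdataSail.Options.FILE, "/tmp/bigdata.jnl");

                  BigdataSail sail = new BigdataSail(props);
                  BigdataSailRepository repo = new BigdataSailRepository(sail);
                  repo.initialize();
                  try {
                      // Obtain connections via repo.getConnection() as with any
                      // Sesame Repository; everything runs in this JVM.
                  } finally {
                      repo.shutDown();
                  }
              }
          }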

      Thanks,
      Bryan


  • salex89

    salex89 - 2014-08-08

    Thank you all for this, it is working.

    However, I have some more questions related to our use case. Our company deals a lot with time-series and sensor-related data. In the past we have used other RDF/semantic databases (it would be better not to say which) and we are not too satisfied with them. It's often an issue of stability, scaling, or a plain lack of features.

    As you may guess, we have a "big data" use case, and although we are not storing everything in a triple store, part of the load falls on it. So an embedded or single-server solution is not really future-proof for us.

    What kind of deployment would you propose? Is this database applicable in such a scenario, given that we would sometimes need to load larger amounts of data programmatically (not from files)? We could continue this discussion via email if you would prefer.

    Best regards

     
    • Bryan Thompson

      Bryan Thompson - 2014-08-08

      Let's take the discussion to an email thread. I am bryan@systap.com. I can add the others to the thread.

      Thanks,
      Bryan

