
A few starter questions

Help
salex89
2014-08-06
2014-08-08
  • salex89

    salex89 - 2014-08-06

    Hello,

    I know there is a lot of documentation about the bigdata database, but I am struggling to catch up on some things.

    First, I've downloaded the bigdata.war file, and as I understood from the wiki, that is the easiest way to start. I've set up Tomcat, deployed the war, and it is working. However, I cannot use it with the OpenRDF Sesame API. I used this dependency:

            <dependency>
                <groupId>com.bigdata</groupId>
                <artifactId>bigdata</artifactId>
                <version>1.3.1</version>
            </dependency>
    

    When I try to call connection.setAutoCommit(false), I get this exception:

    java.lang.IllegalArgumentException: only auto-commit is currently supported
    

    When I comment it out, the line:
    ValueFactory f = con.getValueFactory();

    throws:
    java.lang.UnsupportedOperationException at com.bigdata.rdf.sail.remote.BigdataSailRemoteRepositoryConnection.getValueFactory(BigdataSailRemoteRepositoryConnection.java:1097)

    What am I doing wrong?

    Secondly, I'm not clear on the scale-out options. It looks like the wiki is talking about different things than the ones I am looking at. I completely do not understand this chapter: http://wiki.bigdata.com/wiki/index.php/NanoSparqlServer#Scale-out_.28cluster_.2F_federation.29

    Is it possible to scale out while still using OpenRDF Sesame? I see some load balancers here: http://wiki.bigdata.com/wiki/index.php/Main_Page, but I cannot put them into context.

    As far as I can see, the cluster implementation needs to be downloaded as source code, then compiled? http://wiki.bigdata.com/wiki/index.php/SingleMachineCluster

    Excuse me if my questions sound a bit chaotic, but I'm trying to wrap my head around the terminology and organization of the database.

    P.S the code can be viewed here: http://pastebin.com/ziAzM9gV

     

    Last edit: salex89 2014-08-06
    • Mike Personick

      Mike Personick - 2014-08-07

      Which version of Sesame are you using? Bigdata 1.3.1 is still on Sesame
      2.6. We are close to releasing Bigdata 1.3.2, which will support Sesame
      2.7.


      Mike Personick
      Managing Partner
      Systap, LLC
      www.systap.com
      801-243-3678
      skype: mike.personick

      • salex89

        salex89 - 2014-08-07

        Hi,

        I'm using 2.6.10. Here are the complete dependencies.

        <dependencies>
            <dependency>
                <groupId>com.bigdata</groupId>
                <artifactId>bigdata</artifactId>
                <version>1.3.1</version>
                <type>jar</type>
            </dependency>
            <dependency>
                <groupId>org.openrdf.sesame</groupId>
                <artifactId>sesame-runtime</artifactId>
                <version>2.6.10</version>
            </dependency>
        </dependencies>
        
         
        • Mike Personick

          Mike Personick - 2014-08-07

          Ok I think I see the problem. The BigdataSailRemoteRepositoryConnection is
          a client-side wrapper around our RemoteRepository class. All write
          operations against this class are immediately flushed to the server, hence
          there is no way to turn "auto-commit" off. This class is not suitable for
          bulk load operations. If you are trying to bulk load data it's always
          better to do it directly against the server. Otherwise you will be posting
          large files over HTTP.

          There is also no concept of a client-side ValueFactory. You can just use a
          simple ValueFactoryImpl (the default Sesame implementation).

          You can also use an embedded bigdata instance if you'd rather just run the
          database in the same JVM instead of client/server over HTTP. To get
          started with an embedded instance I'd recommend checking out the class
          com.bigdata.rdf.sail.BigdataSailFactory.

          For high-throughput data loading you should use the data loader:

          com.bigdata.rdf.store.DataLoader
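
          Putting that together, a minimal client-side sketch might look like the following. This is an illustrative assumption rather than code from the thread: the endpoint URL, the example triple, and the BigdataSailRemoteRepository constructor are guesses against the 1.3.x API, so check them against your jars.

              import org.openrdf.model.URI;
              import org.openrdf.model.ValueFactory;
              import org.openrdf.model.impl.ValueFactoryImpl;
              import org.openrdf.repository.RepositoryConnection;

              import com.bigdata.rdf.sail.remote.BigdataSailRemoteRepository;

              public class RemoteSketch {
                  public static void main(String[] args) throws Exception {
                      // Use the plain Sesame ValueFactory instead of
                      // con.getValueFactory(), which is unsupported remotely.
                      ValueFactory vf = ValueFactoryImpl.getInstance();
                      URI s = vf.createURI("http://example.org/s"); // example data
                      URI p = vf.createURI("http://example.org/p");
                      URI o = vf.createURI("http://example.org/o");

                      // Endpoint URL is an assumption; adjust to your deployment.
                      BigdataSailRemoteRepository repo = new BigdataSailRemoteRepository(
                              "http://localhost:8080/bigdata/sparql");
                      RepositoryConnection con = repo.getConnection();
                      try {
                          con.add(s, p, o); // flushed to the server immediately
                      } finally {
                          con.close();
                          repo.shutDown();
                      }
                  }
              }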


          • Bryan Thompson

            Bryan Thompson - 2014-08-07

            Actually, you can use either the SPARQL "LOAD" operator or the INSERT by
            URLs REST API method [1] for efficient bulk loading against the database
            server. The only difference between these options and the DataLoader is
            that the latter defaults to a larger statement buffer and hence does
            incremental commits in larger batches. You can override that behavior
            through BigdataSail.Options:

                /**
                 * The capacity of the statement buffer used to absorb writes.
                 * If this capacity is exceeded, then an incremental flush will
                 * push assertions and/or retractions to the statement indices.
                 *
                 * @see #DEFAULT_BUFFER_CAPACITY
                 */
                public static final String BUFFER_CAPACITY = BigdataSail.class
                        .getPackage().getName() + ".bufferCapacity";

                public static final String DEFAULT_BUFFER_CAPACITY = "10000";

            The corresponding default for the DataLoader is set by:

                /**
                 * Optional property specifying the capacity of the
                 * {@link StatementBuffer} (default is
                 * {@value #DEFAULT_BUFFER_CAPACITY} statements).
                 */
                String BUFFER_CAPACITY = DataLoader.class.getName()
                        + ".bufferCapacity";

                String DEFAULT_BUFFER_CAPACITY = "100000";
            

            Thanks,
            Bryan

            [1]
            http://wiki.bigdata.com/wiki/index.php/NanoSparqlServer#INSERT_RDF_.28POST_with_URLs.29
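
            For convenience, the fully-qualified keys implied by those two constants work out to the values below. A properties fragment overriding both defaults might look like this (the capacities shown are illustrative, not recommendations):

                com.bigdata.rdf.sail.bufferCapacity=100000
                com.bigdata.rdf.store.DataLoader.bufferCapacity=1000000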


            Bryan Thompson
            Chief Scientist & Founder
            SYSTAP, LLC
            4501 Tower Road
            Greensboro, NC 27410
            bryan@systap.com
            http://bigdata.com
            http://mapgraph.io

            CONFIDENTIALITY NOTICE: This email and its contents and attachments are
            for the sole use of the intended recipient(s) and are confidential or
            proprietary to SYSTAP. Any unauthorized review, use, disclosure,
            dissemination or copying of this email or its contents or attachments is
            prohibited. If you have received this communication in error, please notify
            the sender by reply email and permanently delete all copies of the email
            and its contents and attachments.

    • Bryan Thompson

      Bryan Thompson - 2014-08-07

      Bigdata has several modes:

      • Embedded (Journal + BigdataSail)
      • Server (NanoSparqlServer with Journal as the backend)
      • Highly Available Server (NanoSparqlServer with HAJournal as the backend
        providing a quorum based replication cluster with automatic failover,
        self-healing, online backups, etc.)
      • Bigdata Federation (horizontal scaling with dynamic sharding).
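
      As a sketch of the embedded mode, following Mike's pointer to BigdataSailFactory: the example below constructs the Sail directly instead, since the exact factory methods vary by version; the journal file path is made up and the FILE option name is an assumption from the 1.3.x API, so verify both against your jars.

          import java.util.Properties;

          import com.bigdata.rdf.sail.BigdataSail;
          import com.bigdata.rdf.sail.BigdataSailRepository;

          public class EmbeddedSketch {
              public static void main(String[] args) throws Exception {
                  // Journal file location (assumption; any writable path works).
                  Properties props = new Properties();
                  props.setProperty(BigdataSail.Options.FILE, "/tmp/bigdata.jnl");

                  BigdataSail sail = new BigdataSail(props);
                  BigdataSailRepository repo = new BigdataSailRepository(sail);
                  repo.initialize();
                  try {
                      // Obtain connections via repo.getConnection() as with any
                      // Sesame Repository; everything runs in this JVM.
                  } finally {
                      repo.shutDown();
                  }
              }
          }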

      Thanks,
      Bryan


  • salex89

    salex89 - 2014-08-08

    Thank you all for this, it is working.

    However, I have some more questions related to our use case. Our company deals a lot with time-series and sensor-related data. In the past we have used other RDF/semantic databases (it would be better not to say which) and we are not too satisfied with them. It's often an issue of stability, scaling, or a plain lack of features.

    As you may guess, we have a "big data" use case, and although we are not storing everything in a triple store, part of the load falls on it. So an embedded or single-server solution is not really future-proof for us.

    What kind of deployment would you propose? Is this database applicable in such a scenario, given that we would sometimes need to load larger amounts of data programmatically (not from files)? We could continue this discussion via email if you would prefer.

    Best regards

     
    • Bryan Thompson

      Bryan Thompson - 2014-08-08

      Let's take the discussion to an email thread. I am bryan@systap.com. I can add the others to the thread.

      Thanks,
      Bryan

