Hello,
I know there is a lot of documentation about the bigdata database, but I am struggling to catch up on some things.
First, I've downloaded the bigdata.war file; as I understood from the wiki, that is the easiest way to start. I've set up Tomcat, deployed the war, and it is working. However, I cannot use it with the OpenRDF Sesame API. I used this dependency:
When I try to call connection.setAutoCommit(false), I get this exception:

java.lang.IllegalArgumentException: only auto-commit is currently supported

When I comment it out, the line:

ValueFactory f = con.getValueFactory();

throws:

java.lang.UnsupportedOperationException
    at com.bigdata.rdf.sail.remote.BigdataSailRemoteRepositoryConnection.getValueFactory(BigdataSailRemoteRepositoryConnection.java:1097)
What am I doing wrong?
Secondly, I'm not clear with the scaling out options. It looks like the wiki is talking about different things than the ones I am looking at. I completely do not understand this chapter: http://wiki.bigdata.com/wiki/index.php/NanoSparqlServer#Scale-out_.28cluster_.2F_federation.29
Is it possible to scale out while still using OpenRDF Sesame? I see some load balancers here: http://wiki.bigdata.com/wiki/index.php/Main_Page, but I cannot put them into context.
As far as I see it, the cluster implementation needs to be downloaded in code, than compiled? http://wiki.bigdata.com/wiki/index.php/SingleMachineCluster
Excuse me if my questions sounds a bit chaotic, but I'm trying to wrap my head around the terminology and organization of the database.
P.S the code can be viewed here: http://pastebin.com/ziAzM9gV
Last edit: salex89 2014-08-06
Which version of Sesame are you using? Bigdata 1.3.1 is still on Sesame
2.6. We are close to releasing Bigdata 1.3.2, which will support Sesame
2.7.
Mike Personick
Managing Partner
Systap, LLC
www.systap.com
801-243-3678
skype: mike.personick
On Wed, Aug 6, 2014 at 5:38 PM, salex89 salex89@users.sf.net wrote:
Hi,
I'm using 2.6.10. Here are the complete dependencies.
Ok I think I see the problem. The BigdataSailRemoteRepositoryConnection is
a client-side wrapper around our RemoteRepository class. All write
operations against this class are immediately flushed to the server, hence
there is no way to turn "auto-commit" off. This class is not suitable for
bulk load operations. If you are trying to bulk load data it's always
better to do it directly against the server. Otherwise you will be posting
large files over HTTP.
There is also no concept of a client-side ValueFactory. You can just use a
simple ValueFactoryImpl (the default Sesame implementation).
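For illustration, a minimal client-side sketch along these lines (the class and value names here are illustrative, not bigdata API; assumes Sesame 2.x on the classpath):

```java
import org.openrdf.model.Literal;
import org.openrdf.model.Statement;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.impl.ValueFactoryImpl;

public class ClientSideValues {
    public static void main(String[] args) {
        // A plain Sesame ValueFactoryImpl works fine on the client side;
        // nothing here requires a round-trip to the bigdata server.
        ValueFactory vf = ValueFactoryImpl.getInstance();
        URI s = vf.createURI("http://example.org/sensor/1");
        URI p = vf.createURI("http://example.org/hasReading");
        Literal o = vf.createLiteral(42.0);
        Statement st = vf.createStatement(s, p, o);
        // The statement can then be handed to any RepositoryConnection,
        // e.g. con.add(st) against a BigdataSailRemoteRepositoryConnection.
        System.out.println(st);
    }
}
```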
You can also use an embedded bigdata instance if you'd rather just run the
database in the same JVM instead of client/server over HTTP. To get
started with an embedded instance I'd recommend checking out the class
com.bigdata.rdf.sail.BigdataSailFactory.
For high-throughput data loading you should use the data loader:
com.bigdata.rdf.store.DataLoader
Actually, you can use either the SPARQL "LOAD" operator or the INSERT by
URLs REST API method [1] for efficient bulk loading against the database
server. The only difference between these options and the DataLoader is
that the latter defaults to a larger statement buffer and hence does
incremental commits in larger batches. You can override that behavior
through BigdataSail.Options:
/**
 * The capacity of the statement buffer used to absorb writes.
 * If this capacity is exceeded, then an incremental flush will
 * push assertions and/or retractions to the statement indices.
 *
 * @see #DEFAULT_BUFFER_CAPACITY
 */
public static final String BUFFER_CAPACITY = BigdataSail.class
        .getPackage().getName() + ".bufferCapacity";

The corresponding default for the DataLoader is set by
DEFAULT_BUFFER_CAPACITY.
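In practice that option goes into the journal's properties file. A sketch, with the caveat that the fully-qualified property name below is inferred from the BigdataSail package name and the capacity value is only an example:

```properties
# Assumed: BigdataSail.Options.BUFFER_CAPACITY resolves to the
# BigdataSail package name plus ".bufferCapacity".
com.bigdata.rdf.sail.bufferCapacity=1000000
```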
Thanks,
Bryan
[1]
http://wiki.bigdata.com/wiki/index.php/NanoSparqlServer#INSERT_RDF_.28POST_with_URLs.29
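The two server-side options above can be driven with a plain HTTP client. A small sketch; the endpoint URL and the helper names are illustrative assumptions for a default bigdata.war deployment under Tomcat, not bigdata API:

```java
import java.net.URLEncoder;

public class BulkLoadRequests {
    // Assumed endpoint for a default bigdata.war deployment under Tomcat.
    static final String SERVICE = "http://localhost:8080/bigdata/sparql";

    // SPARQL 1.1 Update that asks the server itself to fetch and load a file.
    static String loadUpdate(String fileUrl) {
        return "LOAD <" + fileUrl + ">";
    }

    // Request URL for the INSERT-by-URLs REST method: a POST with ?uri=.
    static String insertByUrl(String fileUrl) throws Exception {
        return SERVICE + "?uri=" + URLEncoder.encode(fileUrl, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Either request would then be sent with POST; the data file itself
        // never travels through the client, the server fetches it directly.
        System.out.println(loadUpdate("http://example.org/data.nt"));
        System.out.println(insertByUrl("http://example.org/data.nt"));
    }
}
```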
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
bryan@systap.com
http://bigdata.com
http://mapgraph.io
CONFIDENTIALITY NOTICE: This email and its contents and attachments are
for the sole use of the intended recipient(s) and are confidential or
proprietary to SYSTAP. Any unauthorized review, use, disclosure,
dissemination or copying of this email or its contents or attachments is
prohibited. If you have received this communication in error, please notify
the sender by reply email and permanently delete all copies of the email
and its contents and attachments.
Bigdata has several modes:

- Server (NanoSparqlServer with Journal as the backend)
- Highly Available Server (NanoSparqlServer with HAJournal as the backend, providing a quorum based replication cluster with automatic failover, self-healing, online backups, etc.)
- Bigdata Federation (horizontal scaling with dynamic sharding)
Thanks,
Bryan
On Wed, Aug 6, 2014 at 10:38 AM, salex89 salex89@users.sf.net wrote:
Thank you all for this, it is working.
However, I have some more questions related to our use case. Our company deals a lot with time series and sensor-related data. In the past we have used other RDF/semantic databases (it would be better not to say which) and we were not too satisfied with them. It is often an issue of stability, scaling, or a plain lack of features.
As you may guess, we have a "big data" use case, and although we are not storing everything in a triple store, part of the load falls on it. So an embedded or single-server solution is not really future-proof for us.
What kind of deployment would you propose? Is this database applicable in such a scenario, given that we would sometimes need to load a larger amount of data programmatically (not from files or similar)? We could continue this discussion via email if you would prefer.
Best regards
Let's take the discussion to an email thread. I am bryan@systap.com. I can add the others to the thread.
Thanks,
Bryan