Sesame HTTP server and location of bigdata.jn

Help
winnie
2010-03-25
2014-02-19
  • winnie

    winnie - 2010-03-25

    I am new to bigdata.  I tried out some examples and have a few questions.

    I followed the instructions on "Using Bigdata with the OpenRDF Sesame HTTP Server" on the wiki page to set up bigdata.
    ( https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Using_Bigdata_with_the_OpenRDF_Sesame_HTTP_Server )
    I noticed that bigdata.jnl is created in the current directory rather than in ADUNA_DATA_DIR.  Is there a way to place it in ADUNA_DATA_DIR instead?  It seems like bigdata.jnl belongs there.

    I also created two bigdata repositories.  Is it true that they share the same bigdata.jnl file?  It seems like everything for the two repositories is contained in this one file.  Is this correct?

    By the way, I also followed the example in the "Getting Started" wiki page to create a repository.
    The example code starts with:

    File journal = File.createTempFile("bigdata", ".jnl");
    Sail s = new BigdataSail(properties);

    and eventually gets a Repository object.

    Just to understand a bit more, I used repo.getDatabase() to get a TripleStore,
    and then used the TripleStore to get its IndexManager.
    However, I don't know the name of the index, so I was not able to get it.
    Is there a way to get an index from the IndexManager following these steps?
    There is an example in the BTreeGuide that starts with a Journal and creates a BTree.
    I still find the relationships between the index manager, the index, BTree, journals and other objects a bit unclear.
    I am also not clear on when I need to manage concurrency.  It appears to me that we have to
    manage concurrency in both of the examples provided by the wiki.  Do we have to manage
    concurrency when we go through the HTTP interface?  I am also under the impression that, for the
    examples mentioned above, I cannot delete triples.  Is this correct?
    Other than the whitepaper and the javadoc, are there other places I should be looking?

    Thanks.

     
  • Mike Personick

    Mike Personick - 2010-03-25

    Ok, wow, a lot of questions here.  I need to do a better job of explaining how to configure bigdata when using the Sesame HTTP Server, apologies for that.

    First of all, step 7 needs to be made much clearer.  In that step, you create a new bigdata repository instance and specify the location of a properties file.  What I did not illustrate is that the properties file should contain the location of the journal that will back the remote database.  Here is an example of a properties file that I am using for a remote RDF-only, transactional database.  This file is located at c:/temp/bigdata.properties, which is what I specify in step 7 when the Sesame console application asks me for a properties file.

    # bigdata.properties file

    # The name of the backing file.
    com.bigdata.journal.AbstractJournal.file=c:/temp/bigdata.jnl

    # turn on read/write transactions
    com.bigdata.rdf.sail.isolatableIndices=true

    # turn off automatic inference in the SAIL
    com.bigdata.rdf.sail.truthMaintenance=false

    # don't store justification chains, meaning retraction requires full manual
    # re-closure of the database
    com.bigdata.rdf.store.AbstractTripleStore.justify=false

    # turn off the statement identifiers feature for provenance
    com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false

    # turn off the free text index
    com.bigdata.rdf.store.AbstractTripleStore.textIndex=false

    # changing the axiom model to none essentially disables all inference
    com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms

    # also disable any vocabulary
    com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=com.bigdata.rdf.vocab.NoVocabulary

    # triples only
    com.bigdata.rdf.store.AbstractTripleStore.quads=false

    # set the initial and maximum extent of the journal
    com.bigdata.journal.AbstractJournal.initialExtent=209715200
    com.bigdata.journal.AbstractJournal.maximumExtent=209715200

    You can see that the very first property in the properties file is the location of the journal.  For each remote database you want, you should specify a different journal.  Each database will have its own set of relations and indices, backed by a separate journal file.  Once you have your remote repository running, you can test it out using some sample code I've checked in under bigdata-sails/src/samples/com/bigdata/samples/remoting/DemoSesameServer.java.  Here it is for your convenience:

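    import org.openrdf.model.Graph;
    import org.openrdf.model.Statement;
    import org.openrdf.model.URI;
    import org.openrdf.model.impl.GraphImpl;
    import org.openrdf.model.impl.URIImpl;
    import org.openrdf.query.GraphQuery;
    import org.openrdf.query.GraphQueryResult;
    import org.openrdf.query.QueryLanguage;
    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.http.HTTPRepository;

    // bigdata classes; these package names are assumed, check your checkout
    import com.bigdata.rdf.store.BD;
    import com.bigdata.samples.SparqlBuilder;

    public class DemoSesameServer {

        private static final String sesameURL = "http://localhost:8080/openrdf-sesame";

        private static final String repoID = "bigdata";

        /**
         * @param args
         */
        public static void main(String[] args) {

            try {
                _main(args);
            } catch (Throwable t) {
                t.printStackTrace();
            }

        }

        public static void _main(String[] args) throws Exception {

            Repository repo = new HTTPRepository(sesameURL, repoID);
            repo.initialize();

            RepositoryConnection cxn = repo.getConnection();
            cxn.setAutoCommit(false);

            try { // load some statements built up programmatically

                URI mike = new URIImpl(BD.NAMESPACE + "Mike");
                URI bryan = new URIImpl(BD.NAMESPACE + "Bryan");
                URI loves = new URIImpl(BD.NAMESPACE + "loves");
                URI rdf = new URIImpl(BD.NAMESPACE + "RDF");
                Graph graph = new GraphImpl();
                graph.add(mike, loves, rdf);
                graph.add(bryan, loves, rdf);

                cxn.add(graph);
                cxn.commit();

                { // show the entire contents of the repository
                  // (must run before the connection is closed)

                    SparqlBuilder sparql = new SparqlBuilder();
                    sparql.addTriplePattern("?s", "?p", "?o");

                    GraphQuery query = cxn.prepareGraphQuery(
                            QueryLanguage.SPARQL, sparql.toString());
                    GraphQueryResult result = query.evaluate();
                    try {
                        while (result.hasNext()) {
                            Statement stmt = result.next();
                            System.err.println(stmt);
                        }
                    } finally {
                        result.close();
                    }

                }

            } finally {

                // always release the connection, even if the load or query fails
                cxn.close();

            }

        }

    }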
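
The "one journal file per remote database" advice above can be sketched with plain java.util.Properties. This is only an illustration: the class name JournalProperties, the helper forRepository, and the "<dataDir>/<repoId>.jnl" layout are hypothetical. Only the two property keys come from the sample properties file.

```java
import java.util.Properties;

/** Hypothetical helper (not part of bigdata): one properties set per repository. */
final class JournalProperties {

    // Property keys copied from the sample bigdata.properties file.
    static final String FILE_KEY = "com.bigdata.journal.AbstractJournal.file";
    static final String TX_KEY = "com.bigdata.rdf.sail.isolatableIndices";

    /**
     * Returns properties whose journal path is unique to the given repository ID.
     * The "<dataDir>/<repoId>.jnl" naming scheme is an assumption for this sketch.
     */
    static Properties forRepository(String dataDir, String repoId) {
        Properties p = new Properties();
        p.setProperty(FILE_KEY, dataDir + "/" + repoId + ".jnl");
        p.setProperty(TX_KEY, "true"); // turn on read/write transactions
        return p;
    }

    public static void main(String[] args) {
        Properties a = forRepository("c:/temp", "repoA");
        Properties b = forRepository("c:/temp", "repoB");
        // Each repository ends up with its own backing journal file.
        System.out.println(a.getProperty(FILE_KEY));
        System.out.println(b.getProperty(FILE_KEY));
    }
}
```

Each set of properties would then be saved to its own file, and that file's location given to the Sesame console in step 7 when creating the corresponding repository.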

    On to the next part of your post, what are you trying to accomplish via low-level database operations?  Why do you need the indices themselves?  Happy to help you understand things at that level, but the SAIL and even one level down, the AbstractTripleStore, are there so that you don't have to poke around inside the relations and indices.  All your basic read, write, and query operations are available via the Sesame interfaces.

    As for managing concurrency, I've just checked in read/write transactions at the SAIL level.  This is the most appropriate mode for use with the Sesame HTTP Server.  The properties file above will activate transactions for you; however, transactions do not work yet with inference (it is coming).  Alternatively, you can turn off transactions and the Sesame HTTP Server will use the database in unisolated mode.  However, in unisolated mode there can only be one connection at a time.  I suspect the Sesame HTTP Server holds its connections open, meaning concurrency in unisolated mode will be severely limited (i.e. only one connection).  If you need inference on a remote database and high concurrency as well, the Sesame HTTP Server is probably not the right solution.  There is also a sample custom REST service checked in that can be used as a skeleton to build your own remoting interface in that case.
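
Because transactions do not yet work with inference, it may help to sanity-check a properties file before handing it to the server. The following is a minimal sketch, assuming that "inference" here corresponds to the truthMaintenance property, as the comments in the sample file suggest. The class ConfigCheck and its methods are hypothetical and call nothing in bigdata itself. Keys that are absent are treated as not enabled, since the defaults are not stated in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical checker (not part of bigdata). */
final class ConfigCheck {

    // Property keys copied from the sample bigdata.properties file.
    static final String TX_KEY = "com.bigdata.rdf.sail.isolatableIndices";
    static final String TM_KEY = "com.bigdata.rdf.sail.truthMaintenance";

    /** True only if transactions and truth maintenance are both explicitly set to true. */
    static boolean transactionsWithInference(Properties p) {
        // parseBoolean(null) is false, so a missing key counts as "not enabled".
        return Boolean.parseBoolean(p.getProperty(TX_KEY))
                && Boolean.parseBoolean(p.getProperty(TM_KEY));
    }

    /** Parses properties-file text held in a string. */
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties p = parse(TX_KEY + "=true\n" + TM_KEY + "=false\n");
        System.out.println("conflict: " + transactionsWithInference(p));
    }
}
```

With the sample file from this thread (transactions on, truth maintenance off) the check reports no conflict.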

    You absolutely can delete triples.  This operation is supported at every level of the database.  Again, I suggest working at the Sesame level until you are comfortable with it and/or find deficiencies in its API that cause you to poke through to a lower level.

    Have you worked through the Sesame Users Guide yet?

    http://www.openrdf.org/doc/sesame2/users/

     
  • winnie

    winnie - 2010-03-25

    Thank you for your response and clear explanation.

    I have no need to use the low level index. I was just curious, since the BTreeGuide showed us a way of getting it.

    Yes, I have looked at the Sesame Users Guide.  The examples provided all worked; it is just that I am on the outside using bigdata, so it always feels more comfortable when I can understand a bit more about what I am doing.  I think that the Sesame interface is sufficient for me (for now).

    Thanks again,

     
