Problem setting up cluster

Anonymous
2012-05-11
2014-02-19
  • Anonymous - 2012-05-11

    Hi,

    I'm currently trying to set up a federated/cluster bigdata instance (after successfully running a standalone instance) and am running into issues.

    I checked out the 1_2_0 tag from SVN and ran ant install, which all went fine, but when I then set the status to start, it just doesn't work.

    I then tried starting the bigdata service manually on each node and after a little while got the following:

    FATAL: 180620    main com.bigdata.service.jini.AbstractServer.fatal(AbstractServer.java:419): Could not start service: com.bigdata.jini.start.ServicesManagerServer{serviceName=com.bigdata.jini.start.ServicesManagerServer@prism-lab-04.kc.caplib#978777649, hostname=prism-lab-04.kc.caplib, serviceUUID=553cd636-673f-4d7f-836e-e623d04939ab}
    java.lang.RuntimeException: java.lang.Exception: Zookeeper not connected: startup sequence aborted.
            at com.bigdata.jini.start.AbstractServicesManagerService.start(AbstractServicesManagerService.java:244)
            at com.bigdata.jini.start.ServicesManagerServer$AdministrableServicesManagerService.start(ServicesManagerServer.java:564)
            at com.bigdata.jini.start.ServicesManagerServer$AdministrableServicesManagerService.start(ServicesManagerServer.java:404)
            at com.bigdata.service.jini.AbstractServer.<init>(AbstractServer.java:841)
            at com.bigdata.jini.start.ServicesManagerServer.<init>(ServicesManagerServer.java:354)
            at com.bigdata.jini.start.ServicesManagerServer.main(ServicesManagerServer.java:372)
    Caused by: java.lang.Exception: Zookeeper not connected: startup sequence aborted.
            at com.bigdata.jini.start.ServicesManagerStartupTask.doStartup(ServicesManagerStartupTask.java:178)
            at com.bigdata.jini.start.ServicesManagerStartupTask.call(ServicesManagerStartupTask.java:105)
            at com.bigdata.jini.start.AbstractServicesManagerService.setup(AbstractServicesManagerService.java:306)
            at com.bigdata.jini.start.AbstractServicesManagerService.start(AbstractServicesManagerService.java:240)
            ... 5 more
    

    Any tips?

     
  • Anonymous - 2012-05-11

    This seems to be an issue with that tag. I've tried trunk instead and it appears happier; I get the following when listing the services:

    Zookeeper is running.
    Discovered 1 jini service registrars.
    Discovered 10 services
    Discovered 0 stale bigdata services.
    Discovered 9 live bigdata services.
    Discovered 1 other services.
    Bigdata services by serviceIface:
      There are 3 instances of com.bigdata.jini.start.IServicesManagerService on 3 hosts
      There are 1 instances of com.bigdata.journal.ITransactionService on 1 hosts
      There are 1 instances of com.bigdata.service.IClientService on 1 hosts
      There are 2 instances of com.bigdata.service.IDataService on 2 hosts
      There are 1 instances of com.bigdata.service.ILoadBalancerService on 1 hosts
      There are 1 instances of com.bigdata.service.IMetadataService on 1 hosts
    Bigdata services by hostname:
      There are 2 live bigdata services on xx.xx.xxx.xx
        There are 1 com.bigdata.jini.start.IServicesManagerService services
        There are 1 com.bigdata.service.IDataService services
      There are 2 live bigdata services on xx.xx.xxx.xx
        There are 1 com.bigdata.jini.start.IServicesManagerService services
        There are 1 com.bigdata.service.IDataService services
      There are 5 live bigdata services on xx.xx.xx.xx
        There are 1 com.bigdata.jini.start.IServicesManagerService services
        There are 1 com.bigdata.journal.ITransactionService services
        There are 1 com.bigdata.service.IClientService services
        There are 1 com.bigdata.service.ILoadBalancerService services
        There are 1 com.bigdata.service.IMetadataService services
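    To compare a listing like this against what the cluster configuration is expected to start (e.g. one IDataService per data node), the per-interface counts can be pulled out programmatically. A minimal Java sketch, assuming the listing text has already been captured into a string; the class and method names are hypothetical, and only the "There are N instances of X on M hosts" summary lines are parsed:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ServiceListingParser {
    // Matches the per-interface summary lines, e.g.
    // "There are 2 instances of com.bigdata.service.IDataService on 2 hosts"
    private static final Pattern IFACE_LINE =
            Pattern.compile("There are (\\d+) instances of (\\S+) on (\\d+) hosts");

    /** Maps each service interface name to its reported instance count. */
    public static Map<String, Integer> countByInterface(String listing) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String line : listing.split("\\r?\\n")) {
            Matcher m = IFACE_LINE.matcher(line.trim());
            if (m.matches()) {
                counts.put(m.group(2), Integer.valueOf(m.group(1)));
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String listing =
                "  There are 3 instances of com.bigdata.jini.start.IServicesManagerService on 3 hosts\n"
              + "  There are 2 instances of com.bigdata.service.IDataService on 2 hosts\n"
              + "    There are 1 com.bigdata.service.IDataService services\n"; // per-host line: ignored
        System.out.println(countByInterface(listing));
    }
}
```

    Checking the returned map against the expected counts makes a missing or stale service stand out quickly.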
    

    but when I start
    bin/nanoSparqlServer.sh 8000 benchmark

    and then try to connect, it simply echoes the request path, e.g.

    /
    
     
  • Bryan Thompson - 2012-05-11

    The trunk is roughly the .84 release.  You should try the 1.1.x tags if you want to try a different release.

    From your first trace, it appears that the root problem is that ZooKeeper is either not running or is running on a different machine and/or port than the one configured for the bigdata services, i.e., this is likely a configuration issue. Try running the "bigdata start" script by hand on each node in the cluster and see what it reports on the console. Also, make sure to check the status and error logs. There are some wiki pages that will help you debug your cluster setup; these are all linked from the wiki's Main Page (URL below).
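    One quick way to rule ZooKeeper in or out is to probe the configured client port directly from each node. A minimal Java sketch, assuming ZooKeeper 3.3.x, which answers the four-letter command "ruok" with "imok" when the server is running (newer ZooKeeper releases restrict four-letter commands by default). The class name is hypothetical, and the localhost:2181 default is an assumption: 2181 is only ZooKeeper's conventional client port, so pass the host and port from your bigdata configuration instead:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ZkProbe {
    /**
     * Sends ZooKeeper's "ruok" four-letter command to host:port and returns
     * true only if the reply is exactly "imok". Connection failures and
     * timeouts are reported as false.
     */
    public static boolean isZooKeeperOk(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // bound the wait for the reply
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes("US-ASCII"));
            out.flush();
            // The server writes its reply and closes the connection, so read to EOF.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[16];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return "imok".equals(reply.toString("US-ASCII"));
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ASSUMPTION: 2181 is only ZooKeeper's conventional client port; pass the
        // host and port that your bigdata configuration actually uses.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        boolean ok = isZooKeeperOk(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " answered imok" : " did not answer imok"));
    }
}
```

    An "imok" reply only shows that a ZooKeeper process is up at that address; it does not prove that the bigdata services are configured to use the same host and port.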

    Thanks,
    Bryan

    https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Main_Page

     
  • Anonymous - 2012-05-11

    Thanks Bryan,

    I only realised trunk is an older release when I managed to POST to the SPARQL server and it responded that POST wasn't supported.

    I'm having some problems with the wiki at the moment; I think it's a configuration issue on SourceForge's end (it was down for several days this week).

    I've now checked out and built the 1_2_0 tag and am getting the following when starting bigdata on the node that runs the load balancer, ZooKeeper, etc.:

    org.apache.zookeeper.server.quorum.QuorumPeerMain : java version "1.6.0_32"
    org.apache.zookeeper.server.quorum.QuorumPeerMain : Java(TM) SE Runtime Environment (build 1.6.0_32-b05)
    org.apache.zookeeper.server.quorum.QuorumPeerMain : Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode)
    org.apache.zookeeper.server.quorum.QuorumPeerMain :
    org.apache.zookeeper.server.quorum.QuorumPeerMain : Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/zookeeper/server/quorum/QuorumPeerMain
    org.apache.zookeeper.server.quorum.QuorumPeerMain : Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.server.quorum.QuorumPeerMain
    org.apache.zookeeper.server.quorum.QuorumPeerMain :     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    org.apache.zookeeper.server.quorum.QuorumPeerMain :     at java.security.AccessController.doPrivileged(Native Method)
    org.apache.zookeeper.server.quorum.QuorumPeerMain :     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    org.apache.zookeeper.server.quorum.QuorumPeerMain :     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    org.apache.zookeeper.server.quorum.QuorumPeerMain :     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    org.apache.zookeeper.server.quorum.QuorumPeerMain :     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    org.apache.zookeeper.server.quorum.QuorumPeerMain : Could not find the main class: org.apache.zookeeper.server.quorum.QuorumPeerMain.  Program will exit.
    

    which is somewhat strange.

    Phil.

     
  • Anonymous - 2012-05-11

    Right, I've got a bit further: I've substituted the standalone config for the cluster config and everything starts as it should on a single machine.

    My problem now is that when running the nanoSparqlServer, it starts fine, provides an interface where I can query, but none of my inserts work. They don't return an error, but querying for them just returns nothing, and a

    SELECT (count(?s) as ?count) WHERE { ?s ?p ?o }

    always returns the same number: 187.

    Phil.

     
  • Bryan Thompson - 2012-05-11

    The 187 statements are the RDFS+ axioms that are loaded into the KB when it is first created.

    SPARQL UPDATE is a new feature. It goes through the SAIL and has been less extensively tested on a cluster (read: it has not been tuned for performance on a cluster at all).

    You should make sure that you are not using truth maintenance on the cluster. The NSS wiki page has some guidance on how to set up the cluster for the NanoSparqlServer; if you are not careful, it is easy to end up with an inappropriate configuration.
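    For reference, a minimal Java sketch of forcing truth maintenance off in a set of KB properties before they are used for the NSS. The property key is an assumption (it is meant to match BigdataSail.Options.TRUTH_MAINTENANCE, so check it against the bigdata release you are running), and the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

public class NssClusterProperties {
    // ASSUMPTION: intended to match BigdataSail.Options.TRUTH_MAINTENANCE;
    // verify the key against the bigdata release you are running.
    static final String TRUTH_MAINTENANCE = "com.bigdata.rdf.sail.truthMaintenance";

    /** Returns a copy of base (defaults flattened in) with truth maintenance turned off. */
    public static Properties withoutTruthMaintenance(Properties base) {
        Properties copy = new Properties();
        for (String key : base.stringPropertyNames()) {
            copy.setProperty(key, base.getProperty(key));
        }
        copy.setProperty(TRUTH_MAINTENANCE, "false");
        return copy;
    }

    public static void main(String[] args) throws IOException {
        Properties base = new Properties();
        base.setProperty(TRUTH_MAINTENANCE, "true"); // e.g. carried over from a standalone setup
        StringWriter out = new StringWriter();
        withoutTruthMaintenance(base).store(out, "KB properties for the NSS on a cluster");
        System.out.print(out);
    }
}
```

    The input is copied rather than modified, so a standalone configuration can keep its own settings.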

    Thanks,
    Bryan

     
  • Anonymous - 2012-05-11

    Thanks Bryan,

    I've tracked down the classpath issue with ZooKeeper when running with bigdataCluster.config: it's the following lines, which were not commented out as they are in bigdataStandalone.config:

        // This is all you need to run zookeeper.
        //classpath = new String[] {
        //  "/nas/bigdata/benchmark/lib/apache/zookeeper-3.2.1.jar",
        //    "/nas/bigdata/benchmark/lib/apache/log4j-1.2.15.jar"
        //};
    

    Also, the version number on the ZooKeeper jar should be 3.3.3, not 3.2.1.
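    A classpath entry that points at a jar which isn't there (as with zookeeper-3.2.1.jar here) only surfaces later as a NoClassDefFoundError. A minimal Java sketch that checks a classpath array up front and tests whether a class can be resolved in the current JVM; the class and method names are hypothetical, and the paths are the ones from the block above:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {
    /** Returns the classpath entries that do not exist on this machine. */
    public static List<String> missingEntries(String[] classpath) {
        List<String> missing = new ArrayList<String>();
        for (String entry : classpath) {
            if (!new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    /** True if the class can be found by this JVM's class loader (it is not initialized). */
    public static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) { // e.g. NoClassDefFoundError for a missing dependency
            return false;
        }
    }

    public static void main(String[] args) {
        // Paths from the zookeeper classpath block quoted above.
        String[] zkClasspath = {
            "/nas/bigdata/benchmark/lib/apache/zookeeper-3.2.1.jar",
            "/nas/bigdata/benchmark/lib/apache/log4j-1.2.15.jar"
        };
        for (String entry : missingEntries(zkClasspath)) {
            System.out.println("missing classpath entry: " + entry);
        }
        String main = "org.apache.zookeeper.server.quorum.QuorumPeerMain";
        System.out.println(main + " loadable: " + isLoadable(main));
    }
}
```

    Note that isLoadable only reflects the classpath of the JVM running the check, not the classpath the services manager will pass to the ZooKeeper process it launches.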

    Phil.

     
  • Bryan Thompson - 2012-05-11

    Great.  I see that the bigdataCluster16.config file had those lines commented out, but not the bigdataCluster.config file.  I would recommend that you just comment out those lines.  That block just provides an alternative classpath declaration for starting zookeeper with the minimum dependencies.  If you comment it out, then it just uses the same classpath which is used to start all of the bigdata services.

    I'll modify the bigdataCluster.config file to comment out those lines in SVN.

    Thanks,
    Bryan

     
  • Anonymous - 2012-05-11

    Thanks for all your help so far Bryan.

    I've got a lot further and now have a 5-node cluster up and running (1 node for ZooKeeper, the load balancer, the transaction server, etc., plus 4 data service nodes).

    I've made the changes for the NanoSparqlServer suggested on the wiki, but when I start it, every page I request returns a 404.

    There doesn't seem to be anything in the logs so I'm at a loss.

     
  • Bryan Thompson - 2012-05-13

    What URL are you using?

     
  • Anonymous - 2012-05-13

    I've tried lots. From memory (I'm away from the cluster at the moment):

    /bigdata
    /sparql
    /status
    /counters
    /namespace
    /namespace/sparql

    As an aside: is it possible to configure the NanoSparqlServer in the WAR file you make available for scale-out, or would I be better off using Sesame?

    Phil.

     
  • Anonymous - 2012-05-14

    Hi Bryan,

    Sorry for the confusion, the question about the WAR file was just an aside.

    I've followed all of the instructions in the wiki about starting a NanoSparqlServer for a scale-out instance of bigdata and am getting a 404 for any page requested (so the embedded Jetty server is starting up, but it seems the NanoSparqlServer isn't).

    Regards,

    Phil.

     
  • Anonymous - 2012-05-14

    Another update,

    I've just tried the bigdataStandalone.config to see whether that makes any difference, and I'm still getting a 404 for all requests to the nanoSparqlServer.

    Regards,

    Phil.

     
  • Bryan Thompson - 2012-05-14

    Phil,

    When I do:

    nanoSparqlServer.sh 8090 U1000
    

    It reports:

    serviceURL: http://192.168.1.10:8090
    

    When I point a web browser to that service URL, I see the NSS web interface. Given the service endpoint reported by the NSS, the SPARQL endpoint will be at

    http://192.168.1.10:8090/sparql
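    To check whether inserts are actually landing, the count query from earlier in the thread can be sent to that endpoint as an HTTP GET with a "query" parameter, as defined by the SPARQL 1.1 Protocol. A minimal Java sketch that only builds the URL; the endpoint is the one reported above (substitute your own serviceURL plus /sparql), and the class and method names are hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SparqlQueryUrl {
    /** Builds a SPARQL Protocol GET URL: endpoint + "?query=" + URL-encoded query. */
    public static String queryUrl(String sparqlEndpoint, String query) {
        try {
            return sparqlEndpoint + "?query=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String endpoint = "http://192.168.1.10:8090/sparql"; // replace with your NSS serviceURL + "/sparql"
        String count = "SELECT (count(?s) as ?count) WHERE { ?s ?p ?o }";
        System.out.println(queryUrl(endpoint, count));
    }
}
```

    URLEncoder encodes spaces as "+", which is the standard form encoding for a query string. Fetching the printed URL in a browser or any HTTP client should return the count; in this thread a fresh KB reported the 187 axioms, so a count that never rises above that suggests the inserts are not being committed.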
    

    Thanks,
    Bryan

     
  • Anonymous - 2012-05-17

    Hi Bryan,

    Curiously, when I enable quads support by adding:

          new NV(BigdataSail.Options.QUADS, "true" ),

    to lubm.properties, I only get 404s from the web interface. Without that setting, the normal web interface shows as expected.

    Regards,

    Phil.

     
  • Anonymous - 2012-05-17

    Hi,

    Sorry, I just realised I had a typo: it should be QUADS_MODE, not QUADS.

    Out of interest, do errors stemming from incorrect parameters get logged anywhere?

    Regards,

    Phil.

     
