Menu

Exception during restore on Linux using dump-restore

Help
2014-04-16
2014-11-10
1 2 3 4 > >> (Page 1 of 4)
  • Justin Cranford

    Justin Cranford - 2014-04-16

    I am getting an exception from line 220 in MySQLDialect.java when it calls Processes.run():

        Processes.run(setPassword(new ProcessBuilder("mysql", "-h", properties.getHost(), "-P", properties.getPort(), "-u", properties.getUser(), properties.getDatabase(), "<", file.getPath()), properties));
    

    The exception is below. I see the exception is in a method next to another you fixed in 3.0.1. For the "mysqldump" method, you got rid of the > to fix mysqldump syntax on Windows. In the "mysql" method, it is still using the < pipe. I hit the exception below on Linux this time, but I wonder if the problem is similar to the Windows syntax error.

    For now, I am manually adding debug print statements to MySQLDialect.java to show the generated mysqldump and mysql commands, plus the return code in Processes.java. Then I can try the commands on the command line, outside of HA-JDBC, to more easily diagnose if this is another syntax error, or if it is related to my environment and database configuration. Debugs here would be a nice to have for other people running into other problems, such as command path problems, mysql.user authorization problems, disk space issues, and more.

    Apr 16, 2014 8:54:56 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
    WARNING: Failed to activate database mycluster from cluster mydb1
    java.sql.SQLException:
            at net.sf.hajdbc.sql.SQLExceptionFactory.createException(SQLExceptionFactory.java:51)
            at net.sf.hajdbc.sql.SQLExceptionFactory.createException(SQLExceptionFactory.java:35)
            at net.sf.hajdbc.AbstractExceptionFactory.createException(AbstractExceptionFactory.java:62)
            at net.sf.hajdbc.sync.DumpRestoreSynchronizationStrategy.synchronize(DumpRestoreSynchronizationStrategy.java:98)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:885)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:134)
    Caused by: java.lang.Exception:
            at net.sf.hajdbc.util.Processes.run(Processes.java:39)
            at net.sf.hajdbc.dialect.mysql.MySQLDialect.restore(MySQLDialect.java:220)
            at net.sf.hajdbc.sync.DumpRestoreSynchronizationStrategy.synchronize(DumpRestoreSynchronizationStrategy.java:89)
            ... 5 more
    
    Apr 16, 2014 8:54:56 PM com.myapp.activate run
    WARNING: UpdateDBClusterMemberCmd.activateClusterMember Exception while activating member 'mydb1'.
    java.lang.IllegalStateException: java.sql.SQLException:
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:174)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:134)
    
     

    Last edit: Justin Cranford 2014-04-17
  • Justin Cranford

    Justin Cranford - 2014-04-18

    I added some System.out.println() messages, but I am getting a different exception. I get this error after rebooting a server with a Tomcat/MySQL node, logging into that Tomcat after restart, and trying to activate the co-located database:

    WARNING: UpdateDBClusterMemberCmd.activateClusterMember Exception while activating member 'mydb2'.
    java.lang.NullPointerException
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockCoordinator(DistributedLockManager.java:412)
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:316)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:134)
    
     
  • Justin Cranford

    Justin Cranford - 2014-04-19

    OK, I isolated these exceptions. They are separate issues.


    Issue 1) NullPointerException in DistributedLockManager happens if activate attempted after HA-JDBC rejoins jGroups cluster

    To reproduce, I have 3 Linux servers. I start MySQL on each node, then Tomcats on each node one at a time.

    To test activate, I stop MySQL on node 3 until cluster state updated on all Tomcat nodes. Then on Tomcat 1, I initiate activate using "full" strategy and it works. Note I did not use "dump-restore". All good up to now.

    Now I power cycle node 1. After restart, Tomcat 1 rejoins the jGroups cluster, and it shows only MySQL 2 and 3 are active. So I initiate another activate using "full" strategy again, and it failed due to that NullPointerException in DistributedLockManager.

    Conclusion: This test case worked with HA-JDBC 2.0.15. It is not working for me in 3.0. Is this supported? If so, what would cause HA-JDBC to initialize or run improperly after rejoining the jGroups cluster.


    Issue 2) Executing the same test as 1), but using "dump-restore" synchronization strategy, fails during restore phase. You previously fixed the mysqldump command, but the mysql command for restore fails for unknown reason.

     
  • Justin Cranford

    Justin Cranford - 2014-04-19

    Here are some log messages from HA-JDBC when it rejoined the cluster. So, Tomcat 1 did rejoin the jGroups cluster. However, something is not right after rejoining because attemping to activate mydb1 fails.

    Apr 19, 2014 1:07:18 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
    INFO: State for database cluster mycluster will be persisted to jdbc:h2:/var/tmp/dbclusterstate/mycluster
    -------------------------------------------------------------------
    GMS: address=DefaultHostname-3291, cluster=mycluster.lock, physical address=10.0.0.92:7900
    -------------------------------------------------------------------
    -------------------------------------------------------------------
    GMS: address=DefaultHostname-23075, cluster=mycluster.state, physical address=10.0.0.92:7900
    -------------------------------------------------------------------
    Apr 19, 2014 1:07:23 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
    INFO: Using remote initial cluster state [mydb2, mydb3] from DefaultHostname-58151
    
     
  • Paul Ferraro

    Paul Ferraro - 2014-04-19

    The 1st issue is definitely a bug. I'll release 3.0.3 as soon as I figure out a fix.
    The 2nd issue is a little tricky. While mysqldump has a command line option to specify the output file, the mysql command line has no such option for an input file. Any suggestions?

     
  • Justin Cranford

    Justin Cranford - 2014-04-19

    I will try dump-restore again with some debug messages. I will try to find out the command line and return code, and try it on command line outside of HA-JDBC.

     
  • Justin Cranford

    Justin Cranford - 2014-04-19

    Here is the output of debug messages I added to HA-JDBC.

    mysqldump -h 10.0.0.24 -P 3306 -u mysqluser -r /tomcat/temp/ha-jdbc_4802833547109657513.dump mydb
    Status = 0
    
    mysql -h 10.0.0.26 -P 3306 -u mysqluser mydb < /tomcat/temp/ha-jdbc_4802833547109657513.dump
    Status = 1
    

    I ran the second command externally on the command line, and it looks it needs a "-p" parameter before the schema name to know to look for the password on STDIN. If not, it assumes no password required, ignored the password you provide on STDIN, and authentication always fails.

    root@Defaulthostname:/tmp# mysql -h 10.0.0.26 -P 3306 -u mysqluser mydb
    ERROR 1045 (28000): Access denied for user 'mysqluser'@'10.0.0.24' (using password: NO)
    
    root@Defaulthostname:/tmp# mysql -h 10.0.0.26 -P 3306 -u mysqluser -p mydb
    Enter password:
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A
    
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 240
    Server version: 5.1.58-1~dotdeb.0-log (Debian)
    
    Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
    This software comes with ABSOLUTELY NO WARRANTY. This is free software,
    and you are welcome to modify and redistribute it under the GPL v2 license
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql>
    
     
  • Justin Cranford

    Justin Cranford - 2014-04-19

    Hang on. The mysqldump command complains here too. If I run mysqldump externally on the command line, it wants a "-p" parameter before the schema name too. The return code there is 2, but it got return code 0 when run by HA-JDBC.

    You don't have to provide the password on the command line after the "-p" parameter, but I think both commands will need it.

     
  • Justin Cranford

    Justin Cranford - 2014-04-19

    If you have to touch the parameters for mysqldump and mysql to fix password input, what do you think about optionally adding the -C parameter at the same time (or --compress)?

    HA-JDBC executes mysqldump/mysql commands locally, no matter if the source and/or destination databases are local versus remote. Adding the -C parameter transparently compressed MySQL TCP traffic, without having to worry about the dump file format.

    Imagine a very large database, or a very slow network. MySQL TCP communication compression would really improve dump-restore performance in both cases. For my particular database, I tested mysqldump output with gzip just to see the compression potential, and it was 23.4:1.

    If you like the idea, I would recommend it only for remote MySQL dump or restore calls. Adding it for a local MySQL call would only slow it down.

    If it helps, I use this utility method to know if an address is local. Or you can always fall back on the local attribute configured in HA-JDBC.

    import java.net.*;
    
    /**
     * Check if hostName or ipAddress are for the local site.
     * @param hostName or ipAddress
     * @return boolean
     */
    public static boolean isLocal(String hostName) {
        try {
            InetAddress address = InetAddress.getByName(hostName);
            // Check if the address is a valid special local or loopback
            if (address.isAnyLocalAddress() || address.isLoopbackAddress())
                return true;
    
            // Check if the address is defined on any interface
            try {
                return (NetworkInterface.getByInetAddress(address) != null);
            } catch (SocketException e) {
                return false;
            }
        } catch (Exception e) {
            return false;
        }
    }
    
     
  • Paul Ferraro

    Paul Ferraro - 2014-04-23

    Adding compression is a good idea, however, I don't see how to make the compressed output available as input to the mysql command. I still don't know how to provide the mysql command with a file. (Other than piping via "<")

    As for providing the password to the database, the "-p" option should not be needed. The password is supplied to the mysqldump and mysql commands via the MYSQL_PWD environment property of the process. See:
    https://github.com/ha-jdbc/ha-jdbc/blob/3.0/src/main/java/net/sf/hajdbc/dialect/mysql/MySQLDialect.java#L223

     

    Last edit: Paul Ferraro 2014-04-24
    • Paul Ferraro

      Paul Ferraro - 2014-04-26

      Hmm. Reading the javadoc for ProcessBuilder, it appears that the environment() method may not work correctly on all systems. Maybe the environment can't be updated at runtime on your OS?

       
      • Justin Cranford

        Justin Cranford - 2014-04-26

        That crossed my mind too, but "mysqldump" worked while "mysql" did not. I would think both should succeed or both fail, instead of a mixed result.

        I think the restriction might be on changing environment variables for the current process. Java should be ok to set whatever for a child process. Then again "should" is not a guarantee. I will keep it in mind as I keep playing investigating too.

         
  • Justin Cranford

    Justin Cranford - 2014-04-25

    I tried the new code. I get a new error now. Activate gets stuck in an infinite loop trying to contact one of the other members, and it always times out.

    I used 3 servers each with Tomcat and MySQL. With the cluster up, I stopped and started MySQL on nodes and successfully reactivated using "full" strategy. This was repeatable a few times for different MySQL nodes.

    Next I rebooted server 1, and waited for Tomcat to rejoin the cluster. It started with only MySQL 2 and 3 active. Then I attempted a "full" reactivate of MySQL 1 from node 1. It fails with timeouts trying to contact Tomcat 2, and keeps looping every 6 seconds which is my jGroups timeout in HA-JDBC programmatic config.

    Apr 25, 2014 2:40:56 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
    WARNING: timeout sending message to Defaulthostname-7295
    org.jgroups.TimeoutException: timeout sending message to Defaulthostname-7295
            at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:419)
            at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher.execute(JGroupsCommandDispatcher.java:177)
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockCoordinator(DistributedLockManager.java:422)
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:322)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
    
    Apr 25, 2014 2:41:02 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
    WARNING: timeout sending message to Defaulthostname-7295
    org.jgroups.TimeoutException: timeout sending message to Defaulthostname-7295
            at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:419)
            at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher.execute(JGroupsCommandDispatcher.java:177)
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockCoordinator(DistributedLockManager.java:422)
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:322)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
    
    Apr 25, 2014 2:41:08 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
    WARNING: timeout sending message to Defaulthostname-7295
    org.jgroups.TimeoutException: timeout sending message to Defaulthostname-7295
            at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:419)
            at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher.execute(JGroupsCommandDispatcher.java:177)
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockCoordinator(DistributedLockManager.java:422)
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:322)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
    
     
    • Paul Ferraro

      Paul Ferraro - 2014-04-25

      Can you post more log details?
      Specifically, I'm looking for log messages from JGroups indicating that it detected that server 1 left the group and rejoined (with a new identity).
      I ask because it looks like server 2 still thinks server 1 is the coordinator, even though it should no longer be.

       
      • Justin Cranford

        Justin Cranford - 2014-04-25

        I will retry with debugging on.

        Not sure if this is related, but maybe it sheds more light. Instead of rebooting server 1, I tried restarting Tomcat 1 and leaving MySQL 1 running. Tomcat 2 and 3 fire their deactivation listeners showing MySQL 1 is deactivated? There are no exceptions, and MySQL 1 was untouched.

        Apr 25, 2014 7:21:07 PM
        WARNING: DataSourceManager.showClusterState ACTIVE: db2=10.0.0.69,db3=10.0.0.71
        Apr 25, 2014 7:21:07 PM
        WARNING: DataSourceManager.showClusterState INACTIVE: db1=10.0.0.70
        Apr 25, 2014 7:21:07 PM
        WARNING: ApplicationClusterManager.memberRemoved Detected for Tomcat '10.0.0.70:5900' in application cluster. Event: MembershipEvent {Member [10.0.0.70]:5900} removed

        The third message is from Hazelcast, an application clustering system similar to HA-JDBC. I use it for application clustering independent of HA-JDBC, in configurations of N Tomcats with 1 database. Anyway, it shows Tomcat 1 leaving the cluster at the same time as MySQL 1 is deactivated.

        When Tomcat 1 stops, I have it calling net.sf.hajdbc.sql.DataSource.stop(). It is the config DataSource, not the proxy DataSource, if you recall my programmatic config code.

        Does this have something to do with Tomcat 1 registering MySQL 1 with local="true"? I don't recall that ever happening in HA-JDBC 2.0.

         
        • Paul Ferraro

          Paul Ferraro - 2014-04-25

          That is anticipated behavior. The lifecycle of a "local" database is meant to be bound to the server instance on which it resides. That is why we proactively deactivate any "local" databases when a node is shutdown. This logic is new to 3.0. See:
          http://sourceforge.net/p/ha-jdbc/discussion/383397/thread/e56984b2/
          https://github.com/ha-jdbc/ha-jdbc/commit/ac55ef9c2a564c4168f301a014392892ef52fb93

           
          • Justin Cranford

            Justin Cranford - 2014-04-25

            I see. So the meaning of local changed in 3.0? It now means embedded Java database, and not necessarily a co-located database on same server?

            I see why you would want a way to shutdown an embedded database during HA-JDBC shutdown, but I need to opt out of that somehow. Is it triggered by DataSource.stop()? If so, why not call deactivate() explicitly from the same location DataSource.stop() is called?

            There are many scenarios where Tomcat gets restarted while MySQL keeps running, or even my web app gets unloaded/reloaded by the container.

            • update the SSL certificate
            • update the HTTP, HTTPS, AJP bindings (ex: toggle between 0.0.0.0:443 and 127.0.0.1:443 for operational mode versus server maintenance mode)
            • update web.xml triggering automatic unload/reload of the webapp.war
            • other triggers for unload/reload of other deployment artifacts beyond webapp.war, so as enterpriseapp.ear, service.sar, etc

            In any case, I was relying on the old functionality in 2.0. What are my options for avoiding or opting out of this logic?

            • remove DataSource.stop() from my webapp.war lifecycle listener?
            • never use local="true", even for co-located external database? but will that have side effects on other functionality like read splitting?
            • add a config setting to opt-out of external, local shutdown?
            • add a parameter to DataSource.stop() to opt-out?
            • remove implicit deactivation, and require explicit deactivation in the same lifecycle listener where stop() is called? that was your original suggestion in the other thread.

            Thanks!

             
            • Justin Cranford

              Justin Cranford - 2014-04-25

              HA-JDBC comments indicate different handling - local to the machine vs. local to the JVM.

              Database.java

              /**
               * Indicates whether or not this database is local to the machine on which the JVM resides.
               * @return true if local, false if remote
               */
              boolean isLocal();
              

              AbstractDatabase.java

              /**
               * {@inheritDoc}
               * @see net.sf.hajdbc.Database#isLocal()
               */
              @ManagedAttribute
              @Description("Indicates whether this database is local to this JVM")
              @Override
              public boolean isLocal()
              

              Maybe another option is to change the local attribute to an enum? Or add a new "embedded" flag to avoid the headache of changing method signatures?

              • remote database
              • local, external database
              • local, embedded database (only this gets deactivated during stop)
               

              Last edit: Justin Cranford 2014-04-25
              • Paul Ferraro

                Paul Ferraro - 2014-04-26

                That's funny. After reading your last post, I was going to suggest changing local="true|false" to locality="remote|local|embedded". Auto deactivation on stop would only need to happen for embedded databases.

                 
                • Paul Ferraro

                  Paul Ferraro - 2014-04-26

                  For now, let me comment out that logic. I'll add the locality attribute to master and removes the local attribute. I've added support in master for auto - configuration updates, I.e. you can configure ha-jdbc using an old schema version and it will automatically rewrite using the current version.

                   
      • Justin Cranford

        Justin Cranford - 2014-04-25

        Am I enabling debugging logging correctly? I am not getting any logs from HA-JDBC and jGroups. I set my JDK logging to FINE, and I am using these jars:

        slf4j-api-1.7.6.jar
        slf4j-jdk14-1.7.6.jar

         
        • Justin Cranford

          Justin Cranford - 2014-04-25

          I also have log4j-1.2.17.jar present, but HA-JDBC is using SLF4j, and that should be using my JDK14 bridge? I don't have any other SLF4j bridges. I see limited INFO messages, but no FINE messages. If HA-JDBC DEBUG mapped to a higher JDK14 log level such as FINER, FINEST, or ALL?

          For jGroups, I see it uses log4j if present, and only falls back to JDK14 if log4j is missing. I need to add -Djgroups.use.jdk_logger=true to force it.

           
          • Paul Ferraro

            Paul Ferraro - 2014-04-25

            The logging implementation in 3.0.x is a bit of an experiment. It is meant to seamlessly adapt to whichever logging framework the server uses. HA-JDBC exposes a logging SPI and uses a ServiceLoader to find the first available logging provider using the following prioritized list:
            https://github.com/ha-jdbc/ha-jdbc/blob/3.0/src/main/resources/META-INF/services/net.sf.hajdbc.logging.LoggingProvider

            Ironically, you should find a log messsage somewhere in your server log that indicates the logging framework HA-JDBC decided to use.

             
      • Justin Cranford

        Justin Cranford - 2014-04-25

        Got it. I did an override in logging.properties to set them both to ALL.

        Here is the error which repeats over and over as soon as I restart Tomcat 1. I see db1 is active when Tomcat 1 stops, and inactive when it starts again. I also see this repeating jGroups error as soon as it starts.

        -------------------------------------------------------------------
        GMS: address=Defaulthostname-27695, cluster=mycluster.lock, physical address=10.0.0.70:7900
        ------------------------------------------------------------------
        -------------------------------------------------------------------
        GMS: address=Defaulthostname-38029, cluster=mycluster.state, physical address=10.0.0.70:7900
        -------------------------------------------------------------------
        - JGRP000031: Defaulthostname-38029: dropping unicast message to wrong destination 89cccc5f-97f2-3cf3-ce2c-1544aaf343c9
        - JGRP000031: Defaulthostname-38029: dropping unicast message to wrong destination 89cccc5f-97f2-3cf3-ce2c-1544aaf343c9
        WARNING: DataSourceManager.initialize Found inactive database 'mycluster1'.
        WARNING: DataSourceManager.showClusterState ACTIVE: mycluster2=10.0.0.69,mycluster3=10.0.0.71
        WARNING: DataSourceManager.showClusterState INACTIVE: mycluster1=10.0.0.70
        - JGRP000031: Defaulthostname-38029: dropping unicast message to wrong destination 89cccc5f-97f2-3cf3-ce2c-1544aaf343c9
        - JGRP000031: Defaulthostname-38029: dropping unicast message to wrong destination 89cccc5f-97f2-3cf3-ce2c-1544aaf343c9
        - JGRP000031: Defaulthostname-38029: dropping unicast message to wrong destination 89cccc5f-97f2-3cf3-ce2c-1544aaf343c9
        - JGRP000031: Defaulthostname-38029: dropping unicast message to wrong destination 89cccc5f-97f2-3cf3-ce2c-1544aaf343c9
        - JGRP000031: Defaulthostname-38029: dropping unicast message to wrong destination 89cccc5f-97f2-3cf3-ce2c-1544aaf343c9
        - JGRP000031: Defaulthostname-38029: dropping unicast message to wrong destination 89cccc5f-97f2-3cf3-ce2c-1544aaf343c9
        - JGRP000031: Defaulthostname-38029: dropping unicast message to wrong destination 89cccc5f-97f2-3cf3-ce2c-1544aaf343c9
        

        I suspect calling DataSource.stop() is deactivating MySQL 1 when Tomcat 1 stops, and Tomcat 1 cannot rejoin the cluster after it starts because of this jGroups communication error.

         
1 2 3 4 > >> (Page 1 of 4)

Log in to post a comment.