HyperSQL Database Engine (HSQLDB) / Bugs / #938 "Integrity constraint violation" after prolonged work

#938 "Integrity constraint violation" after prolonged work

Milestone: None

Status: closed-fixed

Owner: Fred Toussi

Labels: engine (518)

Priority: 1

Updated: 2017-03-02

Created: 2009-11-16

Creator: Anonymous

Private: No

While attempting to track the reasons of great memory use (which I mentioned in another report),
with revision 3282, I also observed an effect which I believe might be separate from it,
but might be worth reporting.

It occurs after an HSQL server has operated and serviced clients for a notable time,
often as long as 6 hours. Out of the multitude of clients this server services,
it *seems* to be occurring with one particular client -- a client which frequently
disconnects from the server and reconnects to it, since its network link
is unstable and it's been designed this way to gracefully resume operation.

I have never seen it with the same database and nearly the same client
previously on HSQL 1.8.

The error message the server issues, implies that a problem with indexes
was encountered while inserting a value into an index, but surprisingly
the call stack suggests that the call actually originated from a rollback operation.
I am unsure what to think about it:

org.hsqldb.HsqlException: integrity constraint violation: unique constraint or index violation
at org.hsqldb.error.Error.error(Error.java:112)
at org.hsqldb.error.Error.error(Error.java:82)
at org.hsqldb.index.IndexAVL.insert(IndexAVL.java:489)
at org.hsqldb.persist.RowStoreAVL.indexRow(RowStoreAVL.java:130)
at org.hsqldb.TransactionManager2PL.rollbackPartial(TransactionManager2PL.java:362)
at org.hsqldb.TransactionManager2PL.rollbackAction(TransactionManager2PL.java:320)
at org.hsqldb.Session.endAction(Session.java:458)
at org.hsqldb.Session.executeCompiledStatement(Session.java:1197)
at org.hsqldb.Session.execute(Session.java:874)
at org.hsqldb.server.ServerConnection.receiveResult(ServerConnection.java:357)
at org.hsqldb.server.ServerConnection.run(ServerConnection.java:1522)
at java.lang.Thread.run(Thread.java:619)

When this happens, it does seem to effectively guarantee that a SHUTDOWN COMPACT
statement won't execute correctly, while an onling backup with the BLOCKING option,
followed by a SHUTDOWN IMMEDIATELY (my usual way of recovering
from this situation) generally does execute correctly.

I have no reason to suspct that at the time of this error ocurring,
an actual integrity constraint violation would occur, since the data this program processes
is repetitive in nature, and then I'd be getting these errors multiple times per day.

What is meanwhile ocurring on the client (in case it provides any insight into the problem),
is typically a saldo import operation, whereby a lot (usually about 40'000) warehouse saldos
are read from an XML file, which has been supplied by a separate program.

In the below excerpt from my logs, the client has successfully imported 19'000 saldos,
then suddenly seems to pause for several minutes, reports the error, and proceeds to skip the rest,
since its ability to find article records from the article table has apparently been impaired
by something:

2009-11-14 18:35:52,980 [main] DEBUG xmlrelay.Main - Importing saldo 18000
2009-11-14 18:36:23,570 [main] DEBUG xmlrelay.Main - Importing saldo 19000
2009-11-14 18:41:06,202 [main] ERROR xmlrelay.Main - Error updating saldo timestamps.
java.sql.SQLTransientConnectionException: connection exception: connection failure: java.io.EOFException
at org.hsqldb.jdbc.Util.sqlException(Util.java:259)
at org.hsqldb.jdbc.Util.sqlException(Util.java:103)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(JDBCPreparedStatement.java:4503)
at org.hsqldb.jdbc.JDBCPreparedStatement.execute(JDBCPreparedStatement.java:1102)
at xmlrelay.Database.updateSaldoTimestamps(Unknown Source)
at xmlrelay.Database.importSaldos(Unknown Source)
at xmlrelay.Main.importSaldos(Unknown Source)
at xmlrelay.Main.importFile(Unknown Source)
at xmlrelay.Main.importFiles(Unknown Source)
at xmlrelay.Main.<init>(Unknown Source)
at xmlrelay.Main.main(Unknown Source)
Caused by: org.hsqldb.HsqlException: connection exception: connection failure: java.io.EOFException
at org.hsqldb.error.Error.error(Error.java:77)
at org.hsqldb.ClientConnection.execute(ClientConnection.java:211)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(JDBCPreparedStatement.java:4501)
... 8 more
2009-11-14 18:41:06,252 [main] ERROR xmlrelay.Main - Skipping saldo, article 148-E122HD187 batch is unknown!
2009-11-14 18:41:06,254 [main] ERROR xmlrelay.Main - Skipping saldo, article 148-E128HD24 batch is unknown!
2009-11-14 18:41:06,256 [main] ERROR xmlrelay.Main - Skipping saldo, article 148-E12HD53 batch is unknown!
2009-11-14 18:41:06,258 [main] ERROR xmlrelay.Main - Skipping saldo, article 148-E134HD06 batch is unknown!
2009-11-14 18:41:06,276 [main] ERROR xmlrelay.Main - Skipping saldo, article 148-E134HD31 batch is unknown!
2009-11-14 18:41:06,278 [main] ERROR xmlrelay.Main - Skipping saldo, article 148-E135HD14 batch is unknown!
...etc...

I can additionally certify that no network connectivity problems could have been involved,
since the HSQL server and client were operating on the same host.

Discussion

Nobody/Anonymous - 2009-11-16

One clarification: when I write "since its network link is unstable" about the client, I meant to say that under ordinary conditions, its network link can be unstable. However, during the test run, it was hosted on the same machine as the server, as noted at the end.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Nobody/Anonymous - 2009-11-16

Correction to my previous mistake about client behaviour:

- sorry, but my previous report about the client reconnecting often was mistaken, I only now realized that it no longer does that, so all errors which proceed are occurring on the same JDBC connection as the first error - the server's ability to accept new connections is *not* impaired

- after the first error has occurred, the client proceeds reporting the following type of errors:

java.sql.SQLTransientConnectionException: connection exception: connection failure: java.net.SocketException: Broken pipe
at org.hsqldb.jdbc.Util.sqlException(Util.java:259)
at org.hsqldb.jdbc.Util.sqlException(Util.java:103)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(JDBCPreparedStatement.java:4503)
at org.hsqldb.jdbc.JDBCPreparedStatement.execute(JDBCPreparedStatement.java:1102)
at ...(trace continues to client application methods)...

Caused by: org.hsqldb.HsqlException: connection exception: connection failure: java.net.SocketException: Broken pipe
at org.hsqldb.error.Error.error(Error.java:77)
at org.hsqldb.ClientConnection.execute(ClientConnection.java:211)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(JDBCPreparedStatement.java:4501)
... 8 more

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Nobody/Anonymous - 2009-11-16

In addition to the afore mentioned, I would also like to mention that sometimes, issuing "BACKUP DATABASE TO '/some/directory/' BLOCKING" followed by "SHUTDOWN IMMEDIATELY", causes the backup to proceed, but the shutdown does not actually complete, so either the "kill" command or pressing Ctrl-C (if applicable) is needed.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2009-11-16

Some questions?
- Is this particular client is the only one that inserts large amounts (40000) of warehouse inventory ? items in one operation?
- How does it insert / update the data. Does it commit after each item or is is all a big transaction?
- If it is a big transaction, is is possible that OOM error was caused during the op?
- Do you have an .app.log file, if so, does it contain any relevant information? (You can turn on the internal app log or use log4j to log externally).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

arp - 2009-12-14

Doing a summary of the bugs I've reported, I can note that this effect hasn't occurred again approximately since revision 3293.

At approximately the same time, I made sure that my sessions explicitly issued "SET AUTOCOMMIT TRUE", which may have played a role, but I didn't get to verify that.

Since I haven't been able to observe this error for more than three weeks, it's probably gone for good.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2009-12-14

priority: 5 --> 1

status: open --> open-fixed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2009-12-14

Code has been reviewed since the report and some issues were found and fixed. Now seems to work under heavy server load.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2010-03-09

assigned_to: nobody --> fredt

status: open-fixed --> closed-fixed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

"Integrity constraint violation" after prolonged work

Group

Searches

Help

#938 "Integrity constraint violation" after prolonged work

Discussion