From: Bryan T. <br...@sy...> - 2010-07-29 11:13:13
Concurrent writes on the database, combined with SPARQL queries involving LIMIT without ORDER BY clauses, can cause the producer reading on the database to be concurrently interrupted. If this occurs during a write, then the logic in FileChannelUtility.writeAll() should be invoked to handle the interrupt and retry the write. In order to trigger this condition, there needs to be sufficient data scale and an appropriate query mixture such that the pipeline join will process more than one chunk of solutions and hence have an opportunity to be interrupted (when the Iterator is closed in response to the LIMIT) while reading on the FileChannel. That interrupt will provoke an AsynchronousCloseException. FileChannelUtility.writeAll() should handle that exception and retry the write.

While BSBM has the query mixtures and data scale (at BSBM 100M) to trigger interrupts of readers on a regular basis, the benchmark does not include concurrent database writes and is thus unable to trigger retries of writes in FileChannelUtility.writeAll(). Cluster-based testing exercises similar conditions because of the concurrent execution of different tasks reading and writing on different shards, although the write set is always buffered by the current Journal. In some ways, cluster-based testing is the easiest way to test for this problem. The alternative is to develop fairly complex benchmarks which combine concurrent writes and reads.

Thanks,
Bryan

________________________________
From: Bryan Thompson
Sent: Thursday, July 29, 2010 6:57 AM
To: Bryan Thompson; Martyn Cutcher
Cc: Bigdata Developers
Subject: RE: [Bigdata-developers] FW: Errors in bulk load tests (regression)

This appears to be related to https://sourceforge.net/apps/trac/bigdata/ticket/118, which points out that the recently added invocation of flushWriteCache() from commit() may be at fault. I concur. It appears that flushWriteCache() is being invoked without the appropriate lock held. This would be a problem for scale-out, where concurrent tasks writing and reading on different shards will be running during a commit. It could also be a problem for a standalone Journal with writes under concurrent high-level query combined with a LIMIT, which would cause the producer to be interrupted while reading on the backing file.

While the WORMStrategy contains a similar call from commit() to flush the write cache, its implementation of the write cache differs by acquiring an internal lock which prevents concurrent modification during the write. The DiskOnlyStrategy's write cache is not protected against concurrent modification other than by synchronizing on [this]. Also, a correction to my email below: DiskOnlyStrategy does not use the ReentrantReadWriteLock. That is only used by the WORMStrategy. This implies that the full performance of the database for standalone Journal operations is only available in the DiskWORM mode rather than the Disk mode.

At this point I would suggest that we explore why the commit() -> flushWriteCache() call was introduced, modify the StoreManager to use the WORMStrategy, and then cut over to the WORMStrategy [https://sourceforge.net/apps/trac/bigdata/ticket/123] entirely, removing DiskOnlyStrategy from the code base and issuing a new release. The workaround for non-scale-out users is to change the properties file from Disk to DiskWORM (the two modes are binary compatible, but DiskWORM has higher performance).

Bryan
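As a point of reference for the discussion below, here is a minimal sketch of the retry pattern that FileChannelUtility.writeAll() needs to implement. This is an illustration only, not the actual bigdata code: the Reopener hook is an assumption, and the guard which resets the buffer position anticipates the fix Martyn suggests below.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;

public class WriteAllSketch {

    /** Hypothetical hook which re-opens the backing channel after it was
     *  asynchronously closed by an interrupted reader. */
    interface Reopener {
        FileChannel reopen() throws IOException;
    }

    /**
     * Writes all remaining bytes in [data] at offset [pos], retrying when a
     * reader's interrupt closes the channel mid-write. Returns the #of
     * retries, which could feed the counter proposed below in this thread.
     */
    static int writeAll(FileChannel channel, ByteBuffer data, long pos,
            Reopener reopener) throws IOException {
        final int nbytes = data.remaining();
        int count = 0; // #of bytes credited as successfully written.
        int nretries = 0;
        while (count < nbytes) {
            // Snapshot the position: FileChannel.write() may advance it
            // even when the write throws due to an asynchronous close.
            final int mark = data.position();
            try {
                count += channel.write(data, pos + count);
            } catch (ClosedChannelException ex) {
                // Covers AsynchronousCloseException: a reader blocked on
                // this channel was interrupted, which closed the channel.
                data.position(mark); // undo any partial consumption
                channel = reopener.reopen();
                nretries++;
            }
        }
        return nretries;
    }
}

Since the write is positional (pos + count) and count only advances on success, re-issuing the same bytes at the same offset after a position reset is idempotent.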
________________________________
From: Bryan Thompson [mailto:br...@sy...]
Sent: Thursday, July 29, 2010 6:35 AM
To: Martyn Cutcher
Cc: Bigdata Developers
Subject: Re: [Bigdata-developers] FW: Errors in bulk load tests (regression)

Martyn,

Interesting. You propose that:

> nwritten = channel.write(data, pos + count);

could have written some bytes, and updated data.position(), before an exception was thrown and handled by a retry? It can be a bit difficult to get to the bottom of the FileChannel semantics, but if this is a reasonable suspicion we could always add a counter to track the #of exceptions caught and handled by a reopen/retry of the remaining write.

If this is the case, we might be able to demonstrate the problem by modifying one of the test suites to interrupt readers. I've looked over the various test suites. While AbstractInterruptsTestCase is aimed at the behavior of the system when threads are interrupted, it is not set up as a stress test. However, I think that it would be easy enough to modify AbstractMRMWTestCase to keep a concurrent hash map of the running readers and then introduce random interrupts of the reader tasks by the writer tasks. The reason to interrupt the readers is that this will provoke the asynchronous close of the backing channel such that writers will have to handle that event in FileChannelUtility.writeAll().

It occurs to me that we could have been masking a problem by using the LRUNexus to buffer the disk in the application. The concurrent interrupt can only arise when the read request makes it down to the FileChannel. A hit on the concurrent hash map (LRUNexus) for the store would not read through and hence would reduce the opportunity to observe this error. One of the things that I changed in the release was to disable the LRUNexus by default. The concurrent record cache was a big performance win while we were synchronized on disk IO operations in DiskOnlyStrategy and WORMStrategy. Now that these classes use a ReentrantReadWriteLock to work around a Sun bug with concurrent IO during a file extension, the LRUNexus no longer offers any performance benefit under heavy concurrent query mixtures, so I have disabled it by default to let reads go through to the file system cache. However, this condition (no LRUNexus) has not been tested for scale-out. This choice (LRUNexus disabled) really needs to be qualified for both performance and correctness on a cluster, and we should explore the impact on throughput of reducing the memory allocated to the JVMs when the LRUNexus is disabled, since that will leave more memory available to the OS for the file cache.

Reviewing the bigdataCluster{16}.config and bigdataStandalone.config files, I see that the bigdataCluster16.config file has some explicit configuration of the LRUNexus but does not force it to be enabled. The bigdataCluster.config and bigdataStandalone.config files do not have any explicit configuration of the LRUNexus. Therefore, in all cases it will now be disabled by default for the federation installs.

Given this, it is possible that we were masking a problem in FileChannelUtility.writeAll(), given semantics for FileChannel.write() which allow the buffer position() to be updated even if the request throws an exception due to an asynchronous file channel close. To test this, please add the following line to the defaultJavaArgs[] of com.bigdata.jini.start.config.ServiceConfiguration in the bigdata configuration file:

"-Dcom.bigdata.LRUNexus.enabled=true",

Also, please let me know how reproducible this problem is.

Thanks,
Bryan
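A rough sketch of the AbstractMRMWTestCase modification proposed above follows. The names and structure here are guesses rather than the actual test suite code: readers register themselves in a concurrent hash map while they run, and writers occasionally interrupt a randomly chosen reader to provoke the asynchronous close which FileChannelUtility.writeAll() must then recover from.

import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

public class InterruptingReaderHarness {

    /** The running reader threads, keyed by thread id. */
    private final Map<Long, Thread> readers = new ConcurrentHashMap<Long, Thread>();

    private final Random rnd = new Random(); // Random is thread-safe.

    /** Wraps a reader task so it is registered while it runs. */
    Runnable reader(final Runnable readTask) {
        return new Runnable() {
            public void run() {
                final Thread t = Thread.currentThread();
                readers.put(t.getId(), t);
                try {
                    readTask.run(); // may block in FileChannel.read()
                } finally {
                    readers.remove(t.getId());
                }
            }
        };
    }

    /**
     * Wraps a writer task so it randomly interrupts a running reader. The
     * interrupt closes the shared FileChannel (ClosedByInterruptException
     * in the interrupted reader, AsynchronousCloseException in anyone else
     * blocked on that channel), forcing the write path to reopen and retry.
     */
    Runnable writer(final Runnable writeTask) {
        return new Runnable() {
            public void run() {
                for (Thread t : readers.values()) {
                    if (rnd.nextInt(10) == 0) {
                        t.interrupt();
                        break;
                    }
                }
                writeTask.run();
            }
        };
    }
}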
________________________________
From: Martyn Cutcher [mailto:ma...@sy...]
Sent: Thursday, July 29, 2010 5:42 AM
To: Bryan Thompson
Subject: Re: FW: [Bigdata-developers] Errors in bulk load tests (regression)

...initial findings

The first stack trace came up with:

Expecting to write 94659 bytes, but wrote 83950 bytes in 1

from FileChannelUtility. This is called as a result of the call to flush the writeCache that I added to DiskOnlyStrategy.commit(). I don't believe that this is itself an error, even though it may have caused this error to become apparent. Examining FileChannelUtility.writeAll, it is possible that this could be thrown if we have a repeat write following an AsynchronousCloseException or ClosedChannelException, since it is possible that these could be thrown after the buffer has been partly processed (decrementing the bytes remaining). This could be guarded against by resetting the ByteBuffer position before calling continue.

...continuing...

- Martyn

Bryan Thompson wrote:

Martyn,

Can you take a look at these errors?

Bryan

________________________________
From: Fred Oliver [mailto:fko...@gm...]
Sent: Wednesday, July 28, 2010 5:07 PM
To: Bigdata Developers
Subject: [Bigdata-developers] Errors in bulk load tests (regression)

I'm running a test which generates (essentially LUBM) data files and invokes MappedRDFDataLoadMaster.main(). The configuration is multiple services running on a single machine. After an svn update to the current branch, the test appears to run forever, with many exceptions seen in the log file, apparently relating to IO. I've attached two versions of the error.log file. One shows errors from FileChannelUtility, BufferOverflowException, etc. The other shows AssertionErrors being thrown between processes. Would one of you take a look, please?

Fred
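For reference, here is a hypothetical reconstruction (not the actual bigdata source) of the failure mode Martyn describes above: if FileChannel.write() advances the buffer position before throwing, a retry loop that does not reset the position will drain the buffer without crediting those bytes to the running count, tripping exactly the "Expecting to write ... bytes" sanity check.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;

public class UnguardedWriteSketch {

    static void writeAll(FileChannel channel, ByteBuffer data, long pos)
            throws IOException {
        final int nbytes = data.remaining();
        int count = 0;
        while (data.hasRemaining()) {
            try {
                count += channel.write(data, pos + count);
            } catch (ClosedChannelException ex) {
                channel = reopen();
                // BUG: data.position() may already have advanced, so the
                // partially consumed bytes are never added to [count].
                continue;
            }
        }
        if (count != nbytes) {
            // The check behind the error reported in this thread.
            throw new IOException("Expecting to write " + nbytes
                    + " bytes, but wrote " + count + " bytes");
        }
    }

    /** Stand-in for re-opening the backing channel; illustration only. */
    static FileChannel reopen() throws IOException {
        throw new UnsupportedOperationException();
    }
}

Resetting the buffer position before the continue, as Martyn suggests, closes the gap.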