This list is closed; nobody may subscribe to it.
Message counts by month:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 | | 19 | 8 | 25 | 16 | 77 | 131 | 76 | 30 | 7 | 3 | |
| 2011 | | | | | 2 | 2 | 16 | 3 | 1 | | 7 | 7 |
| 2012 | 10 | 1 | 8 | 6 | 1 | 3 | 1 | | 1 | | 8 | 2 |
| 2013 | 5 | 12 | 2 | 1 | 1 | 1 | 22 | 50 | 31 | 64 | 83 | 28 |
| 2014 | 31 | 18 | 27 | 39 | 45 | 15 | 6 | 27 | 6 | 67 | 70 | 1 |
| 2015 | 3 | 18 | 22 | 121 | 42 | 17 | 8 | 11 | 26 | 15 | 66 | 38 |
| 2016 | 14 | 59 | 28 | 44 | 21 | 12 | 9 | 11 | 4 | 2 | 1 | |
| 2017 | 20 | 7 | 4 | 18 | 7 | 3 | 13 | 2 | 4 | 9 | 2 | 5 |
| 2018 | | | | 2 | | | | | | | | |
| 2019 | | | 1 | | | | | | | | | |

---
From: hudson <no...@no...> - 2010-07-30 01:16:03

See <http://localhost/job/BigData/changes>

---
From: hudson <no...@no...> - 2010-07-29 21:32:13

See <http://localhost/job/BigData/changes>

---
From: hudson <no...@no...> - 2010-07-29 20:39:50

See <http://localhost/job/BigData/changes>

---
From: Mike P. <mi...@sy...> - 2010-07-29 20:29:29

We've just released a new version of Bigdata(r). This is a bigdata(r) snapshot release. This release is capable of loading 1B triples in under one hour on a 15-node cluster and has been used to load up to 13B triples on the same cluster. JDK 1.6 is required.

See [1] for instructions on installing bigdata(r), [2] for the javadoc, and [3] and [4] for news, questions, and the latest developments. For more information about SYSTAP, LLC and bigdata(r), see [5].

Please note that we recommend checking out the code from SVN using the tag for this release. The code will build automatically under Eclipse. You can also build the code using the ant script. The cluster installer requires the use of the ant script. You can check out this release from the following URL:

https://bigdata.svn.sourceforge.net/svnroot/bigdata/branches/BIGDATA_RELEASE_0_83_2

New features:

- This release provides a bug fix for issue #118. Upgrading to this release is advised. See https://sourceforge.net/apps/trac/bigdata/ticket/118 for details.
- Inlining of XSD numerics, xsd:boolean, and custom datatype extensions into the statement indices. Inlining provides a smaller footprint and faster queries for data using XSD numeric datatypes. In order to introduce inlining we were forced to make a change in the physical schema for the RDF database which breaks binary compatibility for existing stores. The recommended migration path is to export the data and import it into a new bigdata instance.
- A refactor of the dynamic sharding mechanism for higher performance.
- The SparseRowStore has been modified to make Unicode primary keys decodable by representing them using UTF-8 rather than Unicode sort keys. This change also allows the SparseRowStore to work with the JDK collator option, which embeds nul bytes into Unicode sort keys. This change breaks binary compatibility, but there is an option for historical compatibility.

The roadmap for the next releases includes:

- Query optimizations;
- Support for high-volume analytic query workloads and SPARQL aggregations;
- High availability for the journal and the cluster;
- Simplified deployment, configuration, and administration for clusters.

For more information, please see the following links:

[1] http://bigdata.wiki.sourceforge.net/GettingStarted
[2] http://www.bigdata.com/bigdata/docs/api/
[3] http://sourceforge.net/projects/bigdata/
[4] http://www.bigdata.com/blog
[5] http://www.systap.com/bigdata.htm

About bigdata: Bigdata(r) is a horizontally scaled, general-purpose storage and computing fabric for ordered data (B+Trees), designed to operate on either a single server or a cluster of commodity hardware. Bigdata(r) uses dynamically partitioned key-range shards in order to remove any realistic scaling limits - in principle, bigdata(r) may be deployed on 10s, 100s, or even thousands of machines, and new capacity may be added incrementally without requiring a full reload of all data. The bigdata(r) RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL), and datum-level provenance.

--
Mike Personick
SYSTAP, LLC.
801.328.3945 (office)
801.243.3678 (mobile)
801.938.5320 (skype)
mi...@sy...
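As an aside on the inlining feature announced above, here is a schematic Java sketch of the idea; the names, marker bytes, and encoding widths are illustrative only, not the actual bigdata key encoding, which is more involved.

```java
import java.nio.ByteBuffer;

/**
 * Schematic illustration of datatype inlining: instead of storing a term
 * identifier that must be resolved against a dictionary index, a small typed
 * value is encoded directly into the statement-index key. Illustrative only.
 */
public class InlineKeySketch {

    // Marker bytes distinguishing an inlined value from a term identifier.
    static final byte TERM_ID = 0;
    static final byte INLINE_INT = 1;

    static byte[] encodeInt(int value) {
        // The value itself lives in the key: no dictionary lookup at query time.
        return ByteBuffer.allocate(5).put(INLINE_INT).putInt(value).array();
    }

    static byte[] encodeTermId(int termId) {
        // URIs and large literals still indirect through a term identifier.
        return ByteBuffer.allocate(5).put(TERM_ID).putInt(termId).array();
    }

    public static void main(String[] args) {
        byte[] inlined = encodeInt(42);       // an xsd:int literal, inlined
        byte[] indirect = encodeTermId(1001); // a dictionary-resolved term
        System.out.println(inlined.length + " " + indirect.length);
    }
}
```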
---
From: hudson <no...@no...> - 2010-07-29 19:48:08

See <http://localhost/job/BigData/changes>

---
From: Bryan T. <br...@sy...> - 2010-07-29 19:44:23

All, I've created a trac issue [1] and a wiki page [2] for a project release guide. Please use the trac issue (or this email list) to discuss ways to improve our release process, and let's capture that on the wiki page.

Thanks,
Bryan

[1] https://sourceforge.net/apps/trac/bigdata/ticket/130
[2] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=ReleaseGuide

---
From: hudson <no...@no...> - 2010-07-29 18:56:55

See <http://localhost/job/BigData/changes>

---
From: hudson <no...@no...> - 2010-07-29 18:05:22

See <http://localhost/job/BigData/changes>

---
From: Fred O. <fko...@gm...> - 2010-07-29 17:22:33

Yes, the "if (false..." appears to work around the problem.

Fred

On Thu, Jul 29, 2010 at 1:09 PM, Bryan Thompson <br...@sy...> wrote:
> Fred,
>
> The multi-block iterator is brand new. This appears to be a fence post with a zero byte read from the store. I suspect that this is an edge case where either the key-range was empty or the index segment was empty. If you can identify which index segment file it was from the log, please email it to me or attach it to [1].
>
> You can work around this by modifying AbstractBTree#2845 to read "if (false...". That will turn off the multi-block iterator.
>
> if (true
> && ...
>
> I'll see if I can track this down now.
>
> Thanks,
> Bryan

---
From: hudson <no...@no...> - 2010-07-29 17:13:10

See <http://localhost/job/BigData/changes>

---
From: Bryan T. <br...@sy...> - 2010-07-29 17:09:55

Fred,

The multi-block iterator is brand new. This appears to be a fence post with a zero byte read from the store. I suspect that this is an edge case where either the key-range was empty or the index segment was empty. If you can identify which index segment file it was from the log, please email it to me or attach it to [1].

You can work around this by modifying AbstractBTree#2845 to read "if (false...". That will turn off the multi-block iterator.

if (true
    && ...

I'll see if I can track this down now.

Thanks,
Bryan

[1] https://sourceforge.net/apps/trac/bigdata/ticket/128

> -----Original Message-----
> From: Fred Oliver [mailto:fko...@gm...]
> Sent: Thursday, July 29, 2010 12:57 PM
> To: Bryan Thompson
> Cc: Bigdata Developers
> Subject: Re: [Bigdata-developers] Errors in bulk load tests (regression)
>
> Perhaps a bit premature ...
>
> The bulk loading worked, but one of the queries in the subsequent test is failing repeatedly with an IllegalArgumentException being thrown at this point:
>
> com.bigdata.io.FileChannelUtility.readAll(FileChannelUtility.java:148)
> com.bigdata.btree.IndexSegmentStore.readFromFile(IndexSegmentStore.java:1091)
> com.bigdata.btree.IndexSegmentMultiBlockIterator.nextBlock(IndexSegmentMultiBlockIterator.java:479)
> com.bigdata.btree.IndexSegmentMultiBlockIterator.nextLeaf(IndexSegmentMultiBlockIterator.java:381)
> com.bigdata.btree.IndexSegmentMultiBlockIterator._hasNext(IndexSegmentMultiBlockIterator.java:319)
> com.bigdata.btree.IndexSegmentMultiBlockIterator.hasNext(IndexSegmentMultiBlockIterator.java:283)
> com.bigdata.btree.view.FusedTupleIterator.hasNext(FusedTupleIterator.java:203)
> com.bigdata.btree.ResultSet.<init>(ResultSet.java:1102)
> com.bigdata.service.DataService$RangeIteratorTask.doTask(DataService.java:1726)
> com.bigdata.service.DataService$RangeIteratorTask.doTask(DataService.java:1672)
> com.bigdata.journal.AbstractTask.call2(AbstractTask.java:1703)
> com.bigdata.journal.AbstractTask.call(AbstractTask.java:1592)
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> java.util.concurrent.FutureTask.run(FutureTask.java:138)
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> java.lang.Thread.run(Thread.java:619)
>
> Fred
>
> On Thu, Jul 29, 2010 at 12:12 PM, Bryan Thompson <br...@sy...> wrote:
> > Super.
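A self-contained Java sketch of the short-circuit pattern behind the workaround in the message above; the class, fields, and conditions are illustrative stand-ins, not the actual AbstractBTree source.

```java
// Illustrative only: the real guard lives at AbstractBTree#2845. Changing
// the leading constant from true to false short-circuits the condition and
// selects the fallback path without deleting the original logic.
public class IteratorToggleSketch {

    // Hypothetical stand-ins for the real conditions on that line.
    static boolean segmentIsLocal = true;
    static boolean rangeIsLarge = true;

    static String chooseIterator() {
        if (false // workaround: was "true"
                && segmentIsLocal
                && rangeIsLarge) {
            return "IndexSegmentMultiBlockIterator"; // disabled fast path
        }
        return "standard tuple iterator"; // safe fallback
    }

    public static void main(String[] args) {
        System.out.println(chooseIterator()); // always prints the fallback
    }
}
```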
---
From: Bryan T. <br...@sy...> - 2010-07-29 16:58:45

Mike,

Can you:

- blog it
- post the release notes to the sourceforge project news (admin -> features -> news -> submit)
- send out an email announcement
- update http://semanticweb.org/wiki/Bigdata (version and release date)

The release notes are inline below.

Thanks,
Bryan

This is a bigdata(R) snapshot release. This release is capable of loading 1B triples in under one hour on a 15-node cluster and has been used to load up to 13B triples on the same cluster. JDK 1.6 is required.

See [1] for instructions on installing bigdata(R), [2] for the javadoc, and [3] and [4] for news, questions, and the latest developments. For more information about SYSTAP, LLC and bigdata, see [5].

Please note that we recommend checking out the code from SVN using the tag for this release. The code will build automatically under Eclipse. You can also build the code using the ant script. The cluster installer requires the use of the ant script. You can check out this release from the following URL:

https://bigdata.svn.sourceforge.net/svnroot/bigdata/branches/BIGDATA_RELEASE_0_83_2

New features:

- This release provides a bug fix for issue #118. Upgrading to this release is advised. See https://sourceforge.net/apps/trac/bigdata/ticket/118 for details.
- Inlining of XSD numerics, xsd:boolean, and custom datatype extensions into the statement indices. Inlining provides a smaller footprint and faster queries for data using XSD numeric datatypes. In order to introduce inlining we were forced to make a change in the physical schema for the RDF database which breaks binary compatibility for existing stores. The recommended migration path is to export the data and import it into a new bigdata instance.
- A refactor of the dynamic sharding mechanism for higher performance.
- The SparseRowStore has been modified to make Unicode primary keys decodable by representing them using UTF-8 rather than Unicode sort keys. This change also allows the SparseRowStore to work with the JDK collator option, which embeds nul bytes into Unicode sort keys. This change breaks binary compatibility, but there is an option for historical compatibility.

The roadmap for the next releases includes:

- Query optimizations;
- Support for high-volume analytic query workloads and SPARQL aggregations;
- High availability for the journal and the cluster;
- Simplified deployment, configuration, and administration for clusters.

For more information, please see the following links:

[1] http://bigdata.wiki.sourceforge.net/GettingStarted
[2] http://www.bigdata.com/bigdata/docs/api/
[3] http://sourceforge.net/projects/bigdata/
[4] http://www.bigdata.com/blog
[5] http://www.systap.com/bigdata.htm

About bigdata: Bigdata(r) is a horizontally scaled, general-purpose storage and computing fabric for ordered data (B+Trees), designed to operate on either a single server or a cluster of commodity hardware. Bigdata(r) uses dynamically partitioned key-range shards in order to remove any realistic scaling limits - in principle, bigdata(r) may be deployed on 10s, 100s, or even thousands of machines, and new capacity may be added incrementally without requiring a full reload of all data. The bigdata(r) RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL), and datum-level provenance.

---
From: Fred O. <fko...@gm...> - 2010-07-29 16:56:55

Perhaps a bit premature ...

The bulk loading worked, but one of the queries in the subsequent test is failing repeatedly with an IllegalArgumentException being thrown at this point:

com.bigdata.io.FileChannelUtility.readAll(FileChannelUtility.java:148)
com.bigdata.btree.IndexSegmentStore.readFromFile(IndexSegmentStore.java:1091)
com.bigdata.btree.IndexSegmentMultiBlockIterator.nextBlock(IndexSegmentMultiBlockIterator.java:479)
com.bigdata.btree.IndexSegmentMultiBlockIterator.nextLeaf(IndexSegmentMultiBlockIterator.java:381)
com.bigdata.btree.IndexSegmentMultiBlockIterator._hasNext(IndexSegmentMultiBlockIterator.java:319)
com.bigdata.btree.IndexSegmentMultiBlockIterator.hasNext(IndexSegmentMultiBlockIterator.java:283)
com.bigdata.btree.view.FusedTupleIterator.hasNext(FusedTupleIterator.java:203)
com.bigdata.btree.ResultSet.<init>(ResultSet.java:1102)
com.bigdata.service.DataService$RangeIteratorTask.doTask(DataService.java:1726)
com.bigdata.service.DataService$RangeIteratorTask.doTask(DataService.java:1672)
com.bigdata.journal.AbstractTask.call2(AbstractTask.java:1703)
com.bigdata.journal.AbstractTask.call(AbstractTask.java:1592)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:619)

Fred

On Thu, Jul 29, 2010 at 12:12 PM, Bryan Thompson <br...@sy...> wrote:
> Super.

---
From: Mike P. <mi...@sy...> - 2010-07-29 16:51:00

I think it should be 0.83.2

-----Original Message-----
From: Bryan Thompson [mailto:br...@sy...]
Sent: Thursday, July 29, 2010 10:48 AM
To: Bigdata Developers
Subject: [Bigdata-developers] release 0.82.2

All, I am going to do a bug fix release which addresses [1]. Please hold off on commits for a few minutes while I tag the branch. I plan to follow with another release shortly which addresses [2], but I would like to get this bug fix out now.

Thanks,
Bryan

[1] https://sourceforge.net/apps/trac/bigdata/ticket/118
[2] https://sourceforge.net/apps/trac/bigdata/ticket/123

---
From: Bryan T. <br...@sy...> - 2010-07-29 16:48:24

All, I am going to do a bug fix release which addresses [1]. Please hold off on commits for a few minutes while I tag the branch. I plan to follow with another release shortly which addresses [2], but I would like to get this bug fix out now.

Thanks,
Bryan

[1] https://sourceforge.net/apps/trac/bigdata/ticket/118
[2] https://sourceforge.net/apps/trac/bigdata/ticket/123

---
From: Bryan T. <br...@sy...> - 2010-07-29 16:13:19

Super.

Fred Oliver <fko...@gm...> wrote:

The latest update works. Thanks.

Fred

On Thu, Jul 29, 2010 at 10:59 AM, Bryan Thompson <br...@sy...> wrote:
> Fred,
>
> Please retry. Martyn has committed a change against the trunk which should resolve this issue. Please let me know if the problem is resolved, in which case we can also close out https://sourceforge.net/apps/trac/bigdata/ticket/118.
>
> Thanks,
> Bryan

---
From: Fred O. <fko...@gm...> - 2010-07-29 15:59:21

The latest update works. Thanks.

Fred

On Thu, Jul 29, 2010 at 10:59 AM, Bryan Thompson <br...@sy...> wrote:
> Fred,
>
> Please retry. Martyn has committed a change against the trunk which should resolve this issue. Please let me know if the problem is resolved, in which case we can also close out https://sourceforge.net/apps/trac/bigdata/ticket/118.
>
> Thanks,
> Bryan

---
From: hudson <no...@no...> - 2010-07-29 15:43:25

See <http://localhost/job/BigData/changes>

---
From: Bryan T. <br...@sy...> - 2010-07-29 15:00:29

Fred,

Please retry. Martyn has committed a change against the trunk which should resolve this issue. Please let me know if the problem is resolved, in which case we can also close out https://sourceforge.net/apps/trac/bigdata/ticket/118.

Thanks,
Bryan

________________________________
From: Fred Oliver [mailto:fko...@gm...]
Sent: Wednesday, July 28, 2010 5:55 PM
To: Bryan Thompson; and...@no...
Subject: Re: [Bigdata-developers] Errors in bulk load tests (regression)

Bryan,

I agree that performance comparison for the new version is a good thing, though I'm a little surprised that you waited until after the announcement to test. I'll refer you to Andrew about the machines.

The problems I'm seeing are on a single workstation, if you could take a look at the exceptions....

Fred

On Wed, Jul 28, 2010 at 5:38 PM, Bryan Thompson <br...@sy...> wrote:

Fred,

I was trying to set up the CI performance machine for this purpose but ran into OS configuration issues which I passed to Brad to look at. We really need to test against a cluster to have confidence here. Can I get access to the 16-node cluster for a few days?

Bryan

Fred Oliver <fko...@gm...> wrote:

I'm running a test which generates (essentially LUBM) data files and invokes MappedRDFDataLoadMaster.main(). The configuration is multiple services running on a single machine.

After an svn update to the current branch, the test appears to run forever, with many exceptions seen in the log file, apparently relating to IO. I've attached two versions of the error.log file. One shows errors from FileChannelUtility, BufferOverflowException, etc. The other shows AssertionErrors being thrown between processes.

Would one of you take a look, please?

Fred

---
From: hudson <no...@no...> - 2010-07-29 13:03:37

See <http://localhost/job/BigData/changes>

---
From: Martyn C. <ma...@sy...> - 2010-07-29 12:34:58

A BufferOverflowException emanates from DiskOnlyStrategy, line 409, since it does not check if there is room in the buffer before putting the data. This is supposedly guarded by the calling DiskOnlyStrategy.write, which checks to see if the writeCache needs flushing in order to fit the data.

I see this error also from StressTestConcurrentUnisolatedIndices, so that's where I am focussing right now.

- Martyn

Fred Oliver wrote:
> I'm running a test which generates (essentially LUBM) data files and invokes MappedRDFDataLoadMaster.main(). The configuration is multiple services running on a single machine.
>
> After svn update to the current branch, the test appears to run forever with many exceptions seen in the log file, apparently relating to IO. I've attached two versions of the error.log file.
>
> One shows errors from FileChannelUtility, BufferOverflowException, etc. The other shows AssertionErrors being thrown between processes.
>
> Would one of you take a look, please?
>
> Fred
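A minimal, self-contained Java sketch of the missing check Martyn describes; the class name, buffer size, and flush body are illustrative, not the actual DiskOnlyStrategy code.

```java
import java.nio.ByteBuffer;

// Sketch of the guard discussed above: check remaining capacity before
// putting a record into the write cache, flushing first if it won't fit.
final class WriteCacheSketch {

    private final ByteBuffer writeCache = ByteBuffer.allocateDirect(1 << 20);

    synchronized void write(ByteBuffer record) {
        // (A record larger than the whole cache would need a direct write
        // to the channel; that case is elided here.)
        if (record.remaining() > writeCache.remaining()) {
            flushWriteCache(); // make room; without this, put() overflows
        }
        writeCache.put(record); // throws BufferOverflowException if unguarded
    }

    private synchronized void flushWriteCache() {
        writeCache.flip();
        // ... transfer the cached bytes to the backing channel (elided) ...
        writeCache.clear();
    }
}
```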
---
From: Bryan T. <br...@sy...> - 2010-07-29 11:13:13

Concurrent writes on the database, when combined with concurrent SPARQL queries involving LIMIT without the use of ORDER BY clauses, can cause the producer reading on the database to be concurrently interrupted. If this occurs during a write, then the logic in FileChannelUtility.writeAll() should be invoked to handle the interrupt and retry the write.

In order to trigger this condition, there needs to be a sufficient data scale and query mixture such that the pipeline join will process more than one chunk of solutions and hence have an opportunity to be interrupted (when the iterator is closed in response to the LIMIT) while reading on the FileChannel. That interrupt will prompt the AsynchronousCloseException. FileChannelUtility.writeAll() should handle that exception and retry the write.

While BSBM has the query mixtures and data scale (at BSBM 100M) to trigger interrupts of readers on a regular basis, the benchmark does not include concurrent database writes and is thus unable to trigger retries of writes in FileChannelUtility.writeAll(). Cluster-based testing exercises similar conditions because of the concurrent execution of different tasks reading and writing on different shards, but there the write set is always buffered by the current Journal. In some ways, cluster-based testing is the easiest way to test for this problem. The alternative is to develop fairly complex benchmarks which combine concurrent writes and reads.

Thanks,
Bryan

---
From: Bryan T. <br...@sy...> - 2010-07-29 11:13:02

This appears to be related to https://sourceforge.net/apps/trac/bigdata/ticket/118, which points out that the recently added invocation of flushWriteCache() from commit() may be at fault. I concur. It appears that flushWriteCache() is being invoked without the appropriate lock held. This would be a problem for scale-out, where concurrent tasks writing and reading on different shards will be running during a commit. It could also be a problem for a standalone Journal with writes under concurrent high-level query combined with a LIMIT, which would cause the producer to be interrupted while reading on the backing file.

While the WORMStrategy contains a similar call from commit() to flush the write cache, the implementation of the write cache differs by acquiring an internal lock which prevents concurrent modification during the write. The DiskOnlyStrategy's write cache is not protected against concurrent modification other than by synchronizing on [this].

Also, a correction to my email below: DiskOnlyStrategy does not use the RecurrentReadWriteLock. That is only used by the WORMStrategy. This implies that the full performance of the database for standalone Journal operations is only available in the DiskWORM mode rather than the Disk mode.

At this point I would suggest that we explore why the commit() -> flushWriteCache() call was introduced, modify the StoreManager to use the WORMStrategy, and then cut over to the WORMStrategy [https://sourceforge.net/apps/trac/bigdata/ticket/123] entirely, removing DiskOnlyStrategy from the code base and issuing a new release.

The workaround for non-scale-out users is to change the properties file from Disk to DiskWORM (the two modes have binary compatibility, but DiskWORM has higher performance).

Bryan
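A sketch of that Disk -> DiskWORM workaround applied programmatically, for anyone configuring a Journal from code; the property key is an assumption based on the class names in this thread and should be verified against com.bigdata.journal.Options in your build.

```java
import java.util.Properties;

// Sketch of the non-scale-out workaround: switch the journal's buffer mode
// from Disk (DiskOnlyStrategy) to DiskWORM (WORMStrategy). The two modes are
// binary compatible, per the message above.
public class BufferModeWorkaroundSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Property key is an assumption; check com.bigdata.journal.Options.
        props.setProperty(
                "com.bigdata.journal.AbstractJournal.bufferMode", "DiskWORM");
        System.out.println(props);
    }
}
```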
---
From: Bryan T. <br...@sy...> - 2010-07-29 10:35:11

Martyn,
Interesting. You propose that:
> nwritten = channel.write(data, pos + count);
could have written some bytes, and updated data.position(), before an exception was thrown and handled by a retry? It can be a bit difficult to get to the bottom of the FileChannel semantics, but if this is a reasonable suspicion we could always add a counter to track the #of exceptions caught and handled by a reopen/retry of the remaining write.
If this is the case, we might be able to demonstrate the problem by modifying one of the test suites to interrupt readers. I've looked over the various test suites. While AbstractInterruptsTestCase is aimed at the behavior of the system when threads are interrupted, it is not set up as a stress test. However, I think that it would be easy enough to modify AbstractMRMWTestCase to keep a concurrent hash map of the running readers and then introduce random interrupts of reader tasks by writer tasks (see the sketch below).
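A self-contained Java sketch of that test modification; all names are illustrative, and this is not the actual AbstractMRMWTestCase.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Sketch: track live reader threads in a concurrent hash map and let writer
 * tasks randomly interrupt one of them.
 */
public class ReaderInterruptStressSketch {

    // Registry of live reader threads, keyed by task id.
    private final Map<Long, Thread> readers = new ConcurrentHashMap<>();

    public void registerReader(long taskId) {
        readers.put(taskId, Thread.currentThread());
    }

    public void unregisterReader(long taskId) {
        readers.remove(taskId);
    }

    /** Called from writer tasks: with the given probability, interrupt a random live reader. */
    public void maybeInterruptReader(double probability) {
        if (ThreadLocalRandom.current().nextDouble() < probability) {
            // Interrupting a reader blocked on FileChannel IO provokes the
            // asynchronous close that FileChannelUtility must then handle.
            readers.values().stream().findAny().ifPresent(Thread::interrupt);
        }
    }
}
```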
The reason to interrupt the readers is that this will provoke the asynchronous close of the backing channel such that writers will have to handle that event in FileChannelUtility.writeAll().
It occurs to me that we could have been masking a problem using the LRUNexus to buffer the disk in the application. The concurrent interrupt can only arise when the read request makes it down to the FileChannel. A hit on the concurrent hash map (LRUNexus) for the store would not read through and hence would reduce the opportunity to observe this error.
One of the things that I changed in the release was to disable the LRUNexus by default. The concurrent record cache was a big performance win while we were synchronized on disk IO operations in DiskOnlyStrategy and WORMStrategy. Now that these classes use a RecurrentReadWriteLock to work around a Sun bug with concurrent IO during a file extension, the LRUNexus no longer offers any performance benefit under heavy concurrent query mixtures so I have disabled it by default to let reads go through to the file system cache. However, this condition (no LRUNexus) has not been tested for scale-out.
This choice (LRUNexus disabled) really needs to be qualified for both performance and correctness on a cluster and we should explore the impact on throughput of reducing the memory allocated to the JVMs when the LRUNexus is disabled since that will leave more memory available to the OS for the file cache.
Reviewing the bigdataCluster{16}.config and bigdataStandalone.config files, I see that the bigdataCluster16.config file has some explicit configuration of the LRUNexus but does not force it to be enabled. The bigdataCluster.config and bigdataStandalone files do not have any explicit configuration of the LRUNexus. Therefore, in all cases it will now be disabled by default for the federation installs. Given this, it is possible that we were masking a problem in FileChannelUtility.writeAll() given semantics for FileChannel.write() which allow the buffer position() to be updated even if the request throws an exception due to an asynchronous file channel close.
To test this, please add the following line to the defaultJavaArgs[] of com.bigdata.jini.start.config.ServiceConfiguration in the bigdata configuration file:
"-Dcom.bigdata.LRUNexus.enabled=true",
Also, please let me know how reproducible this problem is.
Thanks,
Bryan
________________________________
From: Martyn Cutcher [mailto:ma...@sy...]
Sent: Thursday, July 29, 2010 5:42 AM
To: Bryan Thompson
Subject: Re: FW: [Bigdata-developers] Errors in bulk load tests (regression)
...initial findings
The first stack trace came up with:

Expecting to write 94659 bytes, but wrote 83950 bytes in 1

from FileChannelUtility. This is called as a result of the call to flush the writeCache that I added to DiskOnlyStrategy.commit(). I don't believe that this is itself an error, even though it may have caused this error to become apparent.
Examining FileChannelUtility.writeAll, it is possible that this could be thrown if we have a repeat write following an AsynchronousCloseException or ClosedChannelException, since these could be thrown after the buffer has been partly processed (decrementing the bytes remaining). This could be guarded against by resetting the ByteBuffer position before calling continue (see the sketch below).
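A Java sketch of that guard: an absolute-position write loop that recomputes the buffer position from its own byte tally before every attempt, so a partial write that dies in an AsynchronousCloseException cannot be double-counted. The reopen() helper is a placeholder, and the real FileChannelUtility.writeAll differs in detail.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousCloseException;
import java.nio.channels.FileChannel;

// Sketch only: not the actual FileChannelUtility source.
public final class WriteAllGuardSketch {

    public static void writeAll(FileChannel channel, ByteBuffer data, long pos)
            throws IOException {
        final int limit = data.limit();
        final int remaining = data.remaining();
        int count = 0; // bytes we know reached the channel
        while (count < remaining) {
            // Reset the position from our own tally: the channel may have
            // advanced it during a write that subsequently threw.
            data.position(limit - (remaining - count));
            try {
                count += channel.write(data, pos + count);
            } catch (AsynchronousCloseException ex) {
                // A reader's interrupt closed the channel mid-write; reopen
                // and retry the remainder (position is fixed up above).
                channel = reopen(channel);
            }
        }
    }

    // Placeholder: the real code reopens the backing store's channel.
    private static FileChannel reopen(FileChannel ch) throws IOException {
        throw new IOException("reopen() is a placeholder in this sketch");
    }
}
```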
..continuing...
- Martyn
Bryan Thompson wrote:
Martyn,
Can you take a look at these errors?
Bryan
________________________________
From: Fred Oliver [mailto:fko...@gm...]
Sent: Wednesday, July 28, 2010 5:07 PM
To: Bigdata Developers
Subject: [Bigdata-developers] Errors in bulk load tests (regression)
I'm running a test which generates (essentially lubm) data files and invokes MappedRDFDataLoadMaster.main(). The configuration is multiple services running on a single machine.
After svn update to the current branch, the test appears to run forever with many exceptions seen in the log file, apparently relating to IO. I've attached two versions of the error.log file.
One shows errors from FileChannelUtility, BufferOverflowException, etc.
The other shows AssertionErrors being thrown between processes.
Would one of you take a look, please?
Fred
---
From: hudson <no...@no...> - 2010-07-28 22:36:25

See <http://localhost/job/BigData/changes>