1. Summary
  2. Files
  3. Support
  4. Report Spam
  5. Create account
  6. Log in

Ticket #437 (closed defect: fixed)

Opened 17 months ago

Last modified 16 months ago

Thread-local cache combined with unbounded thread pools causes effective memory leak

Reported by: thompsonbry Owned by: thompsonbry
Priority: critical Milestone: Query
Component: Bigdata RDF Database Version: BIGDATA_RELEASE_1_1_0
Keywords: Cc: martyncutcher, mrpersonick

Description (last modified by thompsonbry) (diff)

Quite some time ago we introduced classes for batched updates against a hard reference queue (ring buffer data structure) and concurrent hash map with weak values. The relevant classes are:

1. HardReferenceQueueWithBatchingUpdates?
2. ConcurrentWeakValueCacheWithBatchedUpdates?

At the time, we considered two designs. One using striped locks and one using thread locals. The thread local design had better throughput and has been in place for quite some time. However, the thread local design is turning into a memory leak. The purpose of this issue is to correct that memory leak.

The ConcurrentWeakValueCacheWithBatchedUpdates? is only used by the LexiconRelation?'s termCache. This is the source of the memory leak. The HardReferenceQueueWithBatchingUpdates? is used to back up the termCache. The HardReferenceQueueWithBatchingUpdates? is also used for the writeRetentionQueue on the B+Tree and HTree, but I have not seen evidence of a memory leak in those cases.

The problem can be corrected in either of two ways:

1. Implement a striped lock version of the IHardReferenceQueue interface and use it in place of the thread-local version for the termCache.

2. Use a fixed size thread pool for all operations which touch the termCache.

Either approach will eliminate the memory leak.

This issue is related to [1], and in fact would appear to be at least one of the root causes of [1] (there may be other causes, but this one has been observed).

This problem exists against the 1.1.x release and is also doubtless present in the 1.0.x release, though I have not verified that yet. The problem only appears when there is a heavy sustained concurrent query workload and can be masked on machines with large JVM heaps.

[1] https://sourceforge.net/apps/trac/bigdata/ticket/433 (Cluster leaks threads under read-only index operations)

Change History

Changed 17 months ago by thompsonbry

  • status changed from new to accepted
  • description modified (diff)

Changed 17 months ago by thompsonbry

I found the striped lock version of the code. The BCHMGlobalLRU2 class has a TLB (thread local buffer) inner class which is capable of being deployed either using striped locks or true thread local buffers. The BCHMGlobalLRU2 class itself is parametrized such that it can be deployed in either manner. It seems that the HardReferenceQueueWithBatchingUpdates? class was never written with this flexibility, but perhaps we can migrate the TLB class into it.

I have added support for both pure thread-local and striped locks to the HardReferenceQueueWithBatchingUpdates? class. The class still defaults to pure thread-local buffers. I want to test the performance impact on concurrent query next (BSBM, LUBM).

Committed revision r5824.

Changed 17 months ago by thompsonbry

  • cc mrpersonick added
  • version changed from TERMS_REFACTOR_BRANCH to BIGDATA_RELEASE_1_1_0

It turns out that the memory leak that I have been tracking in the termsCache is only partly related to the use of thread local buffers. In fact, what is pinning the BigdataValues? in the termsCache is the strong reference from AbstractIV#cache.

Mike and I have discussed this. It looks like the easiest way to fix this is to clone the Key for the termCache (an IV) and clear the hard reference from the Key to the Value (the AbstractIV#cache field) on the cloned IV. The cloned IV would then be put into the map (this is actually a putIfAbsent pattern).

It might be worth while to change the putIfAbsent() calls against the termCache to first test for the existence of the IV in the cache in order to avoid creating a copy of the IV if it exists in the termCache.

The termCache is used in LexiconRelation?, where we write on it in 4 places. It is also passed into the methods which do batch add/resolve for TermIds? and BlobIVs. It might be useful to encapsulate the termCache with a simpler interface to make it easier to track down the places where we resolve/putIfAbsent on that map.

A clone(boolean clearCache) method can be added to IVCache. It should be implemented for each concrete type of IV, which will make it very efficient to create these "clones" (no reflection looking for a constructor). While we are at it, the dropValue() method on that interface should also be removed from the code base.

This issue is also present in the 1.0.x release since that release includes the IV cache pattern.

Changed 17 months ago by thompsonbry

Removed IVCache#dropValue() from the code base. It was unused and implies a pattern which can not be made to work.

Added IVCache#clone(boolean). This is in support of [1]. The AbstractIV#cache field was providing a hard reference from the key of the termCache weak value map to the value of that map. That caused the values of the map to become pinned and is a root cause memory leak. This was identified based on a heap dump after running the BSBM explore use case ~ 80 times (until the memory leak began to degrade the ability of the OS to buffer the disk as evidenced by increasing IO Wait).

Modified the IV test suites per above.

Abstracted the termsCache using an ITermCache interface and flyweight TermCache? wrapper class which delegates the necessary methods through to the backing concurrent weak value cache impl.

[1] https://sourceforge.net/apps/trac/bigdata/ticket/437 (Thread-local cache combined with unbounded thread pools causes effective memory leak)

Committed Revision r5829.

Changed 17 months ago by thompsonbry

The change set above appears to have resolved this issue for the 1.1.0 branch. The throughput is steady at ~ 40k QMpH.

There was one very odd blip with low throughput. It would seem that some background process must have stolen a lot of CPU time - or perhaps there was some odd change in the JVM dynamic. That trial scored only 1375 QMpH. Odd.

The JVM heap is steady at 2.7G. IO Wait is steady at 1%. Blocks in is steady at ~ 4500. CPU utilization is steady at 67%. The 40k QMpH number and those load metrics are all steady for a one hour period. This is significantly more than 80 trials without any slowdown.

- It would be interesting to compare the performance with thread local buffers enabled.

- We should retry the cluster and see if the thread / memory leak / DGC problem has also been resolved.

Changed 17 months ago by thompsonbry

Ah. There were some error messages in the server log. It appears that the "blip" was caused by a GC Overhead Exceeded error. The JVM recovered quite nicely after that one trial, so this would appear to be down to the JVM dynamics. It would be interesting to study this failure mode in more depth.

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
     [java]     at java.util.LinkedList.<init>(LinkedList.java:78)
     [java]     at com.bigdata.bop.bindingSet.ListBindingSet.copy(ListBindingSet.java:286)
     [java]     at com.bigdata.bop.bindingSet.ListBindingSet.<init>(ListBindingSet.java:267)
     [java]     at com.bigdata.bop.bindingSet.ListBindingSet.copy(ListBindingSet.java:332)
     [java]     at com.bigdata.bop.solutions.ProjectionOp$ChunkTask.call(ProjectionOp.java:149)
     [java]     at com.bigdata.bop.solutions.ProjectionOp$ChunkTask.call(ProjectionOp.java:106)
     [java]     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
     [java]     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
     [java]     at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTask.call(ChunkedRunningQuery.java:1196)

Changed 17 months ago by thompsonbry

The old generation heap for the process noted above was only 137G after a full GC was forced using yourkit. This clearly indicates that there is no longer a memory leak. I have restarted the benchmark against the same JVM so I can observe the heap dynamics.

Attaching yourkit to the JVM has driven up the JVM heap. I've observed this before. It would appear to be an artifact of the additional memory requirements to buffer data for yourkit and the slowdown of the JVM running the NanoSparqlServer?. Less throughput translates into more heap requirements.

The heap utilization appears to be constant, even though it is larger with the profiler attached.

No hot spots were observed using the profiler.

Changed 17 months ago by thompsonbry

The changes described above have resolved the memory leak in the termCache.

A memory leak / thread leak is still observed on the cluster. The leak appears to be tied to DGC [1]. It can be observed by running the BSBM benchmark against the cluster. This leak appears to have a separate cause.

I am also observing too much memory utilization for index shards. The likely cure for that is to reduce the writeRetentionQueue for the indices in the cluster. This makes sense because we wind up with 100s or 1000s of shards per node in the cluster, and each shard holds much less data than on the single machine deployments. Also, on the cluster the index segment leaf is at most one IO if the index segment is open and only 2-3 IOs if the index segment is not open. Therefore the default writeRetentionQueue capacity of 500 (or less) is probably appropriate to the cluster. It would help to look at the actual #of nodes and leaves in a full shard to make this determination.

[1] https://sourceforge.net/apps/trac/bigdata/ticket/433

Changed 17 months ago by thompsonbry

Next step is to declare a sync RPC API for fast range counts. That is the likely source of most of the Futures. The RangeCountProcedure? is being submitted by the ClientIndexView?. If we define a purpose specific method for this on IDataService then we should be able to reduce the exported objects considerably. Make sure to use an interface for the parameter as part of a movement towards a continuously evolvable deployment.

Changed 17 months ago by thompsonbry

  • status changed from accepted to closed
  • resolution set to fixed

I have implemented a workaround which addresses the problem. I added a ThickFuture? class. The constructor waits until the Future is done and then records the outcome, plus whether there was an exception/interrupt (cancelled). Using this for just the fast range counts is sufficient for the single threaded BSBM harness to run. Doing this for all requests is sufficient to run 8 concurrent query clients with no increase in the thread count on the DS. I am now running a sustained BSBM query mix against the cluster.

Committed Revision r5831

Changed 16 months ago by thompsonbry

  • status changed from closed to reopened
  • resolution fixed deleted

Ported r5824 and r5828 to the 1.0.x maintenance branch. These change
sets include the support for striped locks in the batched hard
reference queue backing the concurrent hash map used by the termCache.

Note: The TermId?.cache mechanism also needs to be defeated. This was
done in r5829 in the BIGDATA_RELEASE_1_1_0 maintenance branch.
However, the IV class hierarchy is significantly different for 1.1.x
so this change set needs to be reimplemented for 1.0.x rather than
simply imported. An ITermCache, TermCache?, and IV.clone(boolean) were
all introduced to defeat this memory leak.

Committed revision r5836 (1.0.x maintenance branch)

Changed 16 months ago by thompsonbry

  • status changed from reopened to closed
  • resolution set to fixed

Port of r5829 to the 1.0.x maintenance branch.

Encapsulate terms cache with interface and ensure IV.cache reference
is cleared when adding to the map.

- ITermCache
- TermCache?
- LexiconRelation#termCache?, #termCacheFactory

All IVs must implement clone(boolean).

- TestIVCache
- IV.dropCache(), IV.clone(boolean)
- AbstractIV#setValue() => final
- AbstractIV#dropCache() => removed.
- AbstractIV#flags => protected.
- TermId#clone?(boolean).
- TermId#setValue?(...) => removed (overrode AbstractIV but added nothing)
- NullIV#clone(boolean) returns self.
- DummyIV#clone(boolean) returns self.
- TestEncodeDecodeKeys?: implemented clone(boolean) in private IV impl.
- TestEncodeDecodeKeys?: imported logic to test clone(boolean).
- ExtensionIV#clone(boolean)
- SidIV#clone(boolean) & removed unused logger.
- NumericBNodeIV#clone(boolean)
- UUIDBNodeIV#clone(boolean)
- UUIDLiteralIV#clone(boolean)
- XSDBooleanIV#clone(boolean)
- XSDByteIV#clone(boolean)
- WrappedIV#clone(boolean): <<<=== Review with MikeP.

Note: No changes were required in order to encapsulate access to the
termsCache using ITermCache. While it is used by various index
procedures in the 1.0.x maintenance branch, all of those references
are from inner classes which are directly accessing the termsCache
reference on the LexiconRelation?. (This was changed in the 1.1.x
branch, which now has static helper classes for those tasks.)

Committed revision r5837 (1.0.x maintenance branch)

This issue is now resolved against both the 1.0.x and 1.1.x maintenance branches.

Note: See TracTickets for help on using tickets.