Ticket #437 (closed defect: fixed)
Thread-local cache combined with unbounded thread pools causes effective memory leak
| Reported by: | thompsonbry | Owned by: | thompsonbry |
|---|---|---|---|
| Priority: | critical | Milestone: | Query |
| Component: | Bigdata RDF Database | Version: | BIGDATA_RELEASE_1_1_0 |
| Keywords: | | Cc: | martyncutcher, mrpersonick |
Description (last modified by thompsonbry)
Quite some time ago we introduced classes for batched updates against a hard reference queue (ring buffer data structure) and concurrent hash map with weak values. The relevant classes are:
1. HardReferenceQueueWithBatchingUpdates
2. ConcurrentWeakValueCacheWithBatchedUpdates
At the time, we considered two designs: one using striped locks and one using thread locals. The thread-local design had better throughput and has been in place for quite some time. However, because each thread keeps its own buffer, the thread-local design becomes an effective memory leak when combined with unbounded thread pools. The purpose of this issue is to correct that memory leak.
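The following is a minimal sketch of the failure mode, not the actual Bigdata code. `ThreadLocalBufferDemo`, `Buffer`, and `touchFromThreads` are hypothetical names, and the 64-slot buffer size is an arbitrary assumption. It only shows that a thread-local buffer is allocated once per distinct thread, so the number of live buffers tracks the number of live threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; these are not Bigdata classes.
class ThreadLocalBufferDemo {

    /** Stand-in for a per-thread batching buffer (size is an assumption). */
    static final class Buffer {
        final Object[] slots = new Object[64];
    }

    /** Counts how many per-thread buffers have been allocated so far. */
    static final AtomicInteger CREATED = new AtomicInteger();

    /** Each thread lazily gets its own buffer on first access. */
    static final ThreadLocal<Buffer> LOCAL = ThreadLocal.withInitial(() -> {
        CREATED.incrementAndGet();
        return new Buffer();
    });

    /**
     * Touches the thread-local buffer from nthreads threads that are all
     * alive at the same time, and returns how many buffers that allocated.
     */
    static int touchFromThreads(int nthreads) {
        int before = CREATED.get();
        CountDownLatch touched = new CountDownLatch(nthreads);
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        try {
            for (int i = 0; i < nthreads; i++) {
                Thread t = new Thread(() -> {
                    LOCAL.get().slots[0] = "term"; // first touch allocates
                    touched.countDown();
                    try {
                        release.await(); // stay alive so the buffer stays reachable
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                threads.add(t);
                t.start();
            }
            touched.await();
            int created = CREATED.get() - before;
            release.countDown();
            for (Thread t : threads) {
                t.join();
            }
            return created;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("buffers allocated: " + touchFromThreads(8));
    }
}
```

With an unbounded pool, the thread count (and therefore the buffer count) has no upper limit under sustained concurrent load, and each buffer stays reachable for as long as its thread lives.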
The ConcurrentWeakValueCacheWithBatchedUpdates is used only by the LexiconRelation's termCache. That cache is the source of the memory leak. The HardReferenceQueueWithBatchingUpdates is used to back the termCache. It is also used for the writeRetentionQueue on the B+Tree and HTree, but I have not seen evidence of a memory leak in those cases.
The problem can be corrected in either of two ways:
1. Implement a striped lock version of the IHardReferenceQueue interface and use it in place of the thread-local version for the termCache.
2. Use a fixed size thread pool for all operations which touch the termCache.
Either approach will eliminate the memory leak.
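As a rough sketch of the first option, the class below batches updates through a fixed number of lock-protected stripes instead of one buffer per thread. This is an assumption about how such a design could look, not the fix that was committed. `StripedBatchingQueue` and all of its members are hypothetical names, and it does not implement the real IHardReferenceQueue interface. The point is that buffered memory is bounded by the stripe count rather than the thread count.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; not the Bigdata implementation.
class StripedBatchingQueue<T> {

    /** One lock plus one small batch buffer per stripe. */
    private static final class Stripe<T> {
        final ReentrantLock lock = new ReentrantLock();
        final ArrayList<T> batch;

        Stripe(int batchSize) {
            batch = new ArrayList<>(batchSize);
        }
    }

    private final List<Stripe<T>> stripes = new ArrayList<>();
    private final ArrayDeque<T> shared = new ArrayDeque<>(); // bounded FIFO of hard references
    private final int batchSize;
    private final int capacity;

    StripedBatchingQueue(int stripeCount, int batchSize, int capacity) {
        this.batchSize = batchSize;
        this.capacity = capacity;
        for (int i = 0; i < stripeCount; i++) {
            stripes.add(new Stripe<>(batchSize));
        }
    }

    /** Buffers a reference in the caller's stripe; flushes when the batch is full. */
    void add(T ref) {
        // Threads map onto a fixed set of stripes, however many threads exist.
        int idx = (int) (Thread.currentThread().getId() % stripes.size());
        Stripe<T> s = stripes.get(idx);
        s.lock.lock();
        try {
            s.batch.add(ref);
            if (s.batch.size() >= batchSize) {
                flush(s);
            }
        } finally {
            s.lock.unlock();
        }
    }

    /** Moves a full batch into the shared queue, evicting the oldest entries. */
    private void flush(Stripe<T> s) {
        synchronized (shared) {
            for (T t : s.batch) {
                if (shared.size() == capacity) {
                    shared.pollFirst();
                }
                shared.addLast(t);
            }
        }
        s.batch.clear();
    }

    int sharedSize() {
        synchronized (shared) {
            return shared.size();
        }
    }

    int stripeCount() {
        return stripes.size();
    }

    public static void main(String[] args) {
        StripedBatchingQueue<String> q = new StripedBatchingQueue<>(4, 2, 3);
        for (int i = 0; i < 4; i++) {
            q.add("term" + i);
        }
        System.out.println("shared size: " + q.sharedSize());
    }
}
```

The second option needs no new data structure: if every operation that touches the termCache runs on a pool created with something like `java.util.concurrent.Executors.newFixedThreadPool(n)`, the number of thread-local buffers is capped at `n`.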
This issue is related to [1], and in fact would appear to be at least one of the root causes of [1] (there may be other causes, but this one has been observed).
This problem exists in the 1.1.x release and is probably also present in the 1.0.x release, though I have not verified that yet. The problem only appears under a heavy, sustained concurrent query workload and can be masked on machines with large JVM heaps.
[1] https://sourceforge.net/apps/trac/bigdata/ticket/433 (Cluster leaks threads under read-only index operations)