I saw a cache server being exercised by TestCacheRandom
(-threads 3 -pks 1000 -quiet) run out of memory trying
to create FlushWorker threads. I checked, and when the
cache decides to perform a flush in the background, it
creates a new thread to do so unless there's a
currently idle FlushWorker thread. See:
- CacheS::FlushMPKFile and CacheS::NewFlushWorker in
the block=false case
- CacheS::AddEntryToMPKFile which passes block=false to
- FlushWorker::FlushWorker which always starts a new thread
The upper bound should be one per MultiPKFile (2^16),
but that's still way too many. We should instead limit
the number of such threads (perhaps to the established
[CacheServer]FlushWorkerCnt setting) and queue work for
them to do.
It's a little odd that we haven't seen this before.
However, I only saw this with a cache built with a
modified LimService implementation from my work on the
issue titled "LimService uses unbounded number of threads".
I think my new implementation may be more fairly taking
turns among active connections, and at least it is
using a totally different mechanism to decide which
active connection gets to perform an RPC next. I
imagine this has something to do with why I suddenly
observed so many FlushWorker threads.