When running an application that causes the GC to run out of memory, I
am observing behavior that I am not sure is entirely correct. I am
using a FullAdaptiveSemiSpace configuration on RedHat 9.0 (Intel)
on a machine with 2GB physical memory (I set the minimum heap
size to 1.8GB: -Xms1800M). The VM is (unfortunately) modified,
but not in a way that should disturb handling of the out-of-memory
error.
When the GC realizes that it is running out of memory, it issues the
appropriate warning:
GC Warning: Possible VM range imbalance -
Allocator.allocSlowBody failed on request of 56 on space ss1
Then it attempts to perform a GC and actually succeeds:
[GC 2 Start 1543.71 s 736632KB -> 42100 KB 336.06 ms]
However, it still throws an OutOfMemoryError:
Exception in thread "Thread-208": Throwable.printStackTrace(): We
are trying to dump the stack of an OutOfMemoryError.
Throwable.printStackTrace(): We'll use the VM.sysWriteln()
function,
Throwable.printStackTrace(): instead of the System.err stream, to
avoid trouble.
java.lang.OutOfMemoryError
....
The machine I am using to run this application is an SMP, and
multiple threads may access the allocator at the same time. Is it
possible that, while the out-of-memory error is being handled, some
other thread on a different processor tries to allocate an object and
thus disturbs the entire process?
Best regards
Adam Welc
Logged In: YES
user_id=1253816
Hi Adam,
Since the failing request is for a small object (56 bytes), the cause is
definitely not a virtual address range imbalance; that usually results
from allocating many large objects instead. When an allocation request
fails, a GC is triggered and the allocation request is retried (up to 3
times) before memory is considered exhausted. The problem is that in the
presence of many threads (perhaps hundreds - you seem to be up to 200),
this mechanism might be too sensitive and not realize that progress is
being made (albeit on other threads). One easy thing you can do is to
increase the magic value 3. The right thing to do is to detect global
progress (by other threads) and to retry indefinitely. This bug exhibits
the general lack of communication between the GC module and the
scheduler (or at least per-thread allocation statistics).
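To illustrate the idea, here is a minimal, self-contained sketch of a retry loop that detects global progress instead of using a fixed retry count. All names (ProgressRetrySketch, globalAllocs, allocWithGlobalProgress) are assumptions for illustration, not the actual Jikes RVM/MMTk code:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

public class ProgressRetrySketch {
    // Hypothetical global counter, bumped on every successful allocation by any thread.
    static final AtomicLong globalAllocs = new AtomicLong();
    // Give up only after this many consecutive attempts with NO global progress.
    static final int MAX_STALLED_ATTEMPTS = 3;

    static boolean allocWithGlobalProgress(BooleanSupplier tryOnce) {
        long lastSeen = globalAllocs.get();
        int stalled = 0;
        while (stalled < MAX_STALLED_ATTEMPTS) {
            if (tryOnce.getAsBoolean()) {
                globalAllocs.incrementAndGet();
                return true;
            }
            long now = globalAllocs.get();
            if (now != lastSeen) {   // some other thread made progress:
                stalled = 0;         // reset the give-up counter and keep trying
                lastSeen = now;
            } else {
                stalled++;
            }
        }
        return false;               // no thread is progressing: genuine exhaustion
    }

    public static void main(String[] args) {
        // Our own attempt fails 10 times while "other threads" keep allocating
        // (simulated by bumping globalAllocs); a fixed retry count of 3 would
        // have declared OutOfMemory, but the progress-aware loop succeeds.
        final int[] failuresLeft = {10};
        boolean ok = allocWithGlobalProgress(() -> {
            if (failuresLeft[0] > 0) {
                failuresLeft[0]--;
                globalAllocs.incrementAndGet(); // progress on another thread
                return false;
            }
            return true;
        });
        System.out.println(ok ? "allocation succeeded" : "OutOfMemoryError");
    }
}
```

The point of the sketch is that exhaustion is declared only when no thread in the system is making progress, rather than after a fixed number of local failures.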
Perry
Logged In: YES
user_id=1215435
Perry or Steve, could one of you either close the bug or
work with Adam to resolve?
Logged In: YES
user_id=1253816
I might have found the problem. In the Allocator class, a thread makes
several (5 at the moment) attempts to perform allocation on the slow path
(in the allocSlowBody method) and fails with an error message if none of
the attempts is successful.
It seems that the crashes I was observing occur when two threads enter
this method in parallel. The first thread sets the flag indicating that GC is
in progress. This causes calls to the poll() method to return NIL, and as a
result all attempts by the other thread to acquire additional pages fail (even
though an attempt to acquire additional pages on the slow path is
synchronized - the allocPages() method in the MonotonePageResource class).
I have a fix that works for me (no more crashes), but it's really more of a
hack. I inserted a code fragment at the end of the for loop in the
allocSlowBody() method (Allocator class) - the isCollectionInitiated()
method returns true if the collectionsInitiated flag is greater than 0, and
false if it is equal to 0.
This causes a thread that attempts to acquire pages once a collection is
initiated to yield, allowing the collection to finish, and only then
re-attempting the acquire.
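A hedged, self-contained sketch of the yield-and-retry idea described above (the isCollectionInitiated() name and the 5-attempt loop come from the description; the collectionsInitiated flag, tryAcquirePages(), and the rest of the scaffolding are assumptions, not the actual fix):

```java
public class YieldOnCollectionSketch {
    static final int MAX_ATTEMPTS = 5;            // matches the 5 slow-path attempts described
    static volatile int collectionsInitiated = 0; // stand-in for the VM's flag

    static boolean isCollectionInitiated() {
        return collectionsInitiated > 0;
    }

    // Simulated page acquisition: fails while a collection is initiated,
    // mirroring poll() returning NIL once the GC-in-progress flag is set.
    static boolean tryAcquirePages() {
        return !isCollectionInitiated();
    }

    static boolean allocSlowBody() {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            if (tryAcquirePages()) {
                return true;
            }
            // The hack: instead of burning attempts while another thread's
            // collection is in progress, yield until it finishes, then retry.
            while (isCollectionInitiated()) {
                Thread.yield();
            }
        }
        return false; // all attempts failed with no collection pending: report OutOfMemory
    }

    public static void main(String[] args) throws InterruptedException {
        collectionsInitiated = 1;
        Thread collector = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { }
            collectionsInitiated = 0; // collection finishes, freeing memory
        });
        collector.start();
        boolean ok = allocSlowBody();
        collector.join();
        System.out.println(ok ? "allocation succeeded after yielding" : "OutOfMemoryError");
    }
}
```

In this simulation, the allocating thread's first attempt fails because a collection is in flight; it yields until the "collector" thread clears the flag, and the retry then succeeds instead of exhausting its attempt budget during the collection.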
Best regards
Adam
Logged In: YES
user_id=1215435
Let's try to get Adam's fix (or something equivalent) in
before next release.
Logged In: YES
user_id=308843
Originator: NO
This looks as though it's fixed, can we close the tracker?
Logged In: YES
user_id=308843
Originator: NO
Closing as Daniel made many improvements in this area and this shouldn't be an issue any more. Re-open if it is.