
#476 GC out-of-memory fallback problem

confirmed
closed
nobody
GC (93)
6
2012-09-21
2005-04-06
corwin
No

When running an application that causes the GC to run out of memory, I am
observing behavior that I am not sure is entirely correct. I am using a
FullAdaptiveSemiSpace configuration on RedHat 9.0 (Intel) on a machine with
2GB physical memory (I set the minimum heap size to 1.8GB: -Xms1800M). The
VM is (unfortunately) modified, but not in a way that should disturb handling
of the out-of-memory error.

When the GC realizes that it is running out of memory, it issues the
appropriate warning:

GC Warning: Possible VM range imbalance -
Allocator.allocSlowBody failed on request of 56 on space ss1

Then it attempts to perform a GC and actually succeeds:

[GC 2 Start 1543.71 s 736632KB -> 42100 KB 336.06 ms]

However, it still throws an OutOfMemoryError:

Exception in thread "Thread-208": Throwable.printStackTrace(): We
are trying to dump the stack of an OutOfMemoryError.
Throwable.printStackTrace(): We'll use the VM.sysWriteln()
function,
Throwable.printStackTrace(): instead of the System.err stream, to
avoid trouble.
java.lang.OutOfMemoryError
....

The machine I am using to run this application is an SMP machine, and
multiple threads may access the allocator at the same time. Is it possible
that, while one thread is attempting to handle the out-of-memory error,
another thread on a different processor tries to allocate an object and thus
disturbs the entire process?

Best regards

Adam Welc

Discussion

  • corwin

    corwin - 2005-04-06


    Hi Adam,

    Since the failing request is for a small object (56 bytes), the cause is
    definitely not a virtual address range imbalance; such an imbalance usually
    results from allocating many large objects instead. When an allocation
    request fails, a GC is triggered and the request is retried (up to 3 times)
    before memory is considered exhausted. The problem is that in the presence
    of many threads (perhaps hundreds - you seem to be up to 200), this
    mechanism might be too sensitive and fail to realize that progress is being
    made (albeit on other threads). One easy thing you can do is to increase
    the magic value 3. The right thing to do is to detect global progress (by
    other threads) and to retry indefinitely. This bug exhibits the general
    lack of communication between the GC module and the scheduler (or at least
    the lack of per-thread allocation statistics).
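
    For illustration, here is a minimal, self-contained sketch of that
    "retry while global progress is being made" idea in plain Java. The class
    and method names are hypothetical and this is not the actual Allocator
    code; it only illustrates the policy:

      import java.util.concurrent.atomic.AtomicLong;

      class ProgressAwareRetry {
        // Global count of successful allocations, bumped by every thread.
        static final AtomicLong globalAllocations = new AtomicLong();
        static final int MAX_STALLED_ATTEMPTS = 3;

        interface Allocation { boolean tryOnce(); }

        // Returns true once the allocation succeeds; gives up only after
        // MAX_STALLED_ATTEMPTS consecutive failures during which no other
        // thread made allocation progress either.
        static boolean allocateWithRetry(Allocation attempt) {
          long lastSeen = globalAllocations.get();
          int stalled = 0;
          while (true) {
            if (attempt.tryOnce()) {            // one "GC then retry" cycle
              globalAllocations.incrementAndGet();
              return true;
            }
            long now = globalAllocations.get();
            if (now != lastSeen) {
              lastSeen = now;                   // another thread progressed:
              stalled = 0;                      // keep retrying indefinitely
            } else if (++stalled >= MAX_STALLED_ATTEMPTS) {
              return false;                     // no global progress: give up
            }
          }
        }
      }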

    Perry

     
  • Dave Grove

    Dave Grove - 2005-04-29


    Perry or Steve, could one of you either close the bug or work with Adam to
    resolve it?

     
  • corwin

    corwin - 2005-05-01


    I might have found the problem. In the Allocator class, a thread makes
    several (5 at the moment) attempts to perform an allocation on the slow
    path (in the allocSlowBody method) and fails with an error message if none
    of the attempts is successful.

    It seems that the crashes I was observing occur when two threads enter
    this method in parallel. The first thread sets the flag indicating that a
    GC is in progress. This causes calls to the poll() method to return NIL,
    and as a result all attempts of the other thread to acquire additional
    pages fail (even though acquiring additional pages on the slow path is
    synchronized - the allocPages() method in the MonotonePageResource class).

    I have a fix that works for me (no more crashes), but it's really more of
    a hack. I inserted the following code fragment at the end of the for loop
    in the allocSlowBody() method (Allocator class). The isCollectionInitiated()
    method returns true if the collectionsInitiated flag is greater than 0, and
    false if it is equal to 0:

      if (Plan.getInstance().isCollectionInitiated()) {
        // A collection is already in progress on another thread: yield so it
        // can finish, and do not count this iteration as a failed attempt.
        Thread.yield();
        i--;
      }
    

    This causes a thread that attempts to acquire pages after a collection has
    been initiated to yield, allow the collection to finish, and only then
    re-attempt the acquisition.
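
    For illustration, the same pattern in a self-contained form (the class and
    method names are hypothetical and this is not the real allocSlowBody()
    code; it only shows the shape of the approach):

      import java.util.concurrent.atomic.AtomicInteger;

      class YieldOnCollection {
        // Mirrors the collectionsInitiated counter described above: a value
        // greater than 0 means a collection has been initiated but not yet
        // completed.
        static final AtomicInteger collectionsInitiated = new AtomicInteger();

        static boolean isCollectionInitiated() {
          return collectionsInitiated.get() > 0;
        }

        interface Allocation { boolean tryOnce(); }

        // Retry an allocation, but do not burn a retry while another thread's
        // collection is still in progress: yield and repeat the attempt instead.
        static boolean allocate(Allocation attempt, int maxAttempts) {
          for (int i = 0; i < maxAttempts; i++) {
            if (attempt.tryOnce()) {
              return true;
            }
            if (isCollectionInitiated()) {
              Thread.yield(); // let the in-progress collection finish
              i--;            // this iteration does not count as a failure
            }
          }
          return false;       // maxAttempts genuine failures: out of memory
        }
      }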

    Best regards

    Adam

     
  • Dave Grove

    Dave Grove - 2005-05-09


    Let's try to get Adam's fix (or something equivalent) in
    before next release.

     
  • Ian Rogers

    Ian Rogers - 2007-07-20


    This looks as though it's fixed; can we close the tracker?

     
  • Ian Rogers

    Ian Rogers - 2007-09-19


    Closing, as Daniel made many improvements in this area and this shouldn't
    be an issue any more. Re-open if it is.

     
