Marcus:

 

>The General Question is, do I have to worry ? ;-)

In general, yes. :-) Now, in this _specific_ case…  :-)

 

>Is there any hint of a memory leak or was it just bad circumstances ?

I don’t see anything here that points to a leak. What you show seems to be consistent with a large max memory set compared to what is needed, and the GC never really bothers to do the deepest scan possible.

 

Garbage collection is lazy, and in most cases the JVM will use all the memory you give it before doing the “deepest” GC scan. This is why, for example, is not recommended for UI type programs to allocate say 4 GB when 100MB is enough: it _will_ use the full 4 GB (even if it has no real use for it) and _then_ issue a deep GC that will freeze the whole UI to scan the whole memory… And with a heap that big it may take a couple of seconds. Things may have changed with Java 8, but the behavior is not uncommon.

 

This trend:

https://github.com/eveCSS/profiling/blob/master/13250-heap.png
can be perfectly normal. The fastest sawtooth is a fast GC cycle limited to the first generation (objects that were just allocated). Then you see a slower sawtooth, with a deeper GC cycle. Then you have the even slower cycle (which you just one go), with the deepest GC cycle.

 

The point is: just looking at the memory usage is not a good indicator.

 

The most reliable indicator is comparing memory dumps: when that is done, unreachable objects are not included in the dump, so you know exactly what memory is actually used. Another way is to manually perform the GC: that’s why you have a button there. You run the program for a while, perform GC, let it run again, perform GC again, and compare the two values after you performed GC. That way you are (more or less) comparing actual memory usage. I say more or less because there is no guarantee the GC will actually clean _all_. It may stop when it thinks it’s “good enough”.

 

Note that for something like CS-Studio, where the memory usage changes drastically from run to run, this is a general problem with no solution: if you set the memory low, you may get a good memory footprint but the minute you open a large waveform (e.g. an image) you crash; if you set it high, you can use large waveforms, but if you don’t you are going essentially to be wasting memory. This is even worse if you are trying to run in a shared environment (where each user will have his own instance).

 

Gabriele

 

From: marcus.michalsky@ptb.de [mailto:marcus.michalsky@ptb.de]
Sent: Friday, June 13, 2014 12:57 PM
To: Kasemir, Kay
Cc: <cs-studio-users@lists.sourceforge.net>
Subject: Re: [Cs-studio-users] Filling Memory when observing PVs

 

Hi,

the memory sizes were from the linux tool top and the heap dump MAT (Memory Analyzer Tools) of Eclipse as stated:

- Memory consumption according to "top" was around...
- Heap dumps of these processes showed... (MAT)
- "top" memory consumption:

Did I miss to declare it anywhere ?

In the first test with OpenJDK I observed an increasing memory consumption (top) measured in percentage every ten minutes:
https://github.com/eveCSS/profiling/blob/master/4356.png

After four days MAT showed the zip stuff in the top components:
https://github.com/eveCSS/profiling/blob/master/Finalizer-4356.png

There were thousands of InputStreams (with closeRequested true) and referent=Finalizer:
https://github.com/eveCSS/profiling/blob/master/Finalizer.png

With the Oracle JDK the behavior was a bit better but not much (few less Finalizer objects...)


I checked the hint with the self-updates. Since in our custom build version there is no update plugin this could not be the source.


I repeated the experiment (although not as long as the first one which was ca. five days):

With the OpenJDK I observed again a rising sawtooth of the heap memory:
https://github.com/eveCSS/profiling/blob/master/19081-heap.png

Again memory consumption (top) measured in percentage every ten minutes:
https://github.com/eveCSS/profiling/blob/master/19081.png

And with Oracle JDK:
The heap lowered after a while starting again to rise:
https://github.com/eveCSS/profiling/blob/master/13250-heap.png

Memory Consumption according to top measured in percentage every ten minutes:
https://github.com/eveCSS/profiling/blob/master/13250.png

The Oracle JDK slightly "recovers" by freeing some memory after a long time,
maybe the OpenJDK also will do it at a later time ? In the first run it did not...

The second run was not as long as the first so maybe it was too early to observe, but I didn't see any "Live Objects" of
the suspicious zip streams (except ca. 50 responsible for plugin jar loading).

So 2nd time I got the same behavior as you. What was the difference to the first run ?
Are there side effects which delay the Finalization of objects, esp. zip input streams, so long
that they reside in memory (as mentioned in my first email -> Finalizers should generally be avoided) ?
Maybe it is not connected to PVs itself but the amount of PVs updating themselves is (sometimes)
so much "traffic" that the Finalizer thread does not run because of its low priority or because it is blocked ?

The General Question is, do I have to worry ? ;-)
Is there any hint of a memory leak or was it just bad circumstances ?

Since I did not observe such many zip input streams I did not attach a memory dump (which also would be too large in many cases).

Thanks
Marcus




Von:        "Kasemir, Kay" <kasemirk@ornl.gov>
An:        "marcus.michalsky@ptb.de" <marcus.michalsky@ptb.de>
Kopie:        "<cs-studio-users@lists.sourceforge.net>"        <cs-studio-users@lists.sourceforge.net>
Datum:        06.06.2014 20:19
Betreff:        Re: [Cs-studio-users] Filling Memory when observing PVs





Hi:

In your description, I'm not sure which memory sizes refer to the overall memory usage as reported by ps, top, … Linux tools, and which are from jvisualvm, jprofiler, or even the plain CSS Monitor view, i.e. the JVM's idea of its memory usage.

Do I understand correctly that with the Oracle JDK, the memory consumption is generally a sawtooth?
Then I wouldn't worry about for example going from 3% to 11% after 40 hours, that could just mean: It settles in to 11%, that's what you see in "top", and within that overall memory size, the JVM goes up and down in a sawtooth manner.

Completely running out of memory as you see with OpenJDK  is of course a problem.
If you can avoid that by using the Oracle JDK, then at least you have a workaround.

You suggest that java.util.zip.* objects are using much of that memory.
I don't see how CSS per se, tools like the PV Table would continually open ZIP files.

ZIP files are used because Eclipse loads plugins, which are mostly *.jar, i.e. zip files.
All that should happen when you start CSS, and whenever you for example open a PV table for the first time, so it loads the PV table plugin and dependencies. Still, I don't see that continuing all the time.
When I check CSS running here, displaying *.opi files for Instruments, I see about 5000 java.util.zip.ZipFile$ZipFileInflaterInputStream instances, so it's indeed thousands as you found in your tests, but they are all removed after about a minute.

The other case could be self-updates.
When you go to Preferences, Install/Update, Automatic Updates, is that configured to periodically look for updates, which could mean that it connects to update sites, downloads the content.jar, which is a ZIP?

-Kay

On Jun 6, 2014, at 12:30 PM, marcus.michalsky@ptb.de wrote:
> Hi,
>
> I am experiencing "memory problems" observing PVs over a long timespan. I've done the following:
>
> Started
>
> * custom CSS product based on 3.2.15
> * NSLS-II 3.2.16a (downloaded from website)
> * SNS 3.2.15 (downloaded from website)
>
> Two pre-build products:
> * with a clean workspace
> * defaults as in the ini file
> * opened pv-table and added 20 "real" PVs (multiple current doubles) and 6 "sim" PVs
>
> Custom Build:
> * same epics config
> * custom table with same PVs connected updating table cells (at 4Hz each)
>
> OS: openSUSE 12.3 (x86_64)
> Kernel: 3.7.10-1.28-desktop
> VM: 1.7.0_51 OpenJDK 64-bit (later also tested with Oracle JDK, see below)
>
> The behaviour observed was that after nearly five days the memory (physical and swap) was almost full.
> Memory consumption according to "top" was around 30% each (Resident Memory > 2GB). Heap dumps
> of these processes showed "java.lang.ref.Finalizer" occupied > 50% (>16 MB) of the Heap. Finalizer
> statistics (Top Components->system class loader) revealed several objects of the following classes:
>
> * java.util.zip.ZipFile$ZipFileInflaterInputStream (thousands)
> * java.util.zip.ZipFile$ZipFileInputStream (thousands)
> * java.util.zip.ZipFile (hundreds)
> * java.util.zip.Inflater (hundreds)
> * java.net.SocksSocketImpl (varying hugely between applications)
>
> I have found some hints why this behavior might occur:
>
> Finalizers should generally be avoided (see Effective Java 2nd Edition, Bloch Item 7: "Avoid Finalizers"):
> Finalizer threads have low priority. If they cannot keep up with the rate of arriving objects (from threads
> with higher priority) queues grow -> heap fills.
> There's also a problem with allocated native memory which is not freed during finalization ? And if finalize
> fais/exits with errors it is not called again...
>
> The question is where does these zip objects come from. The Profiler showed several eclipse plugins
> (which are extracted from jars) as well as several SocksSocketImpl...
>
> Interesting Bug Report:
>
http://bugs.java.com/view_bug.do?bug_id=4797189
>
> ->CHANGING VM
>
> Due to these observations I decided to repeat the experiment but switching the used Java VM = Oracle JDK 1.7.0u60.
> Observations were a little different:
>
> "top" memory consumption:
> custom build: started with 3.0%, was 11.0% after 40h
> NSLS-II: started with 3.0%, was 3.8% after 40h
> SNS: started with 3.2%, was 13.5% after 40h
>
> I also observed that with this setup (Oracle VM) the heap size looked more like a sawtooth. When gc was active the
> consumption temporarily lowered a bit. Immediately after these events heap dumps temporarily looked better
> (less zip stuff). So I assume the effect is slower because gc on Oracle is more aggresive ?
>
> Another metric I observed was the "surviving generations". The 2 hours I observed them, they increased continually.
> When triggering GC manually they lowered a bit. A Netbeans Profiler article states that this metric should stabilize
> after a while:
>
>
https://netbeans.org/kb/articles/nb-profiler-uncoveringleaks_pt1.html
>
> Maybe it has something to do with the native linux zip libraries used ?
>
> Does anyone have experienced similar behavior ?  Or any hints ?
>
> Thanks
> Marcus------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases and their
> applications. Written by three acclaimed leaders in the field,
> this first edition is now available. Download your free book today!
>
http://p.sf.net/sfu/NeoTech_______________________________________________
> Cs-studio-users mailing list
> Cs-studio-users@lists.sourceforge.net
>
https://lists.sourceforge.net/lists/listinfo/cs-studio-users