Re: [opennms-discuss] 1.3.3-1 java.lang.OutOfMemoryError: Java heap space


I've got a few things under test.  As soon as they are done, I will
restart and collect the requested information.
As for the disk I/O, it looks fine.

Paul

|-----Original Message-----
|From: ope...@li...
|[mailto:ope...@li...] On Behalf Of DJ Gregor
|Sent: Sunday, June 24, 2007 10:26 AM
|To: General OpenNMS Discussion
|Subject: Re: [opennms-discuss] 1.3.3-1
|java.lang.OutOfMemoryError: Java heap space
|
|Paul,
|
|Please see the question in the last paragraph of my original
|response to this thread regarding RRD data storage. Let us
|know what you find.
|Here's what I wrote before:
|
|> Have you verified that the disks are still keeping up with RRD data
|> storage, and have you looked at the RRD queue statistics? See my
|> previous message to this list for details.
|
|David Hustace has also asked in another email for the full
|exception that you are seeing in the logs, but I haven't seen
|a response from you on that.
|
|To help you diagnose this, we need you to provide the
|information we are asking for. Other than RRD
|queueing-related out-of-memory errors, we really don't see
|OutOfMemoryError exceptions regularly, so we need details on
|what is happening to try to figure out the cause.
|
|
|        - djg
|
|On Sun, 24 Jun 2007 05:46:02 -0700, "Paul Mona" <pm...@co...> said:
|>
|> This was happening every 4 hours with linkd enabled. Since disabling
|> linkd, 14 hours have passed without an error.
|>
|> |-----Original Message-----
|> |From: ope...@li...
|> |[mailto:ope...@li...] On Behalf Of DJ Gregor
|> |Sent: Saturday, June 23, 2007 9:48 AM
|> |To: General OpenNMS Discussion
|> |Subject: Re: [opennms-discuss] 1.3.3-1
|> |java.lang.OutOfMemoryError: Java heap space
|> |
|> |On Sat, 23 Jun 2007 06:46:59 -0700, "Paul Mona" <pm...@co...>
|> |said:
|> |> While running the 1.3.3-1 release, I've seen a number of
|> |> "OutOfMemoryError" exceptions. This was also very prevalent in
|> |> previous releases. To avoid this, we have deployed multiple disks
|> |> in our boxes, with a RAID 0 array dedicated solely to writing RRD
|> |> data. But the problem still occurs.
|> |>
|> |> In a past thread, David wrote:
|> |> [ additional quoting added ]
|> |> > This I suspect to be the problem. I've recently seen this at
|> |> > another site. We have a poor implementation of a call used in
|> |> > Hibernate to load up all of a node's data in the collector, because
|> |> > we currently have no transaction boundary in the collector code (I
|> |> > know, a lot of mumbo jumbo). So when a node has a *lot* of
|> |> > interfaces like these, there is the potential for an extraordinary
|> |> > amount of memory to be used when, if implemented correctly, we
|> |> > wouldn't have this problem. We're going to have to address it
|> |> > before we release 1.3.3.
|> |>
|> |> Does anyone know if this has been addressed?
|> |
|> |"PostgreSQL JDBC driver runs out of memory when
|> |NodeDaoHibernate.getHierarchy is called on a node with many interfaces"
|> |http://bugzilla.opennms.org//show_bug.cgi?id=1888
|> |
|> |A patch that eliminates the complex query returning a very large
|> |number of rows has been applied and is in 1.3.3.
|> |
|> |If you were running into bug #1888, you would see the JVM
|> |temporarily run out of memory when it was calling
|> |NodeDaoHibernate.getHierarchy on nodes with a large number of
|> |interfaces. I think we were seeing this in cases where a node has
|> |hundreds of interfaces. The memory would end up getting freed once
|> |the NodeDaoHibernate.getHierarchy call failed, and OpenNMS would
|> |generally continue to run okay (I believe). You would see things
|> |like this in the logs (collectd.log, I think):
|> |
|> |    org.postgresql.util.PSQLException: Ran out of memory retrieving
|> |    query results.
|> |    org.opennms.netmgt.dao.hibernate.NodeDaoHibernate.getHierarchy
|> |
|> |The latter line would be part of an exception stack trace, usually
|> |from the stack trace of the PSQLException shown above.
|> |
|> |If you aren't getting those errors, then you aren't running into this
|> |problem, and since the problem you are seeing appears to be
|> |permanent rather than temporary, I would suspect something else.
|> |
|> |Have you verified that the disks are still keeping up with RRD data
|> |storage, and have you looked at the RRD queue statistics? See my
|> |previous message to this list for details.
|
