From: Paul M. <pm...@co...> - 2007-06-24 20:36:32
I've got a few things under test. As soon as they are done, I will
restart and collect the requested information. As for the disk I/O, it
looks fine.

Paul

|-----Original Message-----
|From: ope...@li...
|[mailto:ope...@li...] On Behalf Of DJ Gregor
|Sent: Sunday, June 24, 2007 10:26 AM
|To: General OpenNMS Discussion
|Subject: Re: [opennms-discuss] 1.3.3-1 java.lang.OutOfMemoryError: Java heap space
|
|Paul,
|
|Please see the question in the last paragraph of my original
|response to this thread regarding RRD data storage. Let us
|know what you find.
|Here's what I wrote before:
|
|> Have you verified that the disks are still keeping up with RRD data
|> storage and have you looked at the RRD queue statistics--see my
|> previous message to this list for details.
|
|David Hustace has also asked in another email for the full
|exception that you are seeing in the logs, but I haven't seen
|a response from you on that.
|
|To help you to diagnose this, we need you to provide the
|information we are asking for. Other than RRD
|queueing-related out of memory errors, we really don't see
|OutOfMemory errors regularly, so we need details on what is
|happening to try to figure out the cause.
|
|
| - djg
|
|On Sun, 24 Jun 2007 05:46:02 -0700, "Paul Mona" <pm...@co...>
|said:
|>
|> This was happening every 4 hours with linkd enabled. Since disabling
|> linkd, 14 hours have passed without an error.
|>
|> |-----Original Message-----
|> |From: ope...@li...
|> |[mailto:ope...@li...] On Behalf Of DJ Gregor
|> |Sent: Saturday, June 23, 2007 9:48 AM
|> |To: General OpenNMS Discussion
|> |Subject: Re: [opennms-discuss] 1.3.3-1 java.lang.OutOfMemoryError: Java heap space
|> |
|> |On Sat, 23 Jun 2007 06:46:59 -0700, "Paul Mona" <pm...@co...>
|> |said:
|> |> While running the 1.3.3-1 release, I've seen a number of
|> |> "OutOfMemoryError" exceptions happen.
|> |> This was also very prevalent in previous releases. To avoid this,
|> |> we have deployed multiple disks in our boxes with a raid0 array
|> |> dedicated solely to writing rrd data. But still the problem occurs.
|> |>
|> |> In a past thread, David wrote:
|> |> [ additional quoting added ]
|> |> > This I suspect to be the problem. I've recently seen this at
|> |> > another site. We have a poor implementation of a call used in
|> |> > Hibernate to load up all a node's data in the collector because
|> |> > we currently have no transaction boundary in the collector code
|> |> > (I know, lot of mumbo jumbo). So when a node has a *lot* of
|> |> > interfaces like these, then there is the potential for an
|> |> > extraordinary amount of memory to be used when, if implemented
|> |> > correctly, we wouldn't have this problem. We're going to have
|> |> > to address it before we release 1.3.3.
|> |>
|> |> Does anyone know if this has been addressed?
|> |
|> |"PostgreSQL JDBC driver runs out of memory when
|> |NodeDaoHibernate.getHierarchy is called on a node with many interfaces"
|> |http://bugzilla.opennms.org//show_bug.cgi?id=1888
|> |
|> |A patch has been applied and is in 1.3.3 that eliminates the complex
|> |query that returns a very large number of rows.
|> |
|> |If you were running into bug #1888, you would see that the JVM would
|> |temporarily run out of memory when it was calling
|> |NodeDaoHibernate.getHierarchy on nodes with a large number of
|> |interfaces. I think we were seeing this in cases where a node has
|> |hundreds of interfaces. The memory would end up getting freed once
|> |the NodeDaoHibernate.getHierarchy call failed, and OpenNMS would
|> |generally continue to run okay (I believe). You would see things
|> |like this in the logs (collectd.log, I think):
|> |
|> | org.postgresql.util.PSQLException: Ran out of memory retrieving
|> | query results.
|> | org.opennms.netmgt.dao.hibernate.NodeDaoHibernate.getHierarchy
|> |
|> |The latter line would be part of an exception stack trace, usually
|> |from the stack trace of the PSQLException shown above.
|> |
|> |If you aren't getting those errors, then you aren't running into this
|> |problem, and since the problem you are seeing appears to be
|> |permanent, and not temporary, I would suspect something else.
|> |
|> |Have you verified that the disks are still keeping up with RRD data
|> |storage and have you looked at the RRD queue statistics--see my
|> |previous message to this list for details.
|
|_______________________________________________
|Please read the OpenNMS Mailing List FAQ:
|http://www.opennms.org/index.php/Mailing_List_FAQ
|
|opennms-discuss mailing list
|
|To *unsubscribe* or change your subscription options, see the
|bottom of this page:
|https://lists.sourceforge.net/lists/listinfo/opennms-discuss
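
For anyone following the thread, the log check DJ describes above (look for the PSQLException together with a NodeDaoHibernate.getHierarchy stack frame) could be sketched as a small script. This is only an illustration, not an OpenNMS tool: the function names and the log path passed on the command line are hypothetical, and the match strings are simply the ones quoted in this thread, so the exact wording in a real collectd.log may differ.

```python
import sys

# Strings quoted in the thread for bug #1888 (assumed to appear verbatim in the log).
PSQL_OOM = "org.postgresql.util.PSQLException: Ran out of memory retrieving"
GET_HIERARCHY = "org.opennms.netmgt.dao.hibernate.NodeDaoHibernate.getHierarchy"
# Generic JVM heap exhaustion, taken from the thread's subject line.
JAVA_HEAP = "java.lang.OutOfMemoryError"


def count_signatures(lines):
    """Count how many lines contain each of the three signatures."""
    counts = {"psql_oom": 0, "get_hierarchy": 0, "java_heap": 0}
    for line in lines:
        if PSQL_OOM in line:
            counts["psql_oom"] += 1
        if GET_HIERARCHY in line:
            counts["get_hierarchy"] += 1
        if JAVA_HEAP in line:
            counts["java_heap"] += 1
    return counts


def looks_like_bug_1888(counts):
    """True only when both signatures described for bug #1888 are present."""
    return counts["psql_oom"] > 0 and counts["get_hierarchy"] > 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_log.py /path/to/collectd.log
    with open(sys.argv[1], errors="replace") as fh:
        result = count_signatures(fh)
    print(result)
    print("matches bug #1888 pattern:", looks_like_bug_1888(result))
```

If the script reports OutOfMemoryError lines but neither of the two bug #1888 signatures, that would be consistent with DJ's suggestion to look elsewhere, for example at the RRD queue statistics.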