From: DJ G. <dj...@op...> - 2007-06-24 17:25:56
Paul,
Please see the question in the last paragraph of my original response to
this thread regarding RRD data storage. Let us know what you find.
Here's what I wrote before:
> Have you verified that the disks are still keeping up with RRD
> data storage and have you looked at the RRD queue
> statistics--see my previous message to this list for details.
David Hustace has also asked in another email for the full exception
that you are seeing in the logs, but I haven't seen a response from you
on that.
To help you diagnose this, we need you to provide the information we are
asking for. Other than RRD queueing-related out of memory errors, we
really don't see OutOfMemory errors regularly, so we need details on
what is happening to try to figure out the cause.
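
If it helps with gathering those details, here is a rough sketch (not an OpenNMS tool) for checking a log for the two signatures described in my earlier message quoted below. The function names are made up for illustration, and the match strings assume the log lines look like the excerpt in that message; adjust them to whatever your logs actually contain.

```python
# Signatures taken from the quoted message about bug #1888. The exact
# text in a real log may differ, so treat these strings as assumptions.
PSQL_OOM = "PSQLException: Ran out of memory retrieving"
GET_HIERARCHY = "NodeDaoHibernate.getHierarchy"
JVM_OOM = "java.lang.OutOfMemoryError"


def count_signatures(log_text):
    """Count the lines of log_text that contain each signature."""
    counts = {"psql_oom": 0, "get_hierarchy": 0, "jvm_oom": 0}
    for line in log_text.splitlines():
        if PSQL_OOM in line:
            counts["psql_oom"] += 1
        if GET_HIERARCHY in line:
            counts["get_hierarchy"] += 1
        if JVM_OOM in line:
            counts["jvm_oom"] += 1
    return counts


def looks_like_bug_1888(counts):
    """True only if both bug #1888 signatures appear somewhere in the log."""
    return counts["psql_oom"] > 0 and counts["get_hierarchy"] > 0
```

You would read the log file yourself (collectd.log, if that is where the errors land) and pass its contents to count_signatures. If looks_like_bug_1888 comes back False but the jvm_oom count is non-zero, that points away from bug #1888 and toward something else, such as the RRD queue.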
- djg
On Sun, 24 Jun 2007 05:46:02 -0700, "Paul Mona" <pm...@co...>
said:
>
>
> This was happening every 4 hours with linkd enabled. Since disabling
> linkd, 14 hours have passed without an error.
>
>
>
>
>
> |-----Original Message-----
> |From: ope...@li...
> |[mailto:ope...@li...] On
> |Behalf Of DJ Gregor
> |Sent: Saturday, June 23, 2007 9:48 AM
> |To: General OpenNMS Discussion
> |Subject: Re: [opennms-discuss] 1.3.3-1
> |java.lang.OutOfMemoryError: Java heapspace
> |
> |On Sat, 23 Jun 2007 06:46:59 -0700, "Paul Mona" <pm...@co...>
> |said:
> |> While running the 1.3.3-1 release, I've seen a number of
> |> "OutOfMemoryError" exceptions happen. This was also very prevalent
> |> in previous releases. To avoid this, we have deployed multiple disks
> |> in our boxes with a raid0 array dedicated solely to writing rrd data.
> |> But still the problem occurs.
> |>
> |> In a past thread, David wrote:
> |> [ additional quoting added ]
> |> > This I suspect to be the problem. I've recently seen this at
> |> > another site. We have a poor implementation of a call used in
> |> > Hibernate to load up all a node's data in the collector because we
> |> > currently have no transaction boundary in the collector code (I
> |> > know, lot of mumbo jumbo). So when a node has a *lot* of interfaces
> |> > like these, then there is the potential for an extraordinary amount
> |> > of memory to be used when, if implemented correctly, we wouldn't
> |> > have this problem. We're going to have to address it before we
> |> > release 1.3.3.
> |>
> |> Does anyone know if this has been addressed?
> |
> |"PostgreSQL JDBC driver runs out of memory when
> |NodeDaoHibernate.getHierarchy is called on a node with many interfaces"
> |http://bugzilla.opennms.org//show_bug.cgi?id=1888
> |
> |A patch has been applied and is in 1.3.3 that eliminates the
> |complex query that returns a very large number of rows.
> |
> |If you were running into bug #1888, you would see that the JVM
> |would temporarily run out of memory when it was calling
> |NodeDaoHibernate.getHierarchy on nodes with a large number of
> |interfaces. I think we were seeing this in cases where a node
> |has hundreds of interfaces. The memory would end up getting
> |freed once the NodeDaoHibernate.getHierarchy call failed, and
> |OpenNMS would generally continue to run okay (I believe). You
> |would see things like this in the logs (collectd.log, I think):
> |
> | org.postgresql.util.PSQLException: Ran out of memory retrieving
> | query results.
> | org.opennms.netmgt.dao.hibernate.NodeDaoHibernate.getHierarchy
> |
> |The latter line would be part of an exception stack trace,
> |usually from the stack trace of the PSQLException shown above.
> |
> |If you aren't getting those errors, then you aren't running
> |into this problem, and since the problem you are seeing
> |appears to be permanent, and not temporary, I would suspect
> |something else.
> |
> |Have you verified that the disks are still keeping up with RRD
> |data storage and have you looked at the RRD queue
> |statistics--see my previous message to this list for details.