http://jira.hyperic.com/browse/HHQ-4254
To reproduce:
Configure a 2-node HA environment
Add one or more agents to the inventory
After a few hours, all agent schedules get wiped out.
Here's the sequence of events, reconstructed from the agent and server logs.
- The server reports that the agent is sending data for a non-existing entity; the following message is repeated for all resources under an agent:
2010-08-27 05:54:40,931 ERROR [Thread-6870] [org.hyperic.hq.measurement.server.session.SRNManagerImpl@201] Agent's reporting for non-existing entity: 3:10108
- A few seconds later, the agent gets an unschedule measurement command from the server for all of its measurements:
2010-08-27 05:55:11,859 DEBUG [Thread-0] [CommandListener] Dispatching request for 'rtm:unscheduleMeasurements'
2010-08-27 05:55:11,860 DEBUG [Thread-0] [MeasurementCommandsService] Received unschedule request for 22 resources
2010-08-27 05:55:11,860 DEBUG [Thread-0] [MeasurementCommandsService] Deleting metrics for 3:10127
2010-08-27 05:55:11,860 DEBUG [Thread-0] [ScheduleThread] Unscheduling 2 metrics for 3:10127
2010-08-27 05:55:11,860 DEBUG [Thread-0] [MeasurementSchedule] SRN for entity 3:10127 removed
2010-08-27 05:55:11,861 DEBUG [Thread-0] [MeasurementSchedule] Removing scheduled measurement [derivedId=10798|dsnId=1079
Attaching logs (server.log1 [master node], server.log2 [slave node], agent1 and agent2)
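To line up the two events across the attached logs, a small script along these lines might help. It is only a sketch: it assumes every log line starts with the same timestamp/level/thread prefix as the excerpts above, and the function names (parse_line, missing_entity_events, unschedule_requests) are made up for illustration and are not part of HQ. Server and agent clocks may also differ, so the time gap it yields is approximate.

```python
import re
from datetime import datetime

# Prefix format assumed from the excerpts in this report:
# "2010-08-27 05:54:40,931 ERROR [Thread-6870] <rest of message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] (?P<rest>.*)$"
)
MISSING_RE = re.compile(r"reporting for non-existing entity: (\d+:\d+)")
UNSCHEDULE_RE = re.compile(r"Received unschedule request for (\d+) resources")


def parse_line(line):
    """Return (timestamp, message) for a log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group("rest")


def missing_entity_events(server_lines):
    """Collect (timestamp, entity_id) for each 'non-existing entity' error."""
    events = []
    for line in server_lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, rest = parsed
        m = MISSING_RE.search(rest)
        if m:
            events.append((ts, m.group(1)))
    return events


def unschedule_requests(agent_lines):
    """Collect (timestamp, resource_count) for each unschedule request."""
    requests = []
    for line in agent_lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, rest = parsed
        m = UNSCHEDULE_RE.search(rest)
        if m:
            requests.append((ts, int(m.group(1))))
    return requests
```

Feeding the server log to missing_entity_events and an agent log to unschedule_requests, then comparing the timestamps, shows how soon each unschedule request follows the server-side errors.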
Anonymous