Hi,
I observed the segmentation fault below. The event logs captured when the segmentation fault happened are also pasted below. (I was running an OpenHPI client and daemon from a separate Linux machine; the client does discovery in a loop at an interval and checks the components' power state and event logs.)
Starting program: /usr/sbin/openhpid -n -c /etc/openhpi/orig/openhpi.conf
[Thread debugging using libthread_db enabled]
[New Thread 0x2b253cda4fa0 (LWP 7334)]
[New Thread 0x41ef4940 (LWP 7335)]
[New Thread 0x428f5940 (LWP 7336)]
[New Thread 0x432f6940 (LWP 7337)]
[New Thread 0x410fd940 (LWP 7338)]
[New Thread 0x43cf7940 (LWP 7339)]
[New Thread 0x446f8940 (LWP 7362)]
[Thread 0x446f8940 (LWP 7362) exited]
[New Thread 0x446f8940 (LWP 7385)]
[New Thread 0x450f9940 (LWP 7445)]
[New Thread 0x45afa940 (LWP 7447)]
[Thread 0x446f8940 (LWP 7385) exited]
[New Thread 0x446f8940 (LWP 7476)]
[Thread 0x446f8940 (LWP 7476) exited]
[New Thread 0x446f8940 (LWP 7492)]
[Thread 0x446f8940 (LWP 7492) exited]
[New Thread 0x446f8940 (LWP 7508)]
[Thread 0x446f8940 (LWP 7508) exited]
[Thread 0x450f9940 (LWP 7445) exited]
[Thread 0x45afa940 (LWP 7447) exited]
[New Thread 0x45afa940 (LWP 7536)]
[New Thread 0x450f9940 (LWP 7538)]
[Thread 0x450f9940 (LWP 7538) exited]
[Thread 0x45afa940 (LWP 7536) exited]
[New Thread 0x45afa940 (LWP 7646)]
[New Thread 0x450f9940 (LWP 7662)]
[Thread 0x450f9940 (LWP 7662) exited]
[New Thread 0x450f9940 (LWP 7723)]
[New Thread 0x446f8940 (LWP 7725)]
[New Thread 0x464fb940 (LWP 7741)]
[Thread 0x464fb940 (LWP 7741) exited]
[Thread 0x45afa940 (LWP 7646) exited]
[New Thread 0x45afa940 (LWP 7792)]
[New Thread 0x464fb940 (LWP 7823)]
[New Thread 0x46efc940 (LWP 7880)]
[Thread 0x46efc940 (LWP 7880) exited]
[Thread 0x45afa940 (LWP 7792) exited]
[Thread 0x450f9940 (LWP 7723) exited]
[Thread 0x446f8940 (LWP 7725) exited]
[New Thread 0x446f8940 (LWP 8220)]
[New Thread 0x450f9940 (LWP 8222)]
[New Thread 0x45afa940 (LWP 8238)]
[Thread 0x45afa940 (LWP 8238) exited]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x432f6940 (LWP 7337)]
0x000000347da70454 in malloc_consolidate () from /lib64/libc.so.6
(gdb) bt
#0 0x000000347da70454 in malloc_consolidate () from /lib64/libc.so.6
#1 0x000000347da72a1a in _int_malloc () from /lib64/libc.so.6
#2 0x000000347da7486d in calloc () from /lib64/libc.so.6
#3 0x000000347fa33b82 in g_malloc0 () from /lib64/libglib-2.0.so.0
#4 0x00002b253bd62a8a in oh_add_rdr (table=0x95b4010, rid=238,
rdr=0x2aaaac0bc5b0, data=0x0, owndata=0) at rpt_utils.c:728
#5 0x000000000044f3b5 in process_hs_event (d=0x95b3ee0, e=0x2aaaac0c3010)
at event.c:371
#6 0x000000000044f5d1 in process_event (did=0, e=0x2aaaac0c3010)
at event.c:444
#7 0x000000000044f691 in oh_process_events () at event.c:495
#8 0x000000000044f885 in oh_evtpop_thread_loop (data=0x0) at threaded.c:104
#9 0x000000347fa48e04 in ?? () from /lib64/libglib-2.0.so.0
#10 0x000000347e6064a7 in start_thread () from /lib64/libpthread.so.0
#11 0x000000347dad3c2d in clone () from /lib64/libc.so.6
(gdb)
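A crash inside malloc_consolidate() almost always means the heap metadata was corrupted earlier (a double free, use-after-free, or buffer overrun), so the faulting frame is rarely the real culprit. If the fault recurs, running the daemon under valgrind should point at the first invalid free or write instead of the later crash. A sketch, assuming the same daemon path and config file as in the gdb session above:

```shell
# Slow, but reports the first invalid free/write instead of crashing later.
# G_SLICE / G_DEBUG make glib's allocations visible to valgrind.
G_SLICE=always-malloc G_DEBUG=gc-friendly \
valgrind --tool=memcheck --track-origins=yes --num-callers=20 \
    /usr/sbin/openhpid -n -c /etc/openhpi/orig/openhpi.conf
```

Since openhpid is multithreaded, expect valgrind to slow event processing considerably; the hot-swap events may need to be replayed to trigger the corruption.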
Event logs observed when the segmentation fault occurred:
callbackForBlade- {SYSTEM_CHASSIS,1}{SYSTEM_BLADE,9}
, element Failed Status = 0, element Powered State = 0, at time = 2011-04-05-18:36:22.
Event Type: HOTSWAP
From Resource: {SYSTEM_CHASSIS,1}{SYSTEM_BLADE,9}
Event Resource ID: 233
Event Timestamp: 2011-04-05 18:36:13
Event Severity: CRITICAL
HotswapEvent:
HotSwapState: EXTRACTION_PENDING
PreviousHotSwapState: ACTIVE
CauseOfStateChange: CAUSE_UNEXPECTED_DEACTIVATION
Event Type: HOTSWAP
From Resource: {SYSTEM_CHASSIS,1}{SYSTEM_BLADE,9}
Event Resource ID: 233
Event Timestamp: 2011-04-05 18:36:13
Event Severity: CRITICAL
HotswapEvent:
HotSwapState: INACTIVE
PreviousHotSwapState: EXTRACTION_PENDING
CauseOfStateChange: CAUSE_AUTO_POLICY
Event Type: HOTSWAP
From Resource: {SYSTEM_CHASSIS,1}{SYSTEM_BLADE,1}
Event Resource ID: 225
Event Timestamp: 2011-04-05 19:48:37
Event Severity: OK
HotswapEvent:
HotSwapState: NOT_PRESENT
PreviousHotSwapState: ACTIVE
CauseOfStateChange: CAUSE_SURPRISE_EXTRACTION
Event Type: HOTSWAP
From Resource: {SYSTEM_CHASSIS,1}{SYSTEM_BLADE,1}
Event Resource ID: 225
Event Timestamp: 2011-04-05 19:48:44
Event Severity: OK
HotswapEvent:
HotSwapState: INSERTION_PENDING
PreviousHotSwapState: NOT_PRESENT
CauseOfStateChange: CAUSE_OPERATOR_INIT
Event Type: HOTSWAP
From Resource: {SYSTEM_CHASSIS,1}{SYSTEM_BLADE,1}
Event Resource ID: 225
Event Timestamp: 2011-04-05 19:48:44
Event Severity: OK
HotswapEvent:
HotSwapState: ACTIVE
PreviousHotSwapState: INSERTION_PENDING
CauseOfStateChange: CAUSE_AUTO_POLICY
Regards,
Preeti
Which version is this: 2.15, 2.16, or trunk?
I suspect the plug-in didn't correctly set the rdrs or rdrs_to_remove lists in the oh_event structure.
The fault happened at Apr 5 19:50:04. The bad host name lookups are not an issue on my system: they come from IO_BLADES, which are defined as partner devices of SYSTEM_BLADES, and IO_BLADES do not have any IP address assigned.
Hi Preeti,
We are not able to reproduce the issue on our setup (configuration details: c7000 enclosure; BL465c G6, BL465c G5, BL685c G6, BL685c G7, and BL490c G6 servers; running OpenHPI 2.17.1). It looks like the issue is specific to the configuration. Please share the configuration details of the setup in which the issue is seen (such as the resources present in the enclosure, the OpenHPI version, etc.).
The attached simulation data was generated using the openhpi client hpigensimdata.
Hi Shyamala,
Configuration details are:
OpenHpi version: 2.15.1
Chassis detail : C7000
Plugin : OA_SOAP
Chassis configuration is pasted below (also attaching simulationC7000.data):
0 RPT: id = 1 ResourceId = 1 Tag = BladeSystem c7000 Enclosure G2
1 RPT: id = 225 ResourceId = 225 Tag = ProLiant BL460c G7
2 RPT: id = 226 ResourceId = 226 Tag = BLc-Class PCI Expansion Blade
3 RPT: id = 227 ResourceId = 227 Tag = ProLiant BL460c G7
4 RPT: id = 228 ResourceId = 228 Tag = BLc-Class PCI Expansion Blade
5 RPT: id = 229 ResourceId = 229 Tag = ProLiant BL460c G7
6 RPT: id = 230 ResourceId = 230 Tag = BLc-Class PCI Expansion Blade
7 RPT: id = 231 ResourceId = 231 Tag = ProLiant BL460c G7
8 RPT: id = 232 ResourceId = 232 Tag = ProLiant BL460c G7
9 RPT: id = 233 ResourceId = 233 Tag = ProLiant BL460c G7
10 RPT: id = 234 ResourceId = 234 Tag = BLc-Class PCI Expansion Blade
11 RPT: id = 235 ResourceId = 235 Tag = ProLiant BL460c G7
12 RPT: id = 236 ResourceId = 236 Tag = BLc-Class PCI Expansion Blade
13 RPT: id = 237 ResourceId = 237 Tag = ProLiant BL460c G7
14 RPT: id = 238 ResourceId = 238 Tag = ProLiant BL460c G7
15 RPT: id = 239 ResourceId = 239 Tag = ProLiant BL460c G7
16 RPT: id = 240 ResourceId = 240 Tag = ProLiant BL460c G7
17 RPT: id = 241 ResourceId = 241 Tag = HP VC Flex-10 Enet Module
18 RPT: id = 5 ResourceId = 5 Tag = Thermal Subsystem
19 RPT: id = 6 ResourceId = 6 Tag = Fan Zone
20 RPT: id = 242 ResourceId = 242 Tag = Fan Zone
21 RPT: id = 243 ResourceId = 243 Tag = Fan Zone
22 RPT: id = 244 ResourceId = 244 Tag = Fan Zone
23 RPT: id = 245 ResourceId = 245 Tag = Fan
24 RPT: id = 246 ResourceId = 246 Tag = Fan
25 RPT: id = 247 ResourceId = 247 Tag = Fan
26 RPT: id = 248 ResourceId = 248 Tag = Fan
27 RPT: id = 249 ResourceId = 249 Tag = Fan
28 RPT: id = 250 ResourceId = 250 Tag = Fan
29 RPT: id = 251 ResourceId = 251 Tag = Fan
30 RPT: id = 252 ResourceId = 252 Tag = Fan
31 RPT: id = 253 ResourceId = 253 Tag = Fan
32 RPT: id = 254 ResourceId = 254 Tag = Fan
33 RPT: id = 255 ResourceId = 255 Tag = Power Subsystem
34 RPT: id = 256 ResourceId = 256 Tag = Power Supply Unit
35 RPT: id = 257 ResourceId = 257 Tag = Power Supply Unit
36 RPT: id = 258 ResourceId = 258 Tag = Power Supply Unit
37 RPT: id = 259 ResourceId = 259 Tag = Power Supply Unit
38 RPT: id = 260 ResourceId = 260 Tag = Power Supply Unit
39 RPT: id = 261 ResourceId = 261 Tag = Power Supply Unit
40 RPT: id = 2 ResourceId = 2 Tag = Onboard Administrator
41 RPT: id = 262 ResourceId = 262 Tag = Onboard Administrator
42 RPT: id = 263 ResourceId = 263 Tag = LCD
Regards,
Preeti
We have tried the steps mentioned in the bug but are unable to reproduce the issue, and there is no more information.
Preeti,
could you run openhpid under gdb and show the backtrace?
Hi Shyamala/Anton,
I haven't seen the issue again and am not sure of the steps to reproduce it. This issue can be closed; if I see it again, I will raise another bug.
Regards,
Preeti
We tried to reproduce this bug many times but failed, so we are closing it. Maybe there are more variables involved. Once this bug is reproducible and the configuration is known, please file a new bug.