From: SourceForge.net <no...@so...> - 2012-08-30 15:32:40
Bugs item #3562666, was opened at 2012-08-29 01:23
Message generated for change (Settings changed) made by hemanthreddy
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=532251&aid=3562666&group_id=71730

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: OpenHPI Daemon
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Michele Baldessari (mbaldessari)
>Assigned to: Hemantha Beecherla (hemanthreddy)
Summary: 3.2.0 segfaults when retrieving thermal info

Initial Comment:
openhpi-3.2.0 openhpid segfaults at start with:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7502700 (LWP 19073)]
0x00007ffff753ebb9 in soap_value (node=0x7ffff0036fe0) at oa_soap_callsupport.c:240
240         if ((node) && (node->children) && (node->children->content)) {
(gdb) thread apply all bt

Will attach full debug log and thread bt from gdb

----------------------------------------------------------------------

Comment By: Michele Baldessari (mbaldessari)
Date: 2012-08-29 14:47

Message:
Hi Hemantha,

I've attached the openhpi-new-blades.patch file which fixes all the
segfaults I've seen. Please review it and let me know if it is ok or if it
needs tweaking.

Thanks for your time,
Michele

----------------------------------------------------------------------

Comment By: Michele Baldessari (mbaldessari)
Date: 2012-08-29 13:49

Message:
Bah, too late to look at stuff I guess.
;) I think the issue is that this blade is missing the System Zone in its
output:

<hpoa:bladeThermalInfoArray>
  <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>5</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>8</hpoa:entityId>
    <hpoa:entityInstance>1</hpoa:entityInstance>
    <hpoa:criticalThreshold>86</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>81</hpoa:cautionThreshold>
    <hpoa:temperatureC>38</hpoa:temperatureC>
    <hpoa:oem>0</hpoa:oem>
    <hpoa:description>Memory Zone</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
  </hpoa:bladeThermalInfo>
  <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>6</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>9</hpoa:entityId>
    <hpoa:entityInstance>3</hpoa:entityInstance>
    <hpoa:criticalThreshold>75</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>70</hpoa:cautionThreshold>
    <hpoa:temperatureC>29</hpoa:temperatureC>
    <hpoa:oem>0</hpoa:oem>
    <hpoa:description>CPU Zone</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
  </hpoa:bladeThermalInfo>
  <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>7</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>1</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>42</hpoa:temperatureC>
    <hpoa:oem>1</hpoa:oem>
    <hpoa:description>CPU 1</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
  </hpoa:bladeThermalInfo>
  <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>8</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>2</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>41</hpoa:temperatureC>
    <hpoa:oem>1</hpoa:oem>
    <hpoa:description>CPU 1</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
  </hpoa:bladeThermalInfo>
  <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>9</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>39</hpoa:entityId>
    <hpoa:entityInstance>1</hpoa:entityInstance>
    <hpoa:criticalThreshold>43</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>38</hpoa:cautionThreshold>
    <hpoa:temperatureC>17</hpoa:temperatureC>
    <hpoa:oem>0</hpoa:oem>
    <hpoa:description>Ambient Zone</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
  </hpoa:bladeThermalInfo>
  <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>10</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>3</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>42</hpoa:temperatureC>
    <hpoa:oem>2</hpoa:oem>
    <hpoa:description>CPU 2</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
  </hpoa:bladeThermalInfo>
  <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>11</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>4</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>41</hpoa:temperatureC>
    <hpoa:oem>2</hpoa:oem>
    <hpoa:description>CPU 2</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
  </hpoa:bladeThermalInfo>
</hpoa:bladeThermalInfo>
</hpoa:bladeThermalInfoArray>

Probably worth adding a 480c g1 entry?

----------------------------------------------------------------------

Comment By: Michele Baldessari (mbaldessari)
Date: 2012-08-29 12:47

Message:
Hi Hemantha,

I *think* I know what the issue is here. Blade 2 returns a
hpoa:bladeThermalInfoArray that openhpi cannot cope with.
These are the last two bladeThermalInfo nodes in the array:

<hpoa:bladeThermalInfo>
  <hpoa:sensorNumber>10</hpoa:sensorNumber>
  <hpoa:sensorType>1</hpoa:sensorType>
  <hpoa:entityId>3</hpoa:entityId>
  <hpoa:entityInstance>3</hpoa:entityInstance>
  <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
  <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
  <hpoa:temperatureC>42</hpoa:temperatureC>
  <hpoa:oem>2</hpoa:oem>
  <hpoa:description>CPU 2</hpoa:description>
  <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
</hpoa:bladeThermalInfo>
<hpoa:bladeThermalInfo>
  <hpoa:sensorNumber>11</hpoa:sensorNumber>
  <hpoa:sensorType>1</hpoa:sensorType>
  <hpoa:entityId>3</hpoa:entityId>
  <hpoa:entityInstance>4</hpoa:entityInstance>
  <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
  <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
  <hpoa:temperatureC>41</hpoa:temperatureC>
  <hpoa:oem>2</hpoa:oem>
  <hpoa:description>CPU 2</hpoa:description>
  <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
</hpoa:bladeThermalInfo>

Note that they have the *same* description but map to different sensors.
The code in
plugins/oa_soap/oa_soap_sensors.c:oa_soap_get_bld_thrm_sen_data() also uses
the description field to map the sensor to a number:

        /* As per discovery, mapping between the bladeThermalInfo response and
         * the sensor number can be be achieved based on the description string
         * in response and comment field in the sensor rdr.
         * Sometimes the comment field in sensor rdr may not have matching
         * substring in soap response.
         * Hence map the sensor rdr comment field to the standard string listed
         * in oa_soap_thermal_sensor_string array. as it is assumed that the
         * strings list in this array will match the description in response
         * For example:
         * The comment field of the system zone sensor
         * is "System Zone thermal status" and if the description field of
         * bladeThermalInfo structure is "System Zone", then it is possible to
         * achieve mapping between the response and sensor rdr.
         * But if the description field of bladeThermalInfo contains
         * "System Chassis", then it is difficult to achieve the mapping
         * between bladeThermalInfo structure instance to any particular sensor.
         */
        for (i = 0; i < OA_SOAP_MAX_THRM_SEN; i++) {
                if ((strstr(oa_soap_sen_arr[sen_num].comment,
                            oa_soap_thermal_sensor_string[i]))) {
                        index = i;
                        break;
                }
        }

So from a quick glance (but I haven't delved too deeply into the code here)
this is the reason we segfault. Does this make sense?

----------------------------------------------------------------------

Comment By: Michele Baldessari (mbaldessari)
Date: 2012-08-29 08:28

Message:
Hi Hemantha,

not sure. I can ask IT to upgrade to a later FW but I don't know if/when
this can happen. Is 3.31 sending unexpected data via SOAP that openhpi
can't cope with here? I guess it'd be nice to still kill this segfault, but
let me see if I can get the FW updated and can reproduce.

regards,
Michele

----------------------------------------------------------------------

Comment By: Hemantha Beecherla (hemanthreddy)
Date: 2012-08-29 08:17

Message:
Hi Michele,

Thanks for providing the information.
One quick query: is it reproducible with OA FW 5.6 / 6.0 too? I ask
because OA 3.32 is very old.
Thanks & Regards,
Hemantha Reddy

----------------------------------------------------------------------

Comment By: Michele Baldessari (mbaldessari)
Date: 2012-08-29 07:23

Message:
Hi Hemantha,

thanks for getting back to me:

1) I tested this on the following openhpi versions and it happens on all of
   them: 2.14, 3.0, 3.2 and trunk as of today
2) OA FW version: BladeSystem c7000 Onboard Administrator HP 3.32 Aug 24 2011
3) RHEL 6.2 - gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)
4) There are many different blades in this C7000 enclosure:
   ProLiant BL685c G6
   ProLiant BL480c G1
   ProLiant BL260c G5
   ProLiant BL460c G5
   ProLiant BL495c G5
   ProLiant BL465c G7
   ProLiant BL460c G7
   ProLiant BL495c G5
   ProLiant BL460c G1
   ProLiant BL490c G6
   ProLiant BL280c G6
5) The issue occurs about 1 minute after start, 100% of the time, at every
   start

Let me know if I can provide any other info

regards,
Michele

----------------------------------------------------------------------

Comment By: Hemantha Beecherla (hemanthreddy)
Date: 2012-08-29 06:20

Message:
Hi Michele,

Thanks for filing the issue.
Could you please provide us with the configuration used and the steps to
reproduce this issue?
What is the OA FW version?
What are the OS details and GCC version?
Which blades are used in the C7000 enclosure?
How frequently does this issue occur?

By analyzing the gdb trace we found that this issue occurs in the first
discovery with the BL480c blade.

Thanks & Regards,
Hemantha Reddy

----------------------------------------------------------------------

Comment By: Michele Baldessari (mbaldessari)
Date: 2012-08-29 02:05

Message:
Also happens on today's trunk (same backtrace)

----------------------------------------------------------------------
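
Editor's sketch, for readers following the thread: the two observations made in the comments (the blade reports no "System Zone" entry, and two entries share the description "CPU 2") can be illustrated with a small description-based lookup. This is NOT the OpenHPI code and NOT the attached openhpi-new-blades.patch; the struct `thermal_info`, the function `find_by_description` and the array `sample_blade` are hypothetical names invented for illustration, and the values are copied from the SOAP output quoted in the thread. It only shows why a lookup by description should report "not found" explicitly, so the caller can check before dereferencing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one bladeThermalInfo entry. */
struct thermal_info {
    int sensor_number;
    const char *description;
    int temperature_c;
};

/* Values taken from the SOAP response quoted in the thread. */
const struct thermal_info sample_blade[] = {
    {  5, "Memory Zone",  38 },
    {  6, "CPU Zone",     29 },
    {  7, "CPU 1",        42 },
    {  8, "CPU 1",        41 },
    {  9, "Ambient Zone", 17 },
    { 10, "CPU 2",        42 },
    { 11, "CPU 2",        41 },
};
#define SAMPLE_BLADE_COUNT (sizeof(sample_blade) / sizeof(sample_blade[0]))

/*
 * Return the first entry whose description contains `wanted`,
 * or NULL when no entry matches. The caller must check for NULL.
 */
const struct thermal_info *
find_by_description(const struct thermal_info *arr, size_t n,
                    const char *wanted)
{
    size_t i;

    assert(wanted != NULL);
    for (i = 0; i < n; i++) {
        if (arr[i].description != NULL &&
            strstr(arr[i].description, wanted) != NULL)
            return &arr[i];
    }
    return NULL; /* e.g. "System Zone" is absent on this blade */
}
```

With this sample data, looking up "System Zone" returns NULL, and looking up "CPU 2" always returns the entry for sensor 10, never sensor 11. A description string alone therefore cannot tell the two "CPU 2" sensors apart.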