#1756 3.2.0 segfaults when retrieving thermal info

3.2.1
closed-fixed
5
2013-06-20
2012-08-29
No

openhpi-3.2.0 openhpid segfaults at start with:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7502700 (LWP 19073)]
0x00007ffff753ebb9 in soap_value (node=0x7ffff0036fe0) at oa_soap_callsupport.c:240
240 if ((node) && (node->children) && (node->children->content)) {
(gdb) thread apply all bt

I will attach the full debug log and the thread backtrace from gdb.

Discussion

  • Michele Baldessari

    gdb trace + debug output

     
  • Anton Pak

    Anton Pak - 2012-08-29
    • assigned_to: avpak --> dr_mohan
     
  • Michele Baldessari

    Also happens on today's trunk (same backtrace)

     
  • Hemantha Beecherla

    Hi Michele,
    Thanks for filing the issue.
    Could you please provide the following details?
    1) The configuration used and the steps to reproduce this issue
    2) OA FW version
    3) OS details and GCC version
    4) Which blades are used in the C7000 enclosure
    5) How frequently the issue occurs
    From analyzing the gdb trace, we found that this issue occurs during the first discovery, with the BL480c blade.

    Thanks & Regards,
    Hemantha Reddy

     
  • Michele Baldessari

    Hi Hemantha,

    thanks for getting back to me:
    1) I tested this on the following openhpi versions and it happens on all of them: 2.14, 3.0, 3.2 and trunk as of today
    2) OA FW version: BladeSystem c7000 Onboard Administrator HP 3.32 Aug 24 2011
    3) RHEL 6.2 - gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)
    4) There are many different blades in this C7000 enclosure:
    ProLiant BL685c G6
    ProLiant BL480c G1
    ProLiant BL260c G5
    ProLiant BL460c G5
    ProLiant BL495c G5
    ProLiant BL465c G7
    ProLiant BL460c G7
    ProLiant BL495c G5
    ProLiant BL460c G1
    ProLiant BL490c G6
    ProLiant BL280c G6
    5) The issue occurs at every start (100% of the time), about 1 minute after startup

    Let me know if I can provide any other info

    regards,
    Michele

     
  • Hemantha Beecherla

    Hi Michele,
    Thanks for providing the information.
    One quick query:
    Does it also reproduce with OA FW 5.6 / 6.0?
    I ask because OA 3.32 is very old.

    Thanks & Regards,
    Hemantha Reddy

     
  • Michele Baldessari

    Hi Hemantha,

    not sure. I can ask IT to upgrade to a later FW but I don't know if/when this can happen.
    Is 3.32 sending unexpected data via SOAP that openhpi can't cope with here?

    I guess it'd be nice to still kill this segfault, but let me see if I can get the FW updated and can reproduce.

    regards,
    Michele

     
  • Anton Pak

    Anton Pak - 2012-08-29
    • labels: 622325 --> 1085740
     
  • Michele Baldessari

    Hi Hemantha,

    I *think* I know what the issue is here. Blade 2 returns a hpoa:bladeThermalInfoArray that openhpi cannot cope with.
    These are the last two bladeThermalInfo nodes in the array:
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>10</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>3</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>42</hpoa:temperatureC>
    <hpoa:oem>2</hpoa:oem>
    <hpoa:description>CPU 2</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>11</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>4</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>41</hpoa:temperatureC>
    <hpoa:oem>2</hpoa:oem>
    <hpoa:description>CPU 2</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>

    Note that they have the *same* description but map to different sensors. The code in plugins/oa_soap/oa_soap_sensors.c:oa_soap_get_bld_thrm_sen_data() also
    uses the description field to map the sensor to a number:

    /* As per discovery, mapping between the bladeThermalInfo response and
     * the sensor number can be be achieved based on the description string
     * in response and comment field in the sensor rdr.
     * Sometimes the comment field in sensor rdr may not have matching
     * substring in soap response.
     * Hence map the sensor rdr comment field to the standard string listed
     * in oa_soap_thermal_sensor_string array. as it is assumed that the
     * strings list in this array will match the description in response
     * For example:
     * The comment field of the system zone sensor
     * is "System Zone thermal status" and if the description field of
     * bladeThermalInfo structure is "System Zone", then it is possible to
     * achieve mapping between the response and sensor rdr.
     * But if the description field of bladeThermalInfo contains
     * "System Chassis", then it is difficult to achieve the mapping
     * between bladeThermalInfo structure instance to any particular sensor.
     */
    for (i = 0; i < OA_SOAP_MAX_THRM_SEN; i++) {
        if ((strstr(oa_soap_sen_arr[sen_num].comment,
                    oa_soap_thermal_sensor_string[i]))) {
            index = i;
            break;
        }
    }

    So from a quick glance (I haven't delved too deeply into the code here), this is the reason we segfault.
    Does this make sense?
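The lookup quoted above falls through silently when no standard string matches the comment field. As a rough illustration only, here is a minimal sketch of the same substring lookup with an explicit "not found" result. The names `thermal_strings` and `find_thermal_index` and the contents of the string list are assumptions made for this example; they are not taken from the oa_soap plugin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for oa_soap_thermal_sensor_string; the real
 * array and its contents live in the oa_soap plugin. */
static const char *const thermal_strings[] = {
    "System Zone", "CPU Zone", "CPU 1", "CPU 2", "Memory Zone", "Ambient Zone",
};

#define THERMAL_STRING_COUNT \
    (sizeof(thermal_strings) / sizeof(thermal_strings[0]))

/* Return the index of the first standard string that occurs as a
 * substring of the sensor comment, or -1 when none matches, so the
 * caller can handle the unmatched case explicitly. */
static int find_thermal_index(const char *comment)
{
    size_t i;

    assert(comment != NULL);
    for (i = 0; i < THERMAL_STRING_COUNT; i++) {
        if (strstr(comment, thermal_strings[i]) != NULL)
            return (int)i;
    }
    return -1;
}
```

With this shape, a comment such as "System Zone thermal status" maps to an index, while a string like "System Chassis" yields -1 instead of leaving the index at a stale value.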

     
  • Michele Baldessari

    Bah, too late to look at stuff I guess. ;) I think the issue is that this blade is missing the System Zone in its output:
    <hpoa:bladeThermalInfoArray>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>5</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>8</hpoa:entityId>
    <hpoa:entityInstance>1</hpoa:entityInstance>
    <hpoa:criticalThreshold>86</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>81</hpoa:cautionThreshold>
    <hpoa:temperatureC>38</hpoa:temperatureC>
    <hpoa:oem>0</hpoa:oem>
    <hpoa:description>Memory Zone</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>6</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>9</hpoa:entityId>
    <hpoa:entityInstance>3</hpoa:entityInstance>
    <hpoa:criticalThreshold>75</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>70</hpoa:cautionThreshold>
    <hpoa:temperatureC>29</hpoa:temperatureC>
    <hpoa:oem>0</hpoa:oem>
    <hpoa:description>CPU Zone</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>7</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>1</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>42</hpoa:temperatureC>
    <hpoa:oem>1</hpoa:oem>
    <hpoa:description>CPU 1</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>8</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>2</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>41</hpoa:temperatureC>
    <hpoa:oem>1</hpoa:oem>
    <hpoa:description>CPU 1</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>9</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>39</hpoa:entityId>
    <hpoa:entityInstance>1</hpoa:entityInstance>
    <hpoa:criticalThreshold>43</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>38</hpoa:cautionThreshold>
    <hpoa:temperatureC>17</hpoa:temperatureC>
    <hpoa:oem>0</hpoa:oem>
    <hpoa:description>Ambient Zone</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>10</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>3</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>42</hpoa:temperatureC>
    <hpoa:oem>2</hpoa:oem>
    <hpoa:description>CPU 2</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>11</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>4</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>41</hpoa:temperatureC>
    <hpoa:oem>2</hpoa:oem>
    <hpoa:description>CPU 2</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    </hpoa:bladeThermalInfoArray>

    Probably worth adding a BL480c G1 entry?
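To make the observation about this response concrete, the following sketch counts how often each description occurs. It is only an illustration: `count_description` and `bl480c_g1_descs` are hypothetical names, and the array simply lists the descriptions from the response above in order (sensors 5 through 11).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Descriptions from the quoted response, in order (sensors 5..11).
 * Hypothetical test data, not a structure from the plugin. */
static const char *const bl480c_g1_descs[] = {
    "Memory Zone", "CPU Zone", "CPU 1", "CPU 1",
    "Ambient Zone", "CPU 2", "CPU 2",
};

#define BL480C_G1_DESC_COUNT \
    (sizeof(bl480c_g1_descs) / sizeof(bl480c_g1_descs[0]))

/* Count the entries in descs that are exactly equal to wanted. */
static size_t count_description(const char *const *descs, size_t n,
                                const char *wanted)
{
    size_t i, count = 0;

    assert(descs != NULL && wanted != NULL);
    for (i = 0; i < n; i++) {
        if (descs[i] != NULL && strcmp(descs[i], wanted) == 0)
            count++;
    }
    return count;
}
```

For this blade the count for "System Zone" is 0, while "CPU 1" and "CPU 2" each appear twice, which is the combination a per-blade sensor table would have to describe.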

     
  • Michele Baldessari

    • labels: 1085740 --> 622325
     
  • Michele Baldessari

    Support for some additional blade types

     
  • Michele Baldessari

    Hi Hemantha,

    I've attached the openhpi-new-blades.patch file which fixes all the segfaults I've seen.

    Please review it and let me know if it is ok or if it needs tweaking.

    Thanks for your time,
    Michele

     
  • Hemantha Beecherla

    • assigned_to: dr_mohan --> hemanthreddy
     
  • Hemantha Beecherla

    Hi Michele,
    I have reviewed your patch; it requires a few changes.
    The BL480c G1 has 2 thermal sensors for CPU 1 and 2 for CPU 2:
    CPU1 2
    CPU2 2
    But your patch contains 1 for CPU 1 and 1 for CPU 2.

    Below is the BL480c G1 ThermalInfo array response:
    <hpoa:getBladeThermalInfoArrayResponse>
    <hpoa:bladeThermalInfoArray>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>5</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>8</hpoa:entityId>
    <hpoa:entityInstance>1</hpoa:entityInstance>
    <hpoa:criticalThreshold>86</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>81</hpoa:cautionThreshold>
    <hpoa:temperatureC>38</hpoa:temperatureC>
    <hpoa:oem>0</hpoa:oem>
    <hpoa:description>Memory Zone</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>6</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>9</hpoa:entityId>
    <hpoa:entityInstance>3</hpoa:entityInstance>
    <hpoa:criticalThreshold>75</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>70</hpoa:cautionThreshold>
    <hpoa:temperatureC>29</hpoa:temperatureC>
    <hpoa:oem>0</hpoa:oem>
    <hpoa:description>CPU Zone</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>7</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>1</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>42</hpoa:temperatureC>
    <hpoa:oem>1</hpoa:oem>
    <hpoa:description>CPU 1</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>8</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>2</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>41</hpoa:temperatureC>
    <hpoa:oem>1</hpoa:oem>
    <hpoa:description>CPU 1</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>9</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>39</hpoa:entityId>
    <hpoa:entityInstance>1</hpoa:entityInstance>
    <hpoa:criticalThreshold>43</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>38</hpoa:cautionThreshold>
    <hpoa:temperatureC>17</hpoa:temperatureC>
    <hpoa:oem>0</hpoa:oem>
    <hpoa:description>Ambient Zone</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>10</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>3</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>42</hpoa:temperatureC>
    <hpoa:oem>2</hpoa:oem>
    <hpoa:description>CPU 2</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    <hpoa:bladeThermalInfo>
    <hpoa:sensorNumber>11</hpoa:sensorNumber>
    <hpoa:sensorType>1</hpoa:sensorType>
    <hpoa:entityId>3</hpoa:entityId>
    <hpoa:entityInstance>4</hpoa:entityInstance>
    <hpoa:criticalThreshold>100</hpoa:criticalThreshold>
    <hpoa:cautionThreshold>95</hpoa:cautionThreshold>
    <hpoa:temperatureC>41</hpoa:temperatureC>
    <hpoa:oem>2</hpoa:oem>
    <hpoa:description>CPU 2</hpoa:description>
    <hpoa:extraData hpoa:name="SensorPresent">true</hpoa:extraData>
    </hpoa:bladeThermalInfo>
    </hpoa:bladeThermalInfoArray>
    </hpoa:getBladeThermalInfoArrayResponse>
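Because the same description can appear more than once in a response like the one above, a lookup by description also needs to say which occurrence it wants, and it needs a defined result when the response holds fewer matches than expected. The sketch below shows one way to express that. It is an assumption-laden illustration, not the code that went into the fix: `nth_description` and `response_descs` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Descriptions from the BL480c G1 response, in order (sensors 5..11).
 * Hypothetical test data for this example. */
static const char *const response_descs[] = {
    "Memory Zone", "CPU Zone", "CPU 1", "CPU 1",
    "Ambient Zone", "CPU 2", "CPU 2",
};

#define RESPONSE_DESC_COUNT \
    (sizeof(response_descs) / sizeof(response_descs[0]))

/* Return the position in descs of the occurrence-th entry (0-based)
 * equal to wanted, or -1 when the response does not contain that many
 * matches. The caller must check for -1 before using the result. */
static int nth_description(const char *const *descs, size_t n,
                           const char *wanted, size_t occurrence)
{
    size_t i, seen = 0;

    assert(descs != NULL && wanted != NULL);
    for (i = 0; i < n; i++) {
        if (descs[i] == NULL || strcmp(descs[i], wanted) != 0)
            continue;
        if (seen == occurrence)
            return (int)i;
        seen++;
    }
    return -1;
}
```

Asking for a third "CPU 1" entry or for any "System Zone" entry returns -1 here, so a sensor table that expects more entries than the blade reports is detected instead of walking past the end of the array.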

     
  • Michele Baldessari

    Hi Hemantha,

    thanks for the review. Do you want me to respin a patch?

    regards,
    Michele

     
  • Hemantha Beecherla

    Hi Michele,
    Sure. Also, could you please do some quick testing on the updated patch?

    Thanks & regards,
    Hemantha Reddy

     
  • Michele Baldessari

    Hi Hemantha,

    please find the new patch attached, with 2 CPU sensors per CPU for the BL480c G1. It's been running for a couple of hours so far on my enclosure without issues.

    regards,
    Michele

     
  • Michele Baldessari

    new patch with nr cpus of hp480g1 fixed

     
  • Hemantha Beecherla

    Thank you Michele, I will check in this patch today.

    Thanks & Regards,
    Hemantha Reddy

     
  • Hemantha Beecherla

    • milestone: --> 3.3.x
    • labels: 622325 --> HP c-Class Plugin
    • status: open --> closed-fixed
     
  • Hemantha Beecherla

    Fixed in trunk Rev: 7511.

     
  • dr_mohan

    dr_mohan - 2013-06-20
    • Group: 3.3.x --> 3.2.1
     
