#37 wrong temp read on BL460c G6 servers

open
None
5
2012-09-10
2011-11-02
Jaroslaw Gorny
No

During boot, g4l causes a wrong temperature reading from the 'Ambient' sensor: the sensor reports 37°C instead of the actual temperature. As a result, the fans run at high speed continuously. The problem persists even after a "normal" system start-up. The only fix is to pull the blade from the enclosure and push it back in, or to run:

oa1> reset server <num>

which is equivalent to physically pulling the server from the enclosure and reinserting it.

Please see the excerpt from the session:

oa1> show server temp 8

Device Bay #8 Temperature Information
Locale Status Temp Caution Critical
----------------------------------- ------ -------- ------- --------
Ambient Zone OK 37C/ 98F 42C 46C
(...)
Virtual Fan: 84%

and after:

oa1> reset server 8

WARNING: Resetting the server trips its E-Fuse. This causes all power to be
momentarily removed from the server. This command should only be used when
physical access to the server is unavailable, and the server must be removed
and reinserted.

Any disk operations on direct attached storage devices will be affected. I/O
will be interrupted on any direct attached I/O devices.

Entering anything other than 'YES' will result in the command not executing.

Do you want to continue ? YES

Successfully reset the E-Fuse for device bay 8.

oa1> show server temp 8

Device Bay #8 Temperature Information
Locale Status Temp Caution Critical
----------------------------------- ------ -------- ------- --------
Ambient Zone OK 22C/ 71F 42C 46C
(...)
Virtual Fan: 19%
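
The before/after readings above can be pulled out of saved OA output mechanically. The following is a minimal sketch, assuming the `show server temp` output has been saved to a text file in the layout shown in the excerpt. The function name `oa_temp_summary` is hypothetical and not part of the OA CLI.

```shell
# Hypothetical helper: summarise saved `show server temp` output.
# Assumes the line layout shown in the excerpt above.
oa_temp_summary() {
    awk '
        # "Ambient Zone OK 37C/ 98F 42C 46C": field 4 holds the Celsius reading
        $1 == "Ambient" && $2 == "Zone" { t = $4; sub(/C\/.*/, "", t); ambient = t }
        # "Virtual Fan: 84%": field 3 holds the percentage
        $1 == "Virtual" && $2 == "Fan:" { f = $3; sub(/%/, "", f); fan = f }
        END { printf "ambient_c=%s fan_pct=%s\n", ambient, fan }
    ' "$@"
}
```

Run against a file containing the first excerpt, this prints `ambient_c=37 fan_pct=84`; comparing that with the output captured after `reset server` makes the stuck reading easy to spot across several bays.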

Discussion

  • This would probably be some kernel option. The kernels are built from the kernel.org source code with the configuration files that are in the root of the CD image. I don't know whether the kernel lacks support for the chipset in this system, or why else it would cause this kind of issue.

    What does dmesg report about finding a temperature sensor? I don't know whether it would be best to remove all temperature options from the kernel, or whether something is missing that needs to be added.
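
One way to see which temperature-related options the kernels were built with is to search the configuration files mentioned above. This is a rough sketch, assuming a standard kernel `.config`-style file. The function name is made up for illustration, and the option prefixes (hwmon, sensor drivers, I2C, IPMI, thermal) are only a guess at where to start looking, not a confirmed cause.

```shell
# Hypothetical helper: list enabled (=y or =m) sensor-related kernel options.
# Pass the path to a kernel configuration file as the argument.
list_sensor_options() {
    grep -E '^CONFIG_(HWMON[A-Z0-9_]*|SENSORS_[A-Z0-9_]+|I2C[A-Z0-9_]*|IPMI_[A-Z0-9_]+|THERMAL[A-Z0-9_]*)=(y|m)$' "$@"
}
```

Lines such as `# CONFIG_HWMON is not set` are skipped, so the output shows only what is actually compiled in or built as a module.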

     
  • Jaroslaw Gorny
    2011-11-07

    Sorry for the delay on my side. Tomorrow I will have the opportunity to reboot servers in 2 different enclosures. I will collect the information you requested for both of the bugs I've reported (this one, and the one about /dev/cciss being wrongly detected).

     
  • Thanks. I would like to confirm exactly what you see.
    I have been able to talk with the fsarchiver author, and he sent me some info.
    I have made a couple of new versions that use fsarchiver probe -v and thus pull the full name.
    They also add the UUID to the listing to help people select the correct partition.
    ftp://amd64gcc.dyndns.org/g4l0.39alpha/g4l-v0.39alpha17.iso

     
  • I sent a message to HP, and got a quick response, but don't have the info to follow up, since I don't know much more about the system.

    Thank you for contacting Hewlett-Packard.

    This is in response to your e-mail regarding the HP ProLiant BL460c G7
    Server.

    Dear Michael,

    This e-mail is in reference to the Case: 16909723 logged about the HP
    Proliant BL460c Server.

    As per your e-mail, whenever you boot the HP Proliant Blade BL460c Server
    with the Linux CD, fans run at higher speed. Then, the user would need to
    remove and reseat the Blade server to get it working normally.

    Usually, issue regarding fans running at higher speed happens because of
    out-dated Firmware / BIOS and drivers on the server and the Blade
    Enclosure. To ascertain the exact problem and assist you further, please
    provide us the following information:

    • Which is the enclosure in which the BL460c server is installed? (C3000
      or C7000)
    • You mentioned that the fans on BL460c run at higher speed when the
      server is booted from a Linux disc. Are you referring to the fans in
      the enclosure?
    • Does the issue happen when the server is booted from the Linux disc in
      particular or is it with any disc?
    • Is the issue happening only with this particular server or multiple
      Blade Servers in the Enclosure?
    • Although the fans run at higher speed, do you find any specific error
      messages reported on the server? Also, do you find any other issues
      with the server?
    • When did the issue start to occur? Was there any hardware/software
      change on the server? or is it an Out Of Box server?
    • Let us know the BIOS, iLO and PMC version on the server along with the
      OA (Onboard Administrator) firmware version.

    If possible, please include the “Showall” report from the OA which
    should give us more information. To save the OA “Showall” report:
    Login to the Onboard Administrator -> Go to enclosure settings ->
    Configuration scripts -> Click on the second link “Click Here for
    Inventory” and save it as .txt file. Or login via telnet/ssh and run the
    command showall and export to a text file. NOTE: You must have
    Administrator credentials to get this report

    Along with this, please provide the below mentioned information:

    Serial No of the server:
    Company Name:
    Company address: (where the server is located)
    Onsite Contact: (at the server location)
    Phone no:
    Alternate Phone no: (if available)
    E-mail address:

    In-case you have any further queries; please feel free to contact us back
    referring to the incident ID or case numbers. We will be glad to assist
    you further.

    Thank you for contacting HP E-Solutions Technical Support team. Have a
    great day!

    With Warm Regards,

    Pandu Vasisht
    Technical Consultant
    HP TSG, Global Solution Center, Bangalore

    Support:

    For US, Canada & Caribbean: 1-800-633-3600 ‘server’, ‘operating
    system(Microsoft/Linux)’ and then ‘type of hardware
    (Proliant/Blade/Integrity)’
    For UK & Ireland: 0845 161 0050 option1 + option3 + option1

    Need Instant Support??
    For Chat Support: www.hp.com/go/hpchat
    For Email Support: www.itrc.hp.com/

    This e-mail, and any files transmitted with it are HP Confidential and
    intended solely for the use of the individual or entity to whom it is
    addressed. If you have received this e-mail in error, please discard the
    message and notify me directly.

    Sincerely,

    HP Email Support

    Please include the following identifier in the subject line of all future
    correspondence relating to this ticket.

    Incident ID: <16909723>

     
  • Jaroslaw Gorny
    2011-12-05

    I've just added a 'show all' dump from one of the affected enclosures. Please feel free to forward it to HP.

     
  • Jaroslaw Gorny
    2011-12-05

    Usually, issue regarding fans running at higher speed happens because of
    out-dated Firmware / BIOS and drivers on the server and the Blade
    Enclosure.

    I've reproduced this issue on a couple of enclosures in different environments across the world.
    I don't expect they are all running the same firmware version.
    However, please tell me which firmware version you recommend to fix this issue.
    Please also provide a changelog showing that this issue is addressed by a particular firmware release.

    • Which is the enclosure in which the BL460c server is installed? (C3000
      or C7000)

    It is C7000

    • You mentioned that the fans on BL460c run at higher speed
      when the server is booted from a Linux disc. Are you referring to the
      fans in the enclosure?

    Yes. Please refer to the information I placed in the bug description:
    https://sourceforge.net/support/tracker.php?aid=3432274

    You will find all the information there. This is the Inlet FAN.

    • Does the issue happen when the server is booted
      from the Linux disc in particular or is it with any disc?

    Again, see the description. The issue can be reproduced with G4L iso. Tested versions: 0.30, 0.31, 0.38.

    • Is the issue happening only with this particular server or multiple
      Blade Servers in the Enclosure?

    All the G6 servers in the Enclosure.

     
  • I sent the link to the set of messages to the HP rep that responded quickly to my email, and he said he would monitor it.
    Have you run dmesg to see if it shows anything that might point to what is causing the problem?
    dmesg | more
    or
    dmesg >dmesg.out
    Then ftp the file to another machine and post it.

    Also, one of the responses mentioned having the latest firmware loaded.

    I think it has something to do with the kernel probing for various hardware, which might be confusing the sensor in some way.
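
To narrow the dmesg output down to lines that might relate to sensor probing, something like the following could be used before posting the log. This is only a sketch: the keyword list is a guess at what might be relevant, the function name is made up, and it assumes the boot environment provides `grep -E`.

```shell
# Hypothetical helper: keep only kernel log lines that mention
# sensor-related subsystems (case-insensitive match).
filter_sensor_msgs() {
    grep -i -E 'hwmon|sensor|i2c|smbus|ipmi|thermal|temperature'
}

# Usage (reads the kernel log from stdin):
#   dmesg | filter_sensor_msgs > dmesg-sensors.out
```

The full `dmesg.out` is still worth keeping, since the filter may miss a driver whose messages use none of these keywords.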