GPU Temp in NVClock differs from other tools

  • xfs


    I have an Nvidia 8800 GTS (320MB), using bios: (if this helps).

    It's like this:
    I have written a small shell script that monitors the GPU temperature and fan speed and shows both in an xterm window that opens when I start X11. It updates every 10 seconds, and if the temperature rises, the fan speed is raised as well. But the temperatures read by NVClock differ a lot from those reported by nvidia-settings, gkrellm and other tools. NVClock's reading is often about 10 degrees lower than what the other tools tell me (usually around 40-45 when idle, while the other tools show 51-65). It's not always 10 degrees; sometimes it's 7, sometimes 12.

    This is not a big problem in itself; adjusting the fan speed based on the NVClock readings is not the issue (onboard sensors aren't always trustworthy anyway).
    The problem is that the other tools (which always agree with each other) show higher GPU temperatures instantly when, for example, I start a few glxgears processes. With NVClock it takes some time (sometimes a minute or more) before the temperature starts to rise.

    This leaves me with the following choices:
    * Set the fan speed high from the start (80% or higher) so the GPU doesn't overheat when load is put on it. <-- This is the solution I use right now.
    * Find another console program to read the temperatures from, and use nvclock only to set the fan. <-- I haven't found a usable program.

    If the temperature readings rose instantly this wouldn't be a problem, since I'm not interested in the exact temperature; I just want to keep the temperature low by speeding up the fan when needed. Manual fan control is no fun. The fan on this card is VERY noisy!
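The "speed up the fan when needed" idea can be sketched as a simple temperature-to-duty-cycle ramp. This is only an illustration in Python, not the script from this post: the function name and all thresholds (45C, 70C, 40%, 100%) are hypothetical and would need tuning for a real card.

```python
def fan_duty_for_temp(temp_c: float,
                      low_c: float = 45.0,
                      high_c: float = 70.0,
                      min_duty: float = 40.0,
                      max_duty: float = 100.0) -> float:
    """Map a GPU temperature (C) to a fan duty cycle (%).

    All thresholds are hypothetical defaults, not values taken from NVClock.
    """
    if temp_c <= low_c:
        return min_duty          # cool enough: keep the fan quiet
    if temp_c >= high_c:
        return max_duty          # hot: run the fan flat out
    # In between, ramp linearly from min_duty to max_duty.
    fraction = (temp_c - low_c) / (high_c - low_c)
    return min_duty + fraction * (max_duty - min_duty)


if __name__ == "__main__":
    for temp in (40, 50, 60, 75):
        print(temp, "C ->", round(fan_duty_for_temp(temp), 1), "%")
```

Because the NVClock reading lags behind the load, a lower `low_c` than one would choose for an instant sensor may be a reasonable compromise between noise and safety.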

    I'm including some card information and nvclock output at the bottom of this post.

    ALSO, I would like to request a feature for this nice application that I really miss right now:
    A daemon!
    Right now I run "nvclock -i" each time I want to read the fan speed and temperatures. It would be nice if the application had a mode where it doesn't exit but keeps reading the temperatures and prints them to stdout every second or so. A shell script could then read that output and easily adjust the fan speed based on the results.
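Until such a mode exists, the polling can be approximated from the outside. The following is a rough Python sketch, not part of NVClock: it simply re-runs "nvclock -i" at a fixed interval, so it still starts a new process on every reading. The function names and the `reader`/`sleep` parameters are made up for illustration.

```python
import subprocess
import time
from typing import Callable, Iterator


def read_nvclock_info() -> str:
    """Run `nvclock -i` once and return everything it printed."""
    result = subprocess.run(["nvclock", "-i"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def poll_readings(reader: Callable[[], str] = read_nvclock_info,
                  interval: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield one raw reading per interval, forever.

    `reader` and `sleep` are injectable so the loop can be tried
    without a GPU or nvclock installed.
    """
    while True:
        yield reader()
        sleep(interval)


if __name__ == "__main__":
    # Print the raw nvclock output every 10 seconds until interrupted.
    for output in poll_readings(interval=10.0):
        print(output)
```

This is a workaround rather than the requested feature: a real daemon mode inside nvclock would avoid re-initialising the card access on every read.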

    The version of NVClock I use is the latest from CVS right now.

    The output from nvclock -i:
    <cut here>

    -- General info --
    Card:           nVidia Geforce 8800GTS
    Architecture:   NV50/G80 A3
    PCI id:         0x193
    GPU clock:      513.000 MHz
    Bustype:        PCI-Express

    -- Shader info --
    Clock: 1188.000 MHz
    Stream units: 96 (01111110b)
    ROP units: 20 (111101b)
    -- Memory info --
    Amount:         320 MB
    Type:           320 bit DDR3
    Clock:          792.000 MHz

    -- PCI-Express info --
    Current Rate:   16X
    Maximum rate:   16X

    -- Sensor info --
    Sensor: Analog Devices ADT7473
    Board temperature: 41C
    GPU temperature: 43C
    Fanspeed: 1920 RPM
    Fanspeed mode: auto
    PWM duty cycle: 69.8%

    -- VideoBios information --
    Signon message: GeForce 8800 GTS VGA BIOS
    Performance level 0: gpu 513MHz/shader 1188MHz/memory 792MHz/1.30V/100%
    VID mask: 3
    Voltage level 0: 1.10V, VID: 0
    Voltage level 1: 1.20V, VID: 1
    Voltage level 2: 1.30V, VID: 2

    </cut here>
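
For scripting, the sensor lines in the output above can be picked out with a few regular expressions. This is a minimal Python sketch that assumes the line labels stay exactly as shown in this post ("GPU temperature:", "Fanspeed:", and so on); the function and key names are invented for the example.

```python
import re

# One pattern per sensor line, matching the labels shown in the post.
SENSOR_PATTERNS = {
    "board_temp_c": re.compile(r"Board temperature:\s*(\d+(?:\.\d+)?)C"),
    "gpu_temp_c": re.compile(r"GPU temperature:\s*(\d+(?:\.\d+)?)C"),
    "fan_rpm": re.compile(r"Fanspeed:\s*(\d+(?:\.\d+)?)\s*RPM"),
    "pwm_duty_pct": re.compile(r"PWM duty cycle:\s*(\d+(?:\.\d+)?)%"),
}


def parse_sensor_info(text: str) -> dict:
    """Extract the numeric sensor readings from `nvclock -i` style output.

    Lines that are missing (e.g. on cards without an external sensor)
    are simply left out of the result.
    """
    readings = {}
    for name, pattern in SENSOR_PATTERNS.items():
        match = pattern.search(text)
        if match:
            readings[name] = float(match.group(1))
    return readings


if __name__ == "__main__":
    sample = """\
    -- Sensor info --
    Sensor: Analog Devices ADT7473
    Board temperature: 41C
    GPU temperature: 43C
    Fanspeed: 1920 RPM
    Fanspeed mode: auto
    PWM duty cycle: 69.8%
    """
    print(parse_sensor_info(sample))
```

The "Fanspeed mode: auto" line is deliberately not matched by the "Fanspeed:" pattern, since it carries no number.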

    and the output from nvclock -D:

    <cut here>

    --- nVidia Geforce 8800GTS GPU registers ---
    NV_PMC_BOOT_0 (0x0): 450300a3
    NV_PBUS_DEBUG_0 (0x1080): 00000000
    NV_PBUS_DEBUG_1 (0x1084): 00041c69
    NV_PBUS_DEBUG_2 (0x1088): 00000000
    NV_PBUS_DEBUG_3 (0x108c): 000000d1
    NV_10F0 (0x10f0): 00000000
    NV_1540 (0x1540): f33d007e
    NV_15B0 (0x15b0): 00000000
    NV_15B4 (0x15b4): 00000000
    NV_15B8 (0x15b8): 00000000
    NV_15F0 (0x15f0): 00000000
    NV_15F4 (0x15f4): 00000000
    NV_15F8 (0x15f8): 00000000
    NV_PBUS_PCI_0 (0x1800): 019310de
    NV_C010 (0xc010): 00000000
    NV_C014 (0xc014): 00000000
    NV_C018 (0xc018): 00000000
    NV_C01C (0xc01c): 00000000
    NV_C020 (0xc020): 00000000
    NV_C024 (0xc024): 00000000
    NV_C028 (0xc028): 00000000
    NV_C02C (0xc02c): 00000000
    NV_C040 (0xc040): 3000c0b3
    NV_4000 (0x4000): 80000000
    NV_4004 (0x4004): 00002008
    NV_4008 (0x4008): 8018e400
    NV_400C (0x400c): 00001603
    NV_4010 (0x4010): 80016400
    NV_4014 (0x4014): 00001603
    NV_4018 (0x4018): 80016400
    NV_401C (0x401c): 00001603
    NV_4020 (0x4020): 80000000
    NV_4024 (0x4024): 00001602
    NV_4028 (0x4028): a0000000
    NV_402C (0x402c): 00001304
    NV_4030 (0x4030): 00000000
    NV_4034 (0x4034): 00000000
    NV_4038 (0x4038): 00000000
    NV_403C (0x403c): 00000000
    NV_4040 (0x4040): 00000000
    NV_4044 (0x4044): 00000000
    NV_4048 (0x4048): 00000000
    NV_404C (0x404c): 00000000
    NV_4050 (0x4050): 00000000
    NV_4054 (0x4054): 00000000
    NV_4058 (0x4058): 00000000
    NV_405C (0x405c): 00000000
    NV_4060 (0x4060): 00000000
    NV_E100 (0xe100): 00002998
    NV_E11C (0xe11c): 00000001
    NV_E120 (0xe120): 00000000
    NV_20008 (0x20008): 00000000
    NV_PFB_CFG0 (0x100200): 00001030
    NV_PFB_CFG0 (0x100204): 00449000
    NV_PFB_CFG0 (0x100208): 00000000
    NV_PFB_CFG0 (0x10020c): 14000000
    NV_PFB_218  (0x100218): 01000101
    NV_PFB_TIMING0 (0x100220): 0a192d23
    NV_PFB_TIMING1 (0x100224): 0d01080d
    NV_PFB_TIMING2 (0x100228): 0008080c
    NV_PFB_474     (0x100474): 00000000
    NV_PEXTDEV_BOOT_0 (0x101000): 8f488c9e
    NV_NVPLL_COEFF_A (0x680500): 00000000
    NV_MPLL_COEFF_A (0x680504): 00000000
    NV_VPLL_COEFF (0x680508): 00000000
    NV_PLL_COEFF_SELECT (0x68050c): 00000000
    NV_NVPLL_COEFF_B (0x680570: 00000000
    NV_MPLL_COEFF_B (0x680574: 00000000

    </cut here>

    • First, about the temperature differences. Starting with the Geforce 6200 (NV43), all Nvidia GPUs ship with a temperature sensor built into the GPU. This sensor is very basic and, in my opinion, not very accurate. Most mid-range and high-end Geforce6 boards use an external 'LM99' sensor connected to the I2C bus of the card. This sensor also reads the internal GPU temperature, and it reports the board temperature as well (the temperature of the sensor chip itself). Something similar is the case for Geforce7 and Geforce8 cards: most high-end models (Geforce 7800/7900/8800) contain an ADT7473 sensor.

      When an external sensor chip is present, NVClock reads it. Nvidia-settings in general only reads the internal GPU sensor (this is the case when only a single temperature is shown). An exception to this rule are some NV43-based Geforce6 cards which feature both an internal sensor and an external chip; in that case nvidia-settings reads the external chip.

      Personally I don't trust the internal GPU sensor much. It is basically a diode with an A/D converter behind it, and the temperature is interpolated linearly: temperature = offset + ADC_value * conversion_factor. A diode's temperature dependence is not linear, and I don't think nvidia compensates for this at all. One of the reasons I don't trust the internal sensor is that it can report insanely high temperatures like 90-100C and higher, which is VERY unrealistic for silicon chips. Something around 70C is much more realistic.
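
To make the linear formula concrete, here is a tiny Python sketch. The offset and conversion factor used in the example call are invented numbers purely for illustration; the real calibration values for any given GPU are not stated in this thread.

```python
def diode_temp_c(adc_value: int, offset: float, conversion_factor: float) -> float:
    """Linear conversion: temperature = offset + ADC_value * conversion_factor."""
    return offset + adc_value * conversion_factor


if __name__ == "__main__":
    # Hypothetical calibration, NOT real Nvidia values.
    print(diode_temp_c(adc_value=100, offset=-20.0, conversion_factor=0.5))
```

With a fixed offset and factor, any nonlinearity in the diode shows up directly as an error in the reported temperature, and the error grows the further the reading is from the point where the calibration was done.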

      The temperature from external sensors is read in the way the sensor vendor has documented for their chips, so we should be able to trust the values returned by the ADT7473 and friends. The problem, though, is that at least for LM99 sensors, which in some cases were also read by the nvidia drivers, nvidia hackily added an offset in software (normally the offset is programmed into the sensor). Perhaps something like this is going on for the ADT7473 too, which would then require some offset, but I'm not sure about this. There is sensor info in the bios, but I don't know what everything in it means.

      Further, the ADT7473 is an autonomous temperature sensor AND fan speed controller. It is programmed by your video bios and should adjust the fan speed correctly on its own, so no manual intervention should be needed. In driver 169.07 the nvidia drivers are messing with the sensor. You could use 'nvclock -f -F auto' to put the sensor back in automatic mode.

      I have been working on daemon-like stuff for a while. Perhaps I'll take another look at it once stuff is more stable.

    • Note that basically all other tools that show the temperature use 'nvidia-settings' (to be exact the nv-control api from the nvidia drivers).

    • mathieu

      Here is a patch to nvclock to fix the temperature issue with 8800 cards (G92). There is a compiled version for Ubuntu too.

      I tried to contact Roderick, but it seems the e-mail address in nvclock package doesn't work.

    • Note that most high-end 8800 cards use the ADT7473, while most basic cards like yours use the internal sensor. The internal sensor doesn't require this. It has a different offset.