Hi,
On Fri, Aug 22, 2025, at 7:54 AM, Armin Wolf wrote:
> Am 21.08.25 um 19:32 schrieb Marc Burkhardt:
>
>> Am 2025-08-20 00:03, schrieb Mark Pearson:
>>
>> Hi Mark,
>>
>> thanks for replying.
>>
>>> Hi Marc,
>>>
>>> On Mon, Aug 18, 2025, at 4:39 PM, Marc Burkhardt wrote:
>>>> While moving an existing Icinga installation to a Lenovo P15 20SU I
>>>> came
>>>> across broken JSON output from a "sensors -Aj" command consumed by the
>>>> Icinga check_lm_sensors plugin. After fiddling around trying to build a
>>>> fix in either lm_sensors or Icinga I found out the error was rooted in
>>>> some sysfs file that was created but threw errors while being read.
>>>> On my
>>>> Lenovo ThinkPad the default fallback to 8 temperature sensors creates
>>>> sysfs entries like in my case "temp8_input" that fail when read,
>>>> causing
>>>> the issue in user-space.
>>>>
>>>> This patch adds a module parameter (suppress_sensor) using
>>>> module_param_array() to allow users to specify a comma-separated
>>>> list of
>>>> zero-based sensor indices to suppress sysfs file creation (e.g.
>>>> suppress_sensor=3,7). Instead of a model-specific quirk, this provides
>>>> flexible configuration for any ThinkPad with similar issues and is
>>>> working
>>>> out-of-the-box without additional models being marked for the quirk.
>>>> The
>>>> parameter uses a fixed-size array based on
>>>> TPACPI_MAX_THERMAL_SENSORS (16)
>>>> consistent with the driver’s thermal sensor handling (ie.
>>>> ibm_thermal_sensors_struct or sensor_dev_attr_thermal_temp_input).
>>>>
>>>> Logging via pr_info() reports the number of suppressed sensors at
>>>> module
>>>> initialization, and pr_info() logs each suppressed sensor during sysfs
>>>> attribute creation. Invalid sensor indices are logged once via
>>>> pr_warn()
>>>> to avoid repetitive warnings. Tested on a ThinkPad P15 with
>>>> suppress_sensor=3,7, confirming suppression of temp4_input and
>>>> temp8_input
>>>> with no sysfs errors. Bounds checking for uncommon values is in
>>>> place or
>>>> will be logged.
>>>>
>>>> The patch applies to the current
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>>> although
>>>> it was initially written for a 6.16.0 kernel.
>>>>
>>>> I look forward to any feedback on the patch and/or handling of
>>>> submission.
>>>> Please CC: for now as I am not (yet) subscribed. Thank you.
>>>>
>>>> Signed-off-by: Marc Burkhardt <mar...@pr...nsulting>
>>>> ---
>>>> Notes:
>>>> I haven't posted on LKML or send a patch for over a decade now so
>>>> please forgive any possible mistakes I made regarding current coding
>>>> conventions or more generally in submitting this patch. The patch was
>>>> running for some time here with faulty sensors removed from sysfs
>>>> and no
>>>> problems otherwise detected and was surely run through checkpatch.pl
>>>> before
>>>> submission. get_maintainer.pl was helpful to find the hopefully right
>>>> people for CC:ing but I am otherweise totally unaware of any current
>>>> procedures or best-practices when it comes to submitting a patch.
>>>>
>>>> drivers/platform/x86/lenovo/thinkpad_acpi.c | 35
>>>> +++++++++++++++++++++++++++++
>>>> 1 file changed, 35 insertions(+)
>>>>
>>>> diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>>> b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>>> index cc19fe520ea9..30ff01f87403 100644
>>>> --- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>>> +++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>>> @@ -6019,6 +6019,30 @@ struct ibm_thermal_sensors_struct {
>>>> s32 temp[TPACPI_MAX_THERMAL_SENSORS];
>>>> };
>>>>
>>>> +static int suppress_sensor[TPACPI_MAX_THERMAL_SENSORS];
>>>> +static unsigned int suppress_sensor_count;
>>>> +
>>>> +static bool is_sensor_suppressed(int index)
>>>> +{
>>>> + unsigned int i;
>>>> + bool logged = false;
>>>> +
>>>> + for (i = 0; i < suppress_sensor_count; i++) {
>>>> + if (suppress_sensor[i] == index)
>>>> + return true;
>>>> +
>>>> + if (!logged &&
>>>> + (suppress_sensor[i] < 0
>>>> + || suppress_sensor[i] >=
>>>> TPACPI_MAX_THERMAL_SENSORS)) {
>>>> + pr_warn("Invalid sensor index %d in suppress_sensor\n",
>>>> + suppress_sensor[i]);
>>>> + logged = true;
>>>> + }
>>>> + }
>>>> +
>>>> + return false;
>>>> +}
>>>> +
>>>> static const struct tpacpi_quirk thermal_quirk_table[] __initconst = {
>>>> /* Non-standard address for thermal registers on some ThinkPads */
>>>> TPACPI_Q_LNV3('R', '1', 'F', true), /* L13 Yoga Gen 2 */
>>>> @@ -6313,6 +6337,11 @@ static umode_t thermal_attr_is_visible(struct
>>>> kobject *kobj,
>>>>
>>>> int idx = sensor_attr->index;
>>>>
>>>> + if (is_sensor_suppressed(idx)) {
>>>> + pr_info("Sensor %d suppressed\n", idx);
>>>> + return 0;
>>>> + }
>>>> +
>>>> switch (thermal_read_mode) {
>>>> case TPACPI_THERMAL_NONE:
>>>> return 0;
>>>> @@ -11653,6 +11682,9 @@ static void __init
>>>> thinkpad_acpi_init_banner(void)
>>>> thinkpad_id.model_str,
>>>> (thinkpad_id.nummodel_str) ?
>>>> thinkpad_id.nummodel_str : "unknown");
>>>> +
>>>> + pr_info("Suppressing %d user-supplied sensor(s) via parameter
>>>> suppress_sensor\n",
>>>> + suppress_sensor_count);
>>>> }
>>>>
>>>> /* Module init, exit, parameters */
>>>> @@ -11785,6 +11817,9 @@ MODULE_PARM_DESC(experimental,
>>>> module_param_named(debug, dbg_level, uint, 0);
>>>> MODULE_PARM_DESC(debug, "Sets debug level bit-mask");
>>>>
>>>> +module_param_array(suppress_sensor, int, &suppress_sensor_count,
>>>> 0444);
>>>> +MODULE_PARM_DESC(suppress_sensor, "Comma-separated sensor indices to
>>>> suppress (e.g., 3,7)");
>>>> +
>>>> module_param(force_load, bool, 0444);
>>>> MODULE_PARM_DESC(force_load,
>>>> "Attempts to load the driver even on a mis-identified
>>>> ThinkPad when
>>>> true");
>>>
>>> The P15 is one of the Linux certified platforms...though it's a bit
>>> older now.
>>>
>>> I'd be more interested in figuring out which sensors are returning an
>>> error and figuring out how we address that. I have access to the FW
>>> and platform team for questions (though this platform is a bit older
>>> now, so if we need FW fixes that will be trickier). My gut feeling is
>>> we shouldn't be creating sysfs entries if the sensors don't exist or
>>> aren't accessible.
>>
>> That is what my patch does - it prevents creating the sysfs entries
>> but not based on a check for validity of the sensor in code (as
>> probably desired by Ilpo when I understand a previous mail correctly)
>> but rather on a user-provided configuration via the new parameter. I
>> reply to the other mail as well soon.
>>
> Such sensors are meant to be ignored using /etc/sensors3.conf (provided
> by libsensors) unless the driver itself can
> automatically determine this by asking the platform firmware. I suggest
> that you use this mechanism instead of adding
> additional module parameters.
>
> Thanks,
> Armin Wolf
>
> (I also CCed the hwmon mailing list as libsensors originally came from there)
>
>>>
>>> I do have a P15 so can check it out (I'm going to have to blow some
>>> dust off it). If you've got the details on which sensors need
>>> suppressing that would be useful. I have seen previously where it's
>>> trying to access a GPU sensor on a UMA model.
>>
>> On my hardware it's sensor temp8_input which is unreadable at all und
>> sensor temp4_input that has a constant value of 0, no matter how hot,
>> cold or loud the machine is running. I am, however, able to monitor
>> GPU temps via nvidia _and_ thinkpad ACPI. The values are mostly equal,
>> differ a bit due to internal timing sometimes.
>>
>>>
I tried this on my P15, and I do get an error when the GPU sensor is accessed as it's not available (no Nvidia card on mine).
A suggestion (based a bit on Armin's suggestions): If the is_visible function is changed so if the sensor returns an error (or not available) then the sysfs entry isn't displayed.
I think that would prevent errors/access issues from user space - at least it worked on my system.
Something like the below (I can send this up as a proper patch if it makes sense)
diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c b/drivers/platform/x86/lenovo/thinkpad_acpi.c
index cc19fe520ea9..075d15df183c 100644
--- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
+++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
@@ -6312,6 +6312,8 @@ static umode_t thermal_attr_is_visible(struct kobject *kobj,
to_sensor_dev_attr(dev_attr);
int idx = sensor_attr->index;
+ s32 value;
+ int res;
switch (thermal_read_mode) {
case TPACPI_THERMAL_NONE:
@@ -6334,6 +6336,11 @@ static umode_t thermal_attr_is_visible(struct kobject *kobj,
}
+ /* Check if sensor is available */
+ res = thermal_get_sensor(idx, &value);
+ if ((res) || (value == TPACPI_THERMAL_SENSOR_NA))
+ return 0;
+
return attr->mode;
}
I think this would generally be useful for removing unwanted sensors without having to do extra steps?
Mark
|