
#139 Suspend freezes PC with kernel 6.10

open
nobody
None
unassigned
default
2024-08-27
2024-08-16
Grundik
No

Sometimes (in fact, more often than not), suspend leaves the system completely stuck: it can't go to sleep, but it can't abort the suspend either. No CD images are loaded in the virtual drive. It started with the 6.10.4 kernel. Removing the vhba module seemingly solves the issue (I'll update the ticket if the problem somehow reoccurs without it).

Kernel log shows this:

kernel: PM: suspend entry (s2idle)
kernel: Filesystems sync: 0.018 seconds
kernel: Freezing user space processes
kernel: Freezing user space processes failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
kernel: task:pool-gcdemu     state:D stack:0     pid:404750 tgid:3258  ppid:2642   flags:0x00004006
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3eb/0xb40
kernel:  schedule+0x27/0xf0
kernel:  request_wait_answer+0xd0/0x2a0
kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
kernel:  fuse_simple_request+0x17e/0x2c0
kernel:  fuse_statfs+0xf2/0x160
kernel:  statfs_by_dentry+0x64/0x90
kernel:  user_statfs+0x6b/0xd0
kernel:  __do_sys_statfs+0x35/0x70
kernel:  do_syscall_64+0x82/0x190
kernel:  ? select_task_rq_fair+0x1d0/0x1720
kernel:  ? do_futex+0x125/0x190
kernel:  ? __x64_sys_futex+0x129/0x1e0
kernel:  ? futex_wait+0x89/0x120
kernel:  ? sched_clock+0x10/0x30
kernel:  ? sched_clock_cpu+0xf/0x190
kernel:  ? __smp_call_single_queue+0xab/0x110
kernel:  ? ttwu_queue_wakelist+0xd0/0xf0
kernel:  ? try_to_wake_up+0x211/0x5f0
kernel:  ? wake_up_q+0x4e/0x90
kernel:  ? futex_wake+0x159/0x190
kernel:  ? do_futex+0x125/0x190
kernel:  ? __x64_sys_futex+0x129/0x1e0
kernel:  ? syscall_exit_to_user_mode+0x77/0x210
kernel:  ? do_syscall_64+0x8e/0x190
kernel:  ? syscall_exit_to_user_mode+0x77/0x210
kernel:  ? do_syscall_64+0x8e/0x190
kernel:  ? switch_fpu_return+0x4f/0xd0
kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x7f6d5baf93c7
kernel: RSP: 002b:00007f6d35fffcb8 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
kernel: RAX: ffffffffffffffda RBX: 00000000332a11a0 RCX: 00007f6d5baf93c7
kernel: RDX: 000000003344f3f0 RSI: 00007f6d35fffcf0 RDI: 00000000332b38c0
kernel: RBP: 00007f6d35fffe50 R08: 000000003306fdd0 R09: 0000000033072698
kernel: R10: aaaaaaaaaaaaaaab R11: 0000000000000206 R12: 000000003344ba00
kernel: R13: 0000000000000000 R14: 00007f6d49bffbf0 R15: 00007f6d35800000
kernel:  </TASK>

I'm using Debian unstable with the debian-multimedia repo for cdemu.

Kernel: 6.10.4-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.4-1 (2024-08-12) x86_64 GNU/Linux
vhba-dkms version 1:20240202-dmo1
gcdemu version 1:3.2.6-dmo3

Discussion

  • Rok Mandeljc

    Rok Mandeljc - 2024-08-17

    We've had a couple of issue reports over the years with suspend/hibernate (https://sourceforge.net/p/cdemu/bugs/91, https://github.com/cdemu/cdemu/issues/11), and I've seen a couple of threads around the internet where vhba and cdemu were identified as the culprit. I haven't been able to reproduce these locally, though.

    Suspend on my Fedora 40 notebook (6.10.4-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC) seems to work as expected.

    One caveat, though: the default suspend method here seems to be deep, while s2idle, which you appear to be using, does not seem to work on this notebook (regardless of whether cdemu is installed or not).
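
To compare suspend methods between machines, the kernel lists the available modes in /sys/power/mem_sleep and marks the active one with square brackets (for example `s2idle [deep]`). A minimal sketch for reading it follows; the function names are made up for illustration and are not part of cdemu or vhba.

```python
from pathlib import Path


def parse_mem_sleep(text):
    """Split mem_sleep content into (available_modes, active_mode).

    The active mode is the token wrapped in square brackets.
    """
    modes, active = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return modes, active


def read_mem_sleep(path="/sys/power/mem_sleep"):
    """Read the file if it exists; return ([], None) on systems without it."""
    p = Path(path)
    if not p.exists():
        return [], None
    return parse_mem_sleep(p.read_text())
```

Calling `read_mem_sleep()` on each machine shows whether both are really using the same suspend method before comparing results.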

    Similarly, I do not have a hibernate partition/file configured, so I did not test hibernate.

    The other reports that I mentioned seemed to imply that the issue is with the vhba module itself (e.g., lack of power-save features in the code that would suspend the virtual device's workqueue).

    In your case, though, the kernel: task:pool-gcdemu ... line seems to imply that the issue is on the userspace side, in gCDEmu. Which is odd, because gCDEmu is just a GUI client, and should not be un-freezable. So I guess for starters, can you confirm or deny that? Do you get the same issue if you don't use gCDEmu, but instead ensure that cdemu-daemon is running by, for example, using the CLI client to run cdemu status? If this also fails, what is the error in this case?
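
For anyone comparing several such logs, here is a rough sketch that pulls the refusing task out of a freeze failure like the one in the report. The regular expressions are written against the two log lines quoted in this ticket only, so treat the patterns (and the helper name `summarize_freeze_failure`) as assumptions rather than a general kernel log parser.

```python
import re

# Matches the "Freezing ... failed after N seconds (...)" summary line.
FAIL_RE = re.compile(
    r"Freezing (?:user space processes|remaining freezable tasks) failed after "
    r"(?P<secs>[\d.]+) seconds "
    r"\((?P<tasks>\d+) tasks refusing to freeze, wq_busy=(?P<wq>\d+)\)"
)
# Matches the per-task line that follows, e.g. "task:pool-gcdemu state:D ... pid:404750".
TASK_RE = re.compile(r"task:(?P<comm>\S+)\s+state:(?P<state>\S)\s.*?pid:(?P<pid>\d+)")


def summarize_freeze_failure(lines):
    """Return a dict describing the last freeze failure found, or None."""
    summary = None
    for line in lines:
        m = FAIL_RE.search(line)
        if m:
            summary = {
                "seconds": float(m["secs"]),
                "refusing": int(m["tasks"]),
                "wq_busy": int(m["wq"]),
                "tasks": [],
            }
            continue
        m = TASK_RE.search(line)
        if m and summary is not None:
            summary["tasks"].append((m["comm"], m["state"], int(m["pid"])))
    return summary
```

State D together with `fuse_statfs` in the trace suggests the task was blocked in an uninterruptible wait on a FUSE request, which is worth keeping in mind when judging whether gCDEmu itself is at fault.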

     

    Last edit: Rok Mandeljc 2024-08-17
  • Grundik

    Grundik - 2024-08-17

    It's a very weird issue, I agree. It probably occurred sometimes before, but I can't confirm it was an issue in vhba. The system occasionally just freezes going to standby, and I'm using s3idle. After the kernel update from 6.10.0 to 6.10.4 it started to occur quite often, like once a day. And that's the only lead I had: one time the system did not freeze completely, I was able to switch to the console and see the logs, and that was there. I tried removing vhba+cdemu, and it has already been 3 days without issues. Obviously it's not a sure confirmation, but I don't know what to think. The issue is indeed not easily reproducible.

    I'll watch more closely, and try different options like removing only gcdemu, but it will take time to be sure. I'll update this issue once I get some clarity. Thanks for your response.

     
  • Grundik

    Grundik - 2024-08-18

    I'm not sure if this is related to vhba, but after I installed it back into the system, I managed to get this freeze (it was recoverable, unlike previously):

    kernel: PM: suspend entry (s2idle)
    kernel: Filesystems sync: 0.029 seconds
    kernel: Freezing user space processes
    kernel: Freezing user space processes completed (elapsed 0.004 seconds)
    kernel: OOM killer disabled.
    kernel: Freezing remaining freezable tasks failed after 20.009 seconds (0 tasks refusing to freeze, wq_busy=1):
    kernel: Showing freezable workqueues that are still busy:
    kernel: workqueue events_freezable: flags=0x4
    kernel:   pwq 10: cpus=2 node=0 flags=0x0 nice=0 active=0 refcnt=2
    kernel:     inactive: pci_pme_list_scan
    kernel: workqueue events_freezable_pwr_efficient: flags=0x86
    kernel:   pwq 64: cpus=0-15 flags=0x4 nice=0 active=0 refcnt=3
    kernel:     inactive: 2*disk_events_workfn
    kernel:   pwq 64: cpus=0-15 flags=0x4 nice=0 active=2 refcnt=3
    kernel:     in-flight: 176436:disk_events_workfn ,176450:disk_events_workfn
    kernel: workqueue pm: flags=0x4
    kernel:   pwq 34: cpus=8 node=0 flags=0x0 nice=0 active=0 refcnt=2
    kernel:     inactive: pm_runtime_work
    kernel:   pwq 50: cpus=12 node=0 flags=0x0 nice=0 active=0 refcnt=2
    kernel:     inactive: pm_runtime_work
    

    I know it says s2idle, but the laptop is actually going into a deep suspend-to-RAM state; it's probably doing that through some fancy s0ix state.
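
The busy-workqueue part of the log above can be condensed into a per-workqueue count of in-flight work functions. This is only a sketch based on the line format shown in this comment; the helper name `busy_work_items` is hypothetical, and other kernel versions may print these lines differently.

```python
import re
from collections import Counter


def busy_work_items(lines):
    """Map workqueue name -> Counter of in-flight work function names."""
    counts = {}
    current = None
    for line in lines:
        m = re.search(r"workqueue (\S+): flags=", line)
        if m:
            current = m.group(1)  # remember which workqueue we are inside
            continue
        m = re.search(r"in-flight:\s*(.*)", line)
        if m and current:
            # Entries look like "176436:disk_events_workfn", comma-separated.
            for item in m.group(1).split(","):
                item = item.strip()
                if not item:
                    continue
                _pid, _, func = item.partition(":")
                counts.setdefault(current, Counter())[func] += 1
    return counts
```

For the log in this comment, the only in-flight entries are the two `disk_events_workfn` items on `events_freezable_pwr_efficient`; everything else is listed as inactive.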

     
  • Rok Mandeljc

    Rok Mandeljc - 2024-08-18

    That does look more in line with other reports.

    To clarify: do you have not only the vhba module loaded, but also at least one cdemu-daemon instance running and at least one virtual device created? Or not (i.e., just the vhba module loaded, and nothing else)?

    Also, do you have any other optical drives (internal or external) connected to the system?

    The number of in-flight items reported for disk_events_workfn seems suspiciously high. I think this is related to the kernel's continuous probing for device status and events (such as media change). And now that I think about it some more, during suspend/hibernate, user processes are frozen first. So cdemu-daemon cannot receive and process any new or outstanding commands that might still be queued in the kernel (for example, as a result of that event-probing task). It could be that in this scenario, either the vhba module itself or the probing task goes haywire... I suppose once I find some more time, I'll try to simulate/induce a command processing freeze (while keeping the control device file handle open) in the daemon to see what happens on the kernel side.
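
To see how the kernel is configured to poll optical drives for events, the block layer exposes `events` and `events_poll_msecs` per device under /sys/block, plus a global default in /sys/module/block/parameters/events_dfl_poll_msecs. The sketch below just collects those values. It assumes the virtual drives show up as `sr*` devices, and the function names are made up for illustration.

```python
from pathlib import Path


def _read(path):
    """Return stripped file content, or None if the file is missing/unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def disk_event_polling(sys_root="/sys"):
    """Return (default_poll_msecs, {device: {"events", "events_poll_msecs"}})."""
    root = Path(sys_root)
    default = _read(root / "module/block/parameters/events_dfl_poll_msecs")
    report = {}
    block_dir = root / "block"
    if block_dir.is_dir():
        # Assumption: optical (including virtual) drives appear as sr0, sr1, ...
        for dev in sorted(block_dir.glob("sr*")):
            report[dev.name] = {
                "events": _read(dev / "events"),
                "events_poll_msecs": _read(dev / "events_poll_msecs"),
            }
    return default, report
```

Comparing this output between a machine that hangs and one that does not might show whether in-kernel polling is active on the virtual drives at all.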

     
  • Grundik

    Grundik - 2024-08-18

    The VHBA module and the cdemu KDE daemon. Two virtual drives, but no images loaded in them. No extra optical drives in the system.

    I'll disable the daemon and report back with the result.

     
  • Grundik

    Grundik - 2024-08-18

    P.S. Clarification: the systemd cdemu daemon and the cdemu KDE manager. I'll disable both of them.

     
  • Grundik

    Grundik - 2024-08-27

    More than a week has passed. The vhba module is loaded, while the cdemu daemon and manager are not (though they were loaded once, to talk to the module initially), and not a single hiccup has occurred.

     
  • Grundik

    Grundik - 2024-08-27

    I've noticed that the probability of failure increases if memory-heavy applications were loaded and swap space was used intensively, even if they are already unloaded at the time of suspend. Maybe something critical could get swapped out?

     
