CDemu - a virtual CD/DVD drive for Linux / Bug reports / #139 Suspend freezes PC with kernel 6.10

Rok Mandeljc - 2024-08-17

We've had a couple of issue reports over the years with suspend/hibernate (https://sourceforge.net/p/cdemu/bugs/91, https://github.com/cdemu/cdemu/issues/11), and I've seen a couple of threads around the internet where vhba and cdemu were identified as the culprit. I haven't been able to reproduce these locally, though.

Suspend on my Fedora 40 notebook (6.10.4-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC) seems to work as expected.

One caveat, though: the default suspend method here seems to be deep, while s2idle, which you appear to be using, does not seem to work on this notebook (regardless of whether cdemu is installed or not).
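For anyone comparing setups: the kernel exposes the available suspend variants in /sys/power/mem_sleep, with the currently selected one in square brackets (for example "s2idle [deep]"). A minimal sketch for reading it, assuming a Linux system where that file exists; the helper names parse_mem_sleep and read_mem_sleep are made up for illustration:

```python
from pathlib import Path


def parse_mem_sleep(text):
    """Parse the content of /sys/power/mem_sleep.

    Returns (active_mode, available_modes). The active mode is the
    entry wrapped in square brackets, e.g. "s2idle [deep]".
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_mem_sleep(path="/sys/power/mem_sleep"):
    """Read and parse the suspend modes from sysfs (hypothetical helper)."""
    return parse_mem_sleep(Path(path).read_text())


if __name__ == "__main__":
    try:
        active, available = read_mem_sleep()
        print(f"active: {active}, available: {', '.join(available)}")
    except FileNotFoundError:
        print("/sys/power/mem_sleep not found on this system")
```

Including the output of this (or simply the content of that file) in a report makes it easier to tell which suspend variant was actually selected.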

Similarly, I do not have a hibernate partition/file configured, so I did not test hibernate.

The other reports that I mentioned seemed to imply that the issue is with the vhba module itself (e.g., lack of power-save features in the code that would suspend the virtual device's workqueue).

In your case, though, the kernel: task:pool-gcdemu ... line seems to imply that the issue is on the userspace side, in gCDEmu. Which is odd, because gCDEmu is just a GUI client, and should not be un-freezable. So I guess for starters, can you confirm or deny that? Do you get the same issue if you don't use gCDEmu, but instead ensure that cdemu-daemon is running by, for example, using the CLI client to run cdemu status? If this also fails, what is the error in this case?

Last edit: Rok Mandeljc 2024-08-17

Grundik - 2024-08-17

It's a very weird issue, I agree. It probably occurred sometimes before, but I can't confirm it was an issue in vhba. The system occasionally just freezes going to standby, and I'm using s3idle. After the kernel update from 6.10.0 to 6.10.4 it started to occur quite often, like once a day. And that's the only lead I had: one time the system did not freeze completely, I was able to switch to the console and see the logs, and that was there. I tried removing vhba+cdemu, and it has already been 3 days without issues. Obviously it's not a sure confirmation, but I don't know what to think. The issue is indeed not easily reproducible.

I'll watch more closely, and try different options like removing only gcdemu, but it will take time to be sure. I'll update this issue once I get some clarity. Thanks for your response.

I'm not sure if this is related to vhba, but after I installed it back into the system, I've managed to get this freeze (it was recoverable, unlike previously):

kernel: PM: suspend entry (s2idle)
kernel: Filesystems sync: 0.029 seconds
kernel: Freezing user space processes
kernel: Freezing user space processes completed (elapsed 0.004 seconds)
kernel: OOM killer disabled.
kernel: Freezing remaining freezable tasks failed after 20.009 seconds (0 tasks refusing to freeze, wq_busy=1):
kernel: Showing freezable workqueues that are still busy:
kernel: workqueue events_freezable: flags=0x4
kernel:   pwq 10: cpus=2 node=0 flags=0x0 nice=0 active=0 refcnt=2
kernel:     inactive: pci_pme_list_scan
kernel: workqueue events_freezable_pwr_efficient: flags=0x86
kernel:   pwq 64: cpus=0-15 flags=0x4 nice=0 active=0 refcnt=3
kernel:     inactive: 2*disk_events_workfn
kernel:   pwq 64: cpus=0-15 flags=0x4 nice=0 active=2 refcnt=3
kernel:     in-flight: 176436:disk_events_workfn ,176450:disk_events_workfn
kernel: workqueue pm: flags=0x4
kernel:   pwq 34: cpus=8 node=0 flags=0x0 nice=0 active=0 refcnt=2
kernel:     inactive: pm_runtime_work
kernel:   pwq 50: cpus=12 node=0 flags=0x0 nice=0 active=0 refcnt=2
kernel:     inactive: pm_runtime_work

I know it says s2idle, but the laptop is actually going into a deep suspend-to-RAM state; it's probably doing that through some fancy S0ix state.
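To make logs like the one above easier to compare across reports, here is a rough sketch that pulls the in-flight work items out of the "Showing freezable workqueues that are still busy" dump. It assumes the line format shown in this log; the function name busy_work_items is made up, and the number before each function name is taken to be the worker's PID as printed by the kernel:

```python
import re

# Matches lines such as "kernel: workqueue events_freezable: flags=0x4"
WQ_RE = re.compile(r"workqueue (\S+): flags=")
# Matches "<number>:<function>" entries on an "in-flight:" line
INFLIGHT_RE = re.compile(r"(\d+):(\w+)")


def busy_work_items(log_text):
    """Map workqueue name -> list of (worker_pid, function) in-flight items."""
    result = {}
    current = None
    for line in log_text.splitlines():
        match = WQ_RE.search(line)
        if match:
            current = match.group(1)
            continue
        if "in-flight:" in line and current is not None:
            items = line.split("in-flight:", 1)[1]
            found = [(int(pid), fn) for pid, fn in INFLIGHT_RE.findall(items)]
            result.setdefault(current, []).extend(found)
    return result


# Excerpt of the log from this report
SAMPLE = """\
kernel: workqueue events_freezable: flags=0x4
kernel:   pwq 10: cpus=2 node=0 flags=0x0 nice=0 active=0 refcnt=2
kernel:     inactive: pci_pme_list_scan
kernel: workqueue events_freezable_pwr_efficient: flags=0x86
kernel:   pwq 64: cpus=0-15 flags=0x4 nice=0 active=2 refcnt=3
kernel:     in-flight: 176436:disk_events_workfn ,176450:disk_events_workfn
"""

if __name__ == "__main__":
    for wq, items in busy_work_items(SAMPLE).items():
        for pid, fn in items:
            print(f"{wq}: worker {pid} running {fn}")
```

On this excerpt, the only workqueue with in-flight items is events_freezable_pwr_efficient, with two workers both in disk_events_workfn; the inactive entries are ignored.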

Rok Mandeljc - 2024-08-18

That does look more in line with other reports.

To clarify - you do not only have the vhba module loaded, but also have at least one cdemu-daemon instance running, and at least one virtual device created? Or not (i.e., just the vhba module loaded, and nothing else)?

Also, do you have any other optical drives (internal or external) connected to the system?

The number of in-flight items reported for disk_events_workfn seems suspiciously high. I think this is related to the kernel's continuous probing for device status and events (such as media change). And now that I think about it some more, during suspend/hibernate, user processes are frozen first. So cdemu-daemon cannot receive and process any new or outstanding commands that might still be queued in the kernel (for example, as a result of that event-probing task). Could be that in this scenario, either the vhba module itself or the probing task goes haywire... I suppose once I find some more time, I'll try to simulate/induce a command processing freeze (while keeping the control device file handle open) in the daemon to see what happens on the kernel side.
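As a side note on the event probing: the block layer's polling interval for a disk can be inspected via /sys/block/<disk>/events_poll_msecs, where -1 means "use the system default" from /sys/module/block/parameters/events_dfl_poll_msecs and 0 means polling is disabled. A small sketch for checking what interval a drive ends up with; the device name sr0 and the helper names are assumptions for illustration, and whether this polling is really what triggers the freeze is still only a guess:

```python
from pathlib import Path


def effective_poll_msecs(device_value, default_value):
    """Resolve the polling interval in milliseconds.

    A per-device value of -1 falls back to the system default;
    a resulting value of 0 means event polling is disabled.
    """
    if device_value == -1:
        return default_value
    return device_value


def read_int(path):
    """Read a single integer from a sysfs file."""
    return int(Path(path).read_text().strip())


def poll_interval(disk):
    """Return the effective event polling interval for a disk such as 'sr0'."""
    device_value = read_int(f"/sys/block/{disk}/events_poll_msecs")
    default_value = read_int(
        "/sys/module/block/parameters/events_dfl_poll_msecs"
    )
    return effective_poll_msecs(device_value, default_value)


if __name__ == "__main__":
    disk = "sr0"  # assumed name of a virtual drive; adjust as needed
    try:
        print(f"{disk}: polling every {poll_interval(disk)} ms (0 = disabled)")
    except FileNotFoundError:
        print(f"no polling attributes found for {disk}")
```

Comparing this value between a system that freezes and one that does not might help narrow down whether the polling is involved.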

Grundik - 2024-08-18

VHBA module, and cdemu kde daemon. 2 virtual drives, but no images loaded in them. No extra optical drives in system.

I'll disable the daemon, and report back with the result.

Grundik - 2024-08-18

P.S. clarification: systemd cdemu daemon and cdemu kde manager. I'll disable both of them.

Grundik - 2024-08-27

More than a week has passed; the vhba module is loaded, the cdemu daemon and manager are not (but were loaded once, to talk to the module initially), and not a single hiccup has occurred.

Grundik - 2024-08-27

I've noticed that the probability of failure increased if memory-heavy applications had been loaded and swap space was used intensively, even if they were already unloaded at the time of suspend. Maybe something critical could get swapped out?..
