Crash/freeze on some GPUs, related to preemption in the KMS GPU driver/module
Currently, the released GNOME 45.1 as shipped in Ubuntu 23.10, Fedora 39 and other distros has broken out-of-the-box behavior for certain GPUs. This leads to a bad user experience where the Shell fully freezes and crashes, resulting in loss of work, frustration and avoidance of GNOME itself.
I'm not sure whether this affects only AMD GPUs, only a certain set of AMD GPUs, Intel and AMD, or all GPUs. At least for some AMD and some Intel GPUs, multiple people report sudden crashes during certain kinds of interactions in the Shell.
I'm also unsure whether it is related to the experimental RT scheduler in Mutter. I had it enabled when the crashes/freezes happened frequently, and I now run with it disabled just to be sure.
I'm creating this issue as a collection of other issues, to coordinate and track the user-facing problem: GNOME 45.1 crashing and disrupting work. Please let me know if this should be somewhere else or if there is already an issue somewhere to track this.
I can get the Shell on my AMD Polaris RX 550 to crash reliably with the experimental RT scheduler enabled, running on kernel 6.6-rc7. I can trigger it within a few seconds by mashing the switch-workspace shortcut back and forth (Ctrl + Alt + < or >) on an N-key rollover keyboard, then opening the overview while an overview transition animation is still running. It also sometimes happened while Alt-Tabbing or switching overviews, and while switching to full-screen on a video/image in either Telegram or Firefox.
Temporary workaround for AMD GPUs that seems to work for my AMD Polaris RX 550 card:
Thanks to @daenzer's comment in !3037 (merged) suggesting to disable the AMDGPU kernel module's mid-command buffer preemption (MCBP) feature, I have a stable GNOME experience once again. I haven't experienced any crashes since disabling MCBP yesterday (2023-11-09).
To configure this workaround, @daenzer recommended passing amdgpu.mcbp=0 as a command-line argument to the kernel via the bootloader.
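For anyone who wants to confirm that the bootloader change actually reached the kernel, here is a minimal Python sketch. It only inspects /proc/cmdline; the helper name mcbp_from_cmdline is made up for this example, and it does not prove the driver honoured the option.

```python
from pathlib import Path
from typing import Optional


def mcbp_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value of the last amdgpu.mcbp=... token, or None if absent.

    Hypothetical helper for illustration; it only splits on whitespace and
    does not handle quoted kernel arguments.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == "amdgpu.mcbp":
            value = val  # if the option is repeated, keep the last one seen
    return value


def main() -> None:
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        print("could not read /proc/cmdline (not a Linux system?)")
        return
    value = mcbp_from_cmdline(cmdline)
    if value is None:
        print("amdgpu.mcbp is not set on the kernel command line")
    else:
        print(f"kernel command line sets amdgpu.mcbp={value}")


if __name__ == "__main__":
    main()
```

If this reports that the option is not set after a reboot, the bootloader configuration was probably not regenerated or a different boot entry was used.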
To be completely sure, I also configured a drop-in modprobe config on the filesystem:
/etc/modprobe.d/amdgpu-no-mcbp.conf
options amdgpu mcbp=0
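To double-check that the drop-in is picked up, a rough Python sketch like the following can compare the file against the value the loaded module reports. It assumes the amdgpu driver exposes the parameter at /sys/module/amdgpu/parameters/mcbp, which may not hold for every kernel build; the function names are hypothetical and the parser only understands plain "options" lines.

```python
from pathlib import Path
from typing import Dict, Optional

# Drop-in path as used in this report; the sysfs path is an assumption.
DROP_IN = Path("/etc/modprobe.d/amdgpu-no-mcbp.conf")
SYSFS_PARAM = Path("/sys/module/amdgpu/parameters/mcbp")


def parse_modprobe_options(text: str, module: str) -> Dict[str, str]:
    """Collect key=value pairs from 'options <module> ...' lines."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "options" and fields[1] == module:
            for field in fields[2:]:
                key, sep, val = field.partition("=")
                if sep:
                    params[key] = val
    return params


def read_text_or_none(path: Path) -> Optional[str]:
    """Read a file, returning None if it is missing or unreadable."""
    try:
        return path.read_text()
    except OSError:
        return None


if __name__ == "__main__":
    conf = read_text_or_none(DROP_IN)
    configured = parse_modprobe_options(conf or "", "amdgpu").get("mcbp")
    effective = read_text_or_none(SYSFS_PARAM)
    print("configured in drop-in:", configured)
    print("reported by module:  ", effective.strip() if effective else None)
```

If the two values differ after a reboot, the option was probably not applied when the module was loaded.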
Possibly related to:
https://bugs.launchpad.net/ubuntu/+source/mutter/+bug/2034619
https://bugs.launchpad.net/ubuntu/+source/gnome-shell/+bug/2039045