System hard-lockups after eGPU disconnect
Affected version
Arch Linux.
The issue happens on the following versions:
- gnome-shell 47.1-1
- gdm 47.0-1
- mesa 24.2.6-1
- kernel 6.6.60-1-lts
- linux-firmware 20241017.22a6c7dc-1
The issue occurs on Wayland. I have not yet tried to reproduce it on Xorg, but will do so if it would help.
Bug summary
After the eGPU (with no display cable attached) is connected, Gnome Shell starts using 1 MiB of VRAM (as others reported in #2969). The system works as intended during this phase.
The issue arises when disconnecting the eGPU. The system becomes unstable after a few minutes or after a sleep/wake cycle. 90% of the time the whole system hard-locks; the other 10% of the time only Gnome locks up, which can be fixed by logging in over SSH and using a kill command.
The system in question is a Framework Laptop 13, with:
- iGPU: an Intel Alder Lake-P GT2 [Iris Xe Graphics]
- eGPU: an AMD Radeon RX 7900 GRE
- enclosure: ADT-Link UT3G, connected via Thunderbolt 4
Additionally, the amdgpu module is still loaded after the eGPU is disconnected, which I did not expect. When I try to run # rmmod amdgpu, it reports that the device is busy, even though nothing is using it. This issue does not happen on the other DEs I tried, in case it helps.
I reported this same issue to an AMD project a while ago; that report includes some debugging steps I performed previously: https://github.com/ROCm/ROCm/issues/3866
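One way to narrow down why rmmod reports amdgpu as busy is to look at the reference count the kernel records for the module. The following is a minimal sketch, not part of the original debugging: module_usage is a hypothetical helper name, and it only reads /proc/modules (the optional second argument exists so it can be tried against a saved copy of that file).

```shell
# Hypothetical helper: print the reference count and dependent modules
# that the kernel records for a module.
# Usage: module_usage NAME [MODULES_FILE]
# MODULES_FILE defaults to /proc/modules.
module_usage() {
    awk -v name="$1" '
        $1 == name {
            # /proc/modules fields: name, size, refcount,
            # dependent modules ("-" if none), state, address
            printf "refcount=%s used_by=%s\n", $3, $4
            found = 1
        }
        END { if (!found) print "not loaded" }
    ' "${2:-/proc/modules}"
}
```

Running module_usage amdgpu after disconnecting the enclosure shows whether the count is still above zero. A non-zero count with no dependent modules listed would suggest that something other than another module, for example a process that still has a device node open, is holding the reference.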
Steps to reproduce
- Boot linux-lts
- Start Gnome
- Connect the enclosure with the GPU
- See nvtop, /usr/bin/gnome-shell appears
- Optionally
  - Start applications that use the eGPU (Gnome Console, Bottles)
  - Close the applications
- Disconnect the enclosure
- Use the system for a while (reopening Gnome Console sometimes triggers the lockup)
What happened
- Most of the time, the system hard-locks and does not respond to anything but a hard reboot
- Other times, only Gnome locks up, which can be resolved using SSH and a kill command
- amdgpu remains loaded
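For the case where only Gnome locks up and an SSH session is still available, it may help to record which processes still hold a DRM device node open before killing anything. The sketch below is an assumption on how to check this, not a step from the original report: drm_holders is a hypothetical helper name, it simply walks the fd symlinks under /proc, and it needs root to see processes owned by other users.

```shell
# Hypothetical helper: list "PID target" pairs for every process that
# holds an open file descriptor on a node under /dev/dri.
# Usage: drm_holders [PROC_DIR]   (PROC_DIR defaults to /proc)
drm_holders() {
    proc_dir="${1:-/proc}"
    for fd in "$proc_dir"/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the opened file; skip unreadable ones.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            /dev/dri/*)
                # Strip the proc prefix, then everything after the PID.
                pid=${fd#"$proc_dir"/}
                pid=${pid%%/*}
                printf '%s %s\n' "$pid" "$target"
                ;;
        esac
    done | sort -u
}
```

Running drm_holders as root before and after disconnecting the enclosure, and comparing the two outputs, would show whether gnome-shell or another process keeps a node of the removed GPU open. Which /dev/dri node belongs to the eGPU depends on the system and is not assumed here.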
What did you expect to happen
- The system continues working as expected
- amdgpu is unloaded
Relevant logs, screenshots, screencasts etc.
I can provide the following files on this issue:
- Screenshot of nvtop, when no graphical application is running on the eGPU
I can also provide any extra details that would help with this issue. Thank you for the help!