gnome-shell segmentation fault on resume from suspend with eGPU as primary
Affected version
Fedora 37, with gnome-shell and mutter 43.1
The bug appears under Wayland, using an RX 480 eGPU over Thunderbolt.
Specifically, I have a laptop with integrated and (weak) dedicated graphics, which are disabled, and an eGPU that I have set as primary.
Bug summary
The segfault occurs during resume from suspend. Several strange things happened:
- The laptop's internal display was disabled and only the external monitor was in use, yet on resume the internal display turned on briefly and showed the login screen (then went black and returned to GDM).
- External monitor flashed briefly after the internal display went black, displaying the same data as the internal monitor.
- Usually eGPU-related crashes require a hard reboot, but this one did not.
Steps to reproduce
Unknown; the issue cannot be reproduced easily.
What happened
gnome-shell segfaulted, which terminated the session and returned the computer to the login screen.
What did you expect to happen
The computer resumes cleanly from sleep.
Relevant logs, screenshots, screencasts etc.
The crash left a core dump and the following line in the journal (and in dmesg):
Dec 08 18:18:33 valinor kernel: gnome-shell[2533]: segfault at 10 ip 00007faf4c188374 sp 00007ffcac70b150 error 4 in libmutter-11.so.0.0.0[7faf4c04e000+159000]
Looking at the journal entries that mention amdgpu, there is a large gap corresponding to the time the computer was asleep. The segfault line above was logged shortly before the first amdgpu entry after that gap:
Dec 08 18:18:36 valinor gnome-shell[28883]: Added device '/dev/dri/card1' (amdgpu) using atomic mode setting.
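For anyone triaging this, the kernel segfault line quoted above can be decoded a little further. The following is a minimal Python sketch, not part of the original report: the regular expression and the `decode_segfault` helper are my own illustrative names, and the bit meanings assume the standard x86 page-fault error code (bit 0 = page present, bit 1 = write access, bit 2 = user mode). Under that assumption, `error 4` with fault address `0x10` looks like a user-mode read through a pointer that is NULL or nearly NULL, which is at least consistent with the crash in `meta_egl_destroy_surface` in the backtrace.

```python
import re

# Kernel segfault line copied from the journal (timestamp and host stripped).
LINE = ("gnome-shell[2533]: segfault at 10 ip 00007faf4c188374 "
        "sp 00007ffcac70b150 error 4 in "
        "libmutter-11.so.0.0.0[7faf4c04e000+159000]")

# Hypothetical helper pattern: all numeric fields are hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel segfault line into its fields (illustrative only)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "library": m["lib"],
        "fault_address": int(m["addr"], 16),
        # Offset of the faulting instruction from the start of the
        # mapping the kernel reported, not necessarily a file offset.
        "offset_in_mapping": ip - base,
        # Assumed x86 page-fault error code bits.
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


info = decode_segfault(LINE)
print(info["library"], hex(info["offset_in_mapping"]))
```

The printed offset is relative to the mapping that the kernel reports. Turning it into a symbol without the core dump would also need the file offset of that segment, so the gdb backtrace below remains the more reliable source.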
A core dump was also generated. I installed the debug packages and ran it through gdb:
(gdb) bt full
#0 0x00007faf4c188374 in meta_egl_destroy_surface (error=0x0, egl=<optimized out>, surface=0x559db34d9ae0, display=<optimized out>) at ../src/backends/meta-egl.c:454
renderer_gpu_data = 0x559dadde1c60
render_device = 0x30
#1 secondary_gpu_state_free (secondary_gpu_state=0x559daf3ef370) at ../src/backends/native/meta-onscreen-native.c:565
renderer_gpu_data = 0x559dadde1c60
render_device = 0x30
#2 0x00007faf4d5168d4 in g_object_unref (_object=<optimized out>) at ../gobject/gobject.c:3867
weak_locations = <optimized out>
nqueue = 0x559db1efc600
old_ref = <optimized out>
object = 0x559dad64c440
__func__ = "g_object_unref"
#3 g_object_unref (_object=0x559dad64c440) at ../gobject/gobject.c:3784
object = 0x559dad64c440
__func__ = "g_object_unref"
#4 0x00007faf4c42ef88 in clutter_stage_view_finalize (object=0x559dacfeb5f0) at ../clutter/clutter/clutter-stage-view.c:1477
_pp = <optimized out>
_ptr = <optimized out>
view = <optimized out>
priv = <optimized out>
#5 0x00007faf4d5169b2 in g_object_unref (_object=<optimized out>) at ../gobject/gobject.c:3909
weak_locations = <optimized out>
nqueue = 0x559daec24ea0
old_ref = <optimized out>
object = 0x559dacfeb5f0
__func__ = "g_object_unref"
#6 g_object_unref (_object=0x559dacfeb5f0) at ../gobject/gobject.c:3784
object = 0x559dacfeb5f0
__func__ = "g_object_unref"
#7 0x00007faf4c1794ac in meta_kms_page_flip_closure_free (closure=0x559db2d71bb0) at ../src/backends/native/meta-kms-page-flip.c:77
_pp = 0x559db2d71bc0
_ptr = <optimized out>
#8 0x00007faf4cf29020 in g_list_foreach (list=<optimized out>, list@entry=0x559daf5cc0c0 = {...}, func=0x7faf4c179490 <meta_kms_page_flip_closure_free>, user_data=user_data@entry=0x0) at ../glib/glist.c:1092
next = 0x0
#9 0x00007faf4cf3384f in g_list_free_full (list=0x559daf5cc0c0 = {...}, free_func=<optimized out>) at ../glib/glist.c:246
#10 0x00007faf4c17a678 in meta_kms_page_flip_data_unref (page_flip_data=0x559daf403b90) at ../src/backends/native/meta-kms-page-flip.c:110
#11 meta_kms_page_flip_data_unref (page_flip_data=0x559daf403b90) at ../src/backends/native/meta-kms-page-flip.c:106
#12 0x00007faf4c19e2dc in meta_kms_callback_data_free (callback_data=0x559daf361720) at ../src/backends/native/meta-kms.c:365
callback_data = 0x559daf361720
l = 0x559db0c8b6e0 = {0x559daf361720, 0x559db07c9db0}
callback_count = <optimized out>
#13 flush_callbacks.isra.0 (kms=kms@entry=0x559dacdd6e70) at ../src/backends/native/meta-kms.c:384
callback_data = 0x559daf361720
l = 0x559db0c8b6e0 = {0x559daf361720, 0x559db07c9db0}
callback_count = <optimized out>
#14 0x00007faf4c18489d in callback_idle (user_data=user_data@entry=0x559dacdd6e70) at ../src/backends/native/meta-kms.c:399
kms = 0x559dacdd6e70
#15 0x00007faf4cf37cb2 in g_idle_dispatch (source=0x559db118a5b0, callback=0x7faf4c184890 <callback_idle>, user_data=0x559dacdd6e70) at ../glib/gmain.c:6124
idle_source = 0x559db118a5b0
again = <optimized out>
#16 0x00007faf4cf38cbf in g_main_dispatch (context=0x559dacdcf9a0) at ../glib/gmain.c:3444
dispatch = 0x7faf4cf37c90 <g_idle_dispatch>
prev_source = 0x0
begin_time_nsec = 16186996180470
was_in_call = 0
user_data = 0x559dacdd6e70
callback = 0x7faf4c184890 <callback_idle>
cb_funcs = 0x7faf4d0203e0 <g_source_callback_funcs>
cb_data = 0x559dad982880
need_destroy = <optimized out>
source = 0x559db118a5b0
current = 0x559dacdac900
i = 21
#17 g_main_context_dispatch (context=0x559dacdcf9a0) at ../glib/gmain.c:4162
#18 0x00007faf4cf8e598 in g_main_context_iterate.constprop.0 (context=0x559dacdcf9a0, block=1, dispatch=1, self=<optimized out>) at ../glib/gmain.c:4238
max_priority = 200
timeout = 0
some_ready = 1
nfds = 15
allocated_nfds = <optimized out>
fds = <optimized out>
begin_time_nsec = 16186996063648
#19 0x00007faf4cf3828f in g_main_loop_run (loop=0x559daf4613e0) at ../glib/gmain.c:4438
__func__ = "g_main_loop_run"
#20 0x00007faf4c0d0f69 in meta_context_run_main_loop (context=<optimized out>, error=0x7ffcac70b450) at ../src/core/meta-context.c:453
priv = 0x559dacdca050
__func__ = "meta_context_run_main_loop"
#21 0x0000559daca1fe09 in main ()
~~Note that gdb complains that there is a missing debuginfo package for gnome-shell even though it is already installed.~~
Looking more closely, I can't find the debuginfo package for gnome-shell. Even the package suggested by gdb can't be found. I suspect this is a Fedora problem, so I'm going to ask there and try to get a full backtrace.