EGL wayland surfaces are freed too early (?)
Submitted by memeka
Link to original bug (#780681)
Description
GTK+ EGL applications such as totem or gnome-maps on Wayland segfault on exit because they try to use surfaces that have already been freed. The issue seems to be in GDK: under GNOME the crash takes down the entire session (gnome-shell also crashes), but under weston only the application segfaults when exiting. I am assuming this is because weston does not use GTK+ but gnome-shell does.
This is an example trace from totem:
Core was generated by `totem bbb_720p.mov'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 get_next_argument (signature=0x2c <error: Cannot access memory at address 0x2c>, details=details@entry=0xbee39a9c) at ../src/connection.c:430
430 for(; *signature; ++signature) {
[Current thread is 1 (Thread 0xb213cd70 (LWP 12827))]
(gdb) bt
#0 get_next_argument (signature=0x2c <error: Cannot access memory at address 0x2c>, details=details@entry=0xbee39a9c) at ../src/connection.c:430
#1 0xb4ce69ba in wl_argument_from_va_list (signature=<optimized out>, args=args@entry=0xbee39acc, count=count@entry=20, ap=..., ap@entry=...) at ../src/connection.c:493
#2 0xb4ce5598 in wl_proxy_marshal (proxy=0x7f6bedb0, opcode=1) at ../src/wayland-client.c:692
#3 0xb4f8685e in window_surface_delete () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#4 0xb4f7e1e4 in eglp_window_surface_specific_deinitialization () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#5 0xb4f7cd14 in eglp_delete_surface () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#6 0xb4f7ce74 in eglp_destroy_all_non_current_surfaces () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#7 0xb4f7a71a in eglp_try_display_finish_terminating () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#8 0xb4f7b1e2 in eglTerminate () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#9 0xb4f7b22c in eglp_unload_callback () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#10 0xb4decc24 in osup_term_unload_hooks () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#11 0xb4dde4ca in osup_c_unload_hook () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#12 0xb6fd3f42 in ?? () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) print (struct wl_proxy) *0x7f6bedb0
$3 = {object = {interface = 0x7fe1bfc8, implementation = 0x7fb51c30, id = 44}, display = 0x7f660ec0, queue = 0x7f660f2c, flags = 2, refcount = 1, user_data = 0x0, dispatcher = 0x0, version = 3}
(gdb) print (struct wl_interface) *0x7fe1bfc8 # => this is proxy->interface - you can see the name is garbage already
$4 = {name = 0xa93e931d "iXh\377\367Һ\022KP!0\265{D\021L\205\260\025F\034Y#h\003\223\377\367\f\354\016IjF", version = 49, method_count = -2147421248, methods = 0x7f6beda8, event_count = 0, events = 0x0}
(gdb) print (struct wl_message) *0x7f6beda8 # => this is proxy->interface->methods => you can see the signature field cannot be accessed (0x31 is invalid) leading to the segmentation fault
$5 = {name = 0x0, signature = 0x31 <error: Cannot access memory at address 0x31>, types = 0x7fe1bfc8}
This is running GTK+ 3.22.8 (Debian stretch) on the armhf architecture with a Mali T628 GPU, using the ARM Wayland drivers version r12p0. All files in the egl-current directory (including libwayland-egl.so) are symlinks to the binary Mali driver libmali.so.
I first raised the issue with ARM (see https://community.arm.com/graphics/f/discussions/8146/r12p0-wayland-driver-odroid-xu3-frees-objects-too-early-leading-to-segm-fault), and after investigating, an ARM engineer told me that the issue is probably in GDK:
This segfault can happen if the application frees the Wayland surface too early, specifically if the associated EGL surface is still current. If this is the case, the application is doing something like the following during clean up:
eglDestroySurface(egl_surface);
wl_egl_window_destroy(wl_egl_window_win);
wl_surface_destroy(wl_surface);
If egl_surface was either the draw or read argument in the previous call to eglMakeCurrent, egl_surface and wl_egl_window_win are only marked for deletion and are still in use. Destroying wl_surface results in the SEGFAULT when the driver subsequently needs to do something with the wl_surface (in this case, part of deletion). EGL spec 1.5 sections 3.5.5 and 3.2 cover the lifetime of EGL objects.
There are 2 possible application fixes you could consider:
- Call eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT) before destroying the surface.
- Call eglTerminate() instead of destroying the surfaces individually.
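To make the ordering problem described above concrete, here is a minimal toy model in C. It is only a sketch: every `toy_` type and function is hypothetical and stands in for the real EGL/Wayland objects and calls named in the comments. It does not call the real libraries and is not how the Mali driver or GDK is implemented. It only models the rule that destroying a current surface defers the actual release, which later needs the native surface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the real Wayland or EGL types. */
struct toy_wl_surface {
    bool alive;                 /* false once the application has destroyed it */
};

struct toy_egl_surface {
    struct toy_wl_surface *native;
    bool is_current;            /* bound by the most recent make-current call */
    bool marked_for_deletion;   /* destroy was requested while still current */
    bool released;              /* resources were actually released */
    bool touched_dead_native;   /* stands in for the segfault in the trace */
};

/* Actual release; like the driver in the trace, it needs the native surface. */
static void toy_release(struct toy_egl_surface *s)
{
    assert(s != NULL && !s->released);
    if (!s->native->alive)
        s->touched_dead_native = true;
    s->released = true;
}

/* Models eglDestroySurface(): a current surface is only marked for deletion. */
static void toy_destroy_surface(struct toy_egl_surface *s)
{
    if (s->is_current)
        s->marked_for_deletion = true;
    else
        toy_release(s);
}

/* Models eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT). */
static void toy_release_current(struct toy_egl_surface *s)
{
    s->is_current = false;
    if (s->marked_for_deletion && !s->released)
        toy_release(s);
}

/* Models the eglTerminate() run at library unload: finishes pending releases. */
static void toy_terminate(struct toy_egl_surface *s)
{
    s->is_current = false;
    if (!s->released)
        toy_release(s);
}

/* Runs one teardown; returns true if the release touched a dead native surface. */
bool toy_teardown(bool unbind_first)
{
    struct toy_wl_surface wl = { .alive = true };
    struct toy_egl_surface egl = { .native = &wl, .is_current = true };

    if (unbind_first)
        toy_release_current(&egl);   /* first suggested fix */
    toy_destroy_surface(&egl);       /* deferred if the surface is still current */
    wl.alive = false;                /* stands in for wl_surface_destroy() */
    toy_terminate(&egl);             /* stands in for the driver's unload hook */
    return egl.touched_dead_native;
}
```

In this model, `toy_teardown(false)` follows the ordering from the quoted snippet and returns true (the deferred release runs after the native surface is gone), while `toy_teardown(true)` unbinds first and returns false.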
I'm reasonably confident that this is an issue in GDK (or how totem is calling GTK+) rather than the driver.
Version: 3.22.x