Major UI process memory corruption
Sometime in the past couple of months, we've developed a UI process memory corruption problem. I'm seeing frequent UI process crashes with backtraces in GTK CSS code or in memory allocation routines, both sure signs of memory corruption. To be blunt, I think 3.34 is going to be a crashy, bad release.
Unfortunately, this is almost impossible to fix.
- We cannot bisect it, because although the crashes occur relatively frequently, they're still only occasional, non-reproducible occurrences.
- Backtraces taken with gdb are useless because they point to innocent code whose memory was stomped on by corruption elsewhere. The only practical way to track down memory corruption is with valgrind or asan.
- When I try to run Tech Preview under valgrind, it just crashes.
- When I tried to build Tech Preview with asan enabled, I couldn't figure out how to make the web process not crash on startup.
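To illustrate why the gdb backtraces mislead, here is a toy model, not real heap behavior: a bytearray stands in for the heap and two adjacent regions stand in for allocations. Every name in it (`buggy_write`, `innocent_reader`, the `GTKSTYLE` marker) is hypothetical and made up for this sketch. The unchecked run fails in the innocent reader, which is what gdb would show; the checked run fails at the faulty write, which is roughly what valgrind or asan would report.

```python
# Toy model only: a bytearray stands in for the process heap.
# All names here are hypothetical; none come from Epiphany, WebKit, or GTK.

A_START, A_SIZE = 0, 8      # allocation A owns bytes 0..7
B_START, B_SIZE = 8, 8      # allocation B owns bytes 8..15, right after A
B_EXPECTED = b"GTKSTYLE"    # stand-in for B's valid contents


def new_arena():
    """Return a fresh 16-byte 'heap' with B holding valid data."""
    arena = bytearray(A_SIZE + B_SIZE)
    arena[B_START:B_START + B_SIZE] = B_EXPECTED
    return arena


def buggy_write(arena, data, checked=False):
    """Copy data into A. Without checked=True there is no bounds check,
    so oversized data silently spills into B (this is the actual bug)."""
    if checked and len(data) > A_SIZE:
        # Roughly what asan or valgrind do: report at the faulty write.
        raise OverflowError("overflow detected in buggy_write")
    arena[A_START:A_START + len(data)] = data


def innocent_reader(arena):
    """Correct code that merely uses B. It fails here, after the bug ran."""
    value = bytes(arena[B_START:B_START + B_SIZE])
    if value != B_EXPECTED:
        raise RuntimeError("crash in innocent_reader")
    return value


def where_it_fails(checked):
    """Run the scenario and return the message of the first failure."""
    arena = new_arena()
    try:
        buggy_write(arena, b"x" * 12, checked=checked)  # 4 bytes too many
        innocent_reader(arena)
    except (OverflowError, RuntimeError) as err:
        return str(err)
    return "no failure"


if __name__ == "__main__":
    print("unchecked:", where_it_fails(checked=False))
    print("checked:  ", where_it_fails(checked=True))
```

In the unchecked case the only visible failure is inside `innocent_reader`, even though `buggy_write` is at fault. In a real process the gap between the two can be arbitrarily long, which is why a crash backtrace alone says little about the culprit.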
It's an extremely discouraging problem and I don't see much to do other than to give up.
To reproduce the valgrind crash:
$ flatpak run -d --command=/bin/bash org.gnome.Epiphany.Devel
[📦 org.gnome.Epiphany.Devel ~]$ valgrind epiphany
You'll hit:
valgrind: ../../coregrind/m_debuginfo/image.c:517 (realloc_CEnt): Assertion 'szB >= CACHE_ENTRY_SIZE' failed.
host stacktrace:
==7== at 0x5804665A: show_sched_status_wrk (m_libcassert.c:369)
==7== by 0x58046787: report_and_quit (m_libcassert.c:440)
==7== by 0x5804692B: vgPlain_assert_fail (m_libcassert.c:506)
==7== by 0x580CF4E0: realloc_CEnt (image.c:517)
==7== by 0x580CF4E0: get_slowcase (image.c:773)
==7== by 0x580D04E7: get (image.c:816)
==7== by 0x580D04E7: vgModuleLocal_img_get (image.c:1088)
==7== by 0x580D04E7: vgModuleLocal_img_get_UInt (image.c:1181)
==7== by 0x580D5CDA: UnknownInlinedFun (priv_image.h:326)
==7== by 0x580D5CDA: vgModuleLocal_read_callframe_info_dwarf3 (readdwarf.c:3708)
==7== by 0x5808B366: vgModuleLocal_read_elf_debug_info (readelf.c:3202)
==7== by 0x58078FCE: di_notify_ACHIEVE_ACCEPT_STATE (debuginfo.c:971)
==7== by 0x58078FCE: vgPlain_di_notify_mmap (debuginfo.c:1321)
==7== by 0x580AC08C: vgModuleLocal_generic_PRE_sys_mmap (syswrap-generic.c:2402)
==7== by 0x580B7E63: vgSysWrap_amd64_linux_sys_mmap_before (syswrap-amd64-linux.c:404)
==7== by 0x580A73E5: vgPlain_client_syscall (syswrap-main.c:1863)
==7== by 0x580A382A: handle_syscall (scheduler.c:1176)
==7== by 0x580A5029: vgPlain_scheduler (scheduler.c:1498)
==7== by 0x580F4754: thread_wrapper (syswrap-linux.c:103)
==7== by 0x580F4754: run_a_thread_NORETURN (syswrap-linux.c:156)
To reproduce the asan web process crash, you could revert 01ad93ac and 71109615.
Edited by Michael Catanzaro