
gtestdbus: Fix watcher crash on FreeBSD

Ting-Wei Lan requested to merge wip/lantw/fix-gtestdbus-crash-on-freebsd into master

In gio/gtestdbus.c, the watch_parent function runs a loop that waits for commands sent from the parent process and kills every process recorded in the 'pids_to_kill' array when the parent process exits. It detects the parent's exit by calling g_poll and checking whether the returned event equals G_IO_HUP. However, 'revents' is a bit mask, so the check should be a bitwise AND rather than an equality comparison.
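The difference between the two checks can be sketched with plain POSIX poll flags. This is only an illustration, not the code in gtestdbus.c: it assumes POLLIN and POLLHUP stand in for G_IO_IN and G_IO_HUP, and the function names hup_by_equality and hup_by_mask are hypothetical.

```c
#include <poll.h>
#include <stdbool.h>

/* Equality check: true only when POLLHUP is the sole bit set in revents. */
bool hup_by_equality(short revents)
{
    return revents == POLLHUP;
}

/* Bitwise-AND check: true whenever POLLHUP is set, whatever else is set. */
bool hup_by_mask(short revents)
{
    return (revents & POLLHUP) != 0;
}
```

With revents set to POLLIN | POLLHUP, the combination described for FreeBSD, hup_by_equality returns false and hup_by_mask returns true. Both agree when POLLHUP is reported alone.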

The equality check seems to work fine on Linux, but it fails on FreeBSD because g_poll returns both G_IO_IN and G_IO_HUP when the pipe is closed. As a result, the watcher process keeps waiting for commands after the parent process exits, and g_io_channel_read_line returns G_IO_STATUS_EOF with 'command' set to NULL. The watcher process then segfaults in sscanf because 'command' is NULL. Since the parent process has already reported the test result as 'OK', this kind of crash is likely to go unnoticed unless someone checks the dmesg messages after the test:

pid 57611 (defaultvalue), uid 1001: exited on signal 11
pid 57935 (actions), uid 1001: exited on signal 11
pid 57945 (gdbus-bz627724), uid 1001: exited on signal 11
pid 57952 (gdbus-connection), uid 1001: exited on signal 11
pid 57970 (gdbus-connection-lo), uid 1001: exited on signal 11
pid 57976 (gdbus-connection-sl), uid 1001: exited on signal 11
pid 58039 (gdbus-exit-on-close), uid 1001: exited on signal 11
pid 58043 (gdbus-exit-on-close), uid 1001: exited on signal 11
pid 58047 (gdbus-exit-on-close), uid 1001: exited on signal 11
pid 58051 (gdbus-exit-on-close), uid 1001: exited on signal 11
pid 58055 (gdbus-export), uid 1001: exited on signal 11
pid 58059 (gdbus-introspection), uid 1001: exited on signal 11
pid 58065 (gdbus-names), uid 1001: exited on signal 11
pid 58071 (gdbus-proxy), uid 1001: exited on signal 11
pid 58079 (gdbus-proxy-threads), uid 1001: exited on signal 11
pid 58083 (gdbus-proxy-well-kn), uid 1001: exited on signal 11
pid 58091 (gdbus-test-codegen), uid 1001: exited on signal 11
pid 58095 (gdbus-threading), uid 1001: exited on signal 11
pid 58104 (gmenumodel), uid 1001: exited on signal 11
pid 58108 (gnotification), uid 1001: exited on signal 11
pid 58112 (gdbus-test-codegen-), uid 1001: exited on signal 11
pid 58116 (gapplication), uid 1001: exited on signal 11
pid 58132 (dbus-appinfo), uid 1001: exited on signal 11
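The crash path itself comes down to passing a NULL string to sscanf. A minimal sketch of a defensive parse is shown below. The "add pid %d" command format and the parse_add_pid helper are assumptions made for illustration, not taken from gtestdbus.c, and this guard is a separate hardening idea rather than the fix proposed in this merge request.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: parse a command line of the assumed form
 * "add pid <number>". Returns false instead of crashing when the
 * line is NULL, which is what a read at end of file can leave behind. */
bool parse_add_pid(const char *command, int *pid)
{
    if (command == NULL)
        return false;   /* nothing was read, so there is nothing to parse */

    /* sscanf returns the number of converted fields; exactly one is expected. */
    return sscanf(command, "add pid %d", pid) == 1;
}
```

A caller would treat a false return after end of file as the signal to stop reading commands and proceed to cleanup.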

If the watcher process crashes before killing the dbus-daemon process spawned by the parent process, the dbus-daemon process keeps running after all tests complete. Because of how the 'communicate' function in the Python subprocess module is implemented, this causes meson to crash. 'communicate' assumes the stdout and stderr pipes are closed when the child process exits, but that is not true if processes forked by the child process do not exit. As a result, 'communicate' blocks in its call to poll until the timeout expires, even if the test finishes in a few seconds. Meson assumes the timeout exception always means the test is still running, so it calls 'communicate' again and crashes because the pipes no longer exist.
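The underlying pipe behavior can be reproduced without Python. The sketch below is a C analogue written for illustration, not meson's or Python's code: a child exits immediately while a grandchild, standing in for the leftover dbus-daemon, keeps the inherited write end open. The function name pipe_outlives_child and its parameters are hypothetical.

```c
#include <poll.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns true when the read end of a pipe still shows no EOF after the
 * direct child has been reaped, because a grandchild holds the write end.
 * Returns false on error or if the pipe already reports EOF/hangup. */
bool pipe_outlives_child(unsigned int grandchild_sleep_seconds, int poll_timeout_ms)
{
    int fds[2];
    if (pipe(fds) != 0)
        return false;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    if (child == 0) {
        close(fds[0]);
        if (fork() == 0) {
            /* Grandchild: inherits the write end and lingers for a while. */
            sleep(grandchild_sleep_seconds);
            _exit(0);
        }
        _exit(0);   /* Child: exits at once, like a test that has finished. */
    }

    close(fds[1]);              /* Parent keeps only the read end. */
    waitpid(child, NULL, 0);    /* The direct child is gone at this point. */

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int ready = poll(&pfd, 1, poll_timeout_ms);
    close(fds[0]);

    /* A timeout (0) means no EOF yet: the pipe outlived the child. */
    return ready == 0;
}
```

Calling pipe_outlives_child(1, 100) should return true on a typical Unix system: the child has already been reaped, yet poll times out because the grandchild still holds the write end. Any reader that waits for EOF on the pipe rather than for the child's exit status blocks in the same way.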

References:

Unfortunately, we are still not ready to enable the FreeBSD CI runner. GLib depends on meson 0.47, but meson 0.47 introduces a library path handling bug, as I described in https://github.com/mesonbuild/meson/pull/3463#issuecomment-399727814. I haven't spent enough time debugging this issue, and tests on the runner fail because of the wrong RPATH.
