main: Don't treat si_pid from pidfd as child exiting (!3433) · Merge requests · GNOME / GLib

Jonas Ådahl requested to merge jadahl/glib:wip/pidfd-exit-status into main May 15, 2023

We might repeatedly get si_pid == 0 for a child that hasn't exited, which means we won't get a correct exit status. This seems to happen when the GLib application tracks a child process that is being ptrace()d. The correct exit status of the process can be observed using e.g. a BPF program, which shows that GLib appears to get it wrong.
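For background, the Linux waitid(2) man page says that with WNOHANG and no child in a waitable state, waitid() returns 0 and the contents of the siginfo_t depend on the implementation. The portable way to tell the cases apart is to zero si_pid before the call and check it afterwards. The following is a minimal POSIX sketch of that check. The helper name `child_has_exited()` is hypothetical, and the code is not GLib's implementation.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper, not GLib API.
 * Returns 1 if `pid` has exited (filling *code and *status from siginfo_t),
 * 0 if it has not changed state yet, and -1 if waitid() failed. */
int
child_has_exited (pid_t pid, int *code, int *status)
{
  siginfo_t info;

  /* Clear the struct first, so that a zero si_pid after a WNOHANG call
   * reliably means "nothing to report" on every implementation. */
  memset (&info, 0, sizeof info);

  if (waitid (P_PID, (id_t) pid, &info, WEXITED | WNOHANG) != 0)
    return -1;

  /* si_pid == 0 means "no state change", not "exited with status 0". */
  if (info.si_pid == 0)
    return 0;

  *code = info.si_code;     /* e.g. CLD_EXITED or CLD_KILLED */
  *status = info.si_status; /* exit code, or the terminating signal */
  return 1;
}
```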

I've tried to write a test case for this, but the reproducer I have is somewhat hard to turn into one. This is what seems to happen:

1. Run mutter inside catch (a small utility that ptrace()s all child processes and their subprocesses, and generates backtraces for every SIGABRT and SIGSEGV).
2. Make something spawn Xwayland.
3. Kill that Xwayland with kill -SIGKILL.
4. Observe that mutter gets the wrong result from g_subprocess_get_success() (it returns TRUE).
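For reference, GLib documents g_subprocess_get_success() as returning TRUE only when the process exited cleanly with status 0. A SIGKILLed Xwayland should therefore always give FALSE. Below is the equivalent test on a raw waitpid() status. It is a plain POSIX sketch with a hypothetical helper name, not GLib's implementation:

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/wait.h>

/* Hypothetical helper mirroring the documented meaning of
 * g_subprocess_get_success(): true only for a normal exit with status 0. */
bool
wait_status_is_success (int wait_status)
{
  /* A child killed by a signal (e.g. SIGKILL) has WIFSIGNALED() set and
   * WIFEXITED() clear, so it never counts as a success. */
  return WIFEXITED (wait_status) && WEXITSTATUS (wait_status) == 0;
}
```

For a child killed the way the reproducer does it, WIFSIGNALED() is true and WTERMSIG() is SIGKILL.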

For example:

dbus-run-session -- catch mutter --nested weston-terminal

In weston-terminal, run xterm.

In another terminal, use ps ax | grep Xwayland to find the correct Xwayland process ID, and run kill -SIGKILL <PID>.

In the terminal where mutter was run, one should see "X Wayland crashed; attempting to recover", but here it doesn't show up 2 out of 3 times, meaning g_subprocess_get_success() returned TRUE.

Another way to observe it is to run the exitsnoop BPF program (https://github.com/iovisor/bcc/blob/master/tools/exitsnoop.py) and see that it reports an error exit status for Xwayland.

I tried to write a test case that:

1. Spawns a GSubprocess
2. Calls fork()
3. Runs sleep(2); kill(subprocess_pid, SIGKILL); exit(0); in the forked child
4. Uses ptrace()/waitpid() etc. to imitate catch

The problem with this is that waitpid() "consumes" the exit status, meaning waitid() in g_child_watch_check() fails instead of succeeding while setting info.si_pid to 0.
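The "consuming" effect can be seen without ptrace(). Once a zombie has been reaped by waitpid(), a later waitid() for the same PID has nothing left to report and fails (on Linux with ECHILD) rather than returning 0. The minimal sketch below makes a simplifying assumption: a single process does both waits, whereas in the real test the reaper is a different process. `waitid_after_reap()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: reap `pid` with waitpid(), then ask waitid() about it
 * again. Returns -1 if the first waitpid() failed, 0 if the second call
 * unexpectedly succeeded, or the errno value from the failed waitid(). */
int
waitid_after_reap (pid_t pid)
{
  int wait_status;
  siginfo_t info;

  /* The first waiter collects the exit status and removes the zombie. */
  if (waitpid (pid, &wait_status, 0) != pid)
    return -1;

  memset (&info, 0, sizeof info);
  if (waitid (P_PID, (id_t) pid, &info, WEXITED | WNOHANG) == 0)
    return 0;

  return errno;  /* expected: ECHILD, the PID is no longer our child */
}
```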

I also can't find any documentation about what si_pid being 0 should mean, nor about whether POLLIN with an "empty" info really means the process exited or not.
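On the pidfd path, the same rule can be applied. After poll() reports POLLIN, call waitid() with P_PIDFD and WNOHANG on a zeroed siginfo_t, and only treat the child as exited if si_pid is nonzero. This is a Linux-only sketch that assumes kernel 5.4 or newer (pidfd_open() and P_PIDFD). It calls pidfd_open() through syscall() in case the C library has no wrapper. The fallback constants and the helper name `pidfd_check_exit()` are assumptions, and this is not GLib's actual code.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434   /* assumption: number used on x86-64, arm64 and most others */
#endif
#ifndef P_PIDFD
#define P_PIDFD 3            /* value from <linux/wait.h> */
#endif

/* Hypothetical helper: wait up to `timeout_ms` (-1 = forever) for the pidfd
 * to become readable, then check whether the child really exited.
 * Returns 1 if it exited (filling *info), 0 if not, -1 on error. */
int
pidfd_check_exit (int pidfd, int timeout_ms, siginfo_t *info)
{
  struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
  int n = poll (&pfd, 1, timeout_ms);

  if (n < 0)
    return -1;
  if (n == 0 || !(pfd.revents & POLLIN))
    return 0;   /* not readable yet */

  memset (info, 0, sizeof *info);
  if (waitid (P_PIDFD, (id_t) pidfd, info, WEXITED | WNOHANG) != 0)
    return -1;

  /* POLLIN alone is not taken as proof of exit: an "empty" siginfo_t
   * (si_pid == 0) means there was no exit status to collect. */
  return info->si_pid != 0 ? 1 : 0;
}
```

This follows the rule in the MR title: a zero si_pid from the pidfd path is not treated as the child exiting.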

Edited May 15, 2023 by Jonas Ådahl
