main: Don't treat si_pid from pidfd as child exiting

Merged Jonas Ådahl requested to merge jadahl/glib:wip/pidfd-exit-status into main

We might repeatedly get si_pid == 0 for a child that hasn't exited, meaning we won't get a correct exit status. This seems to happen when the glib application tracks a ptrace():ed child process; the correct exit status of the process can be observed using e.g. a BPF program, which shows that glib appears to get it wrong.

I've tried to write a test case for this, but the reproducer I have is somewhat hard to translate to a test case. This is what seems to happen:

  1. Run mutter inside catch (a small utility that ptrace()es all child processes (and subprocesses) and generates backtraces for every SIGABRT and SIGSEGV).
  2. Make something spawn Xwayland
  3. kill -SIGKILL that Xwayland
  4. Observe that mutter gets the wrong result from g_subprocess_get_success() (it returns TRUE).

For example:

dbus-run-session -- catch mutter --nested weston-terminal

In weston-terminal run xterm

In another terminal, use ps ax | grep Xwayland to find the correct Xwayland process ID, and run kill -SIGKILL <PID>.

In the terminal where mutter was run, one should see "X Wayland crashed; attempting to recover", but here, 2 out of 3 times, the message doesn't appear, meaning g_subprocess_get_success() returned TRUE.
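To illustrate why a lost exit status can look like success, here is a minimal sketch in plain POSIX C. It assumes the status ends up as 0 when no valid siginfo was obtained; that is an assumption about the failure mode, not something confirmed above. Both helper names are hypothetical and are not glib API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: nonzero only for a normal exit with exit code 0,
 * i.e. the usual meaning of "success" for a wait status. */
static int
wait_status_is_success (int status)
{
  return WIFEXITED (status) && WEXITSTATUS (status) == 0;
}

/* Hypothetical helper: fork a child, SIGKILL it, and return the wait
 * status that waitpid() reports for it (-1 on setup failure). */
static int
wait_status_of_killed_child (void)
{
  int status = -1;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      pause ();   /* Child: block until a signal arrives. */
      _exit (0);
    }

  kill (pid, SIGKILL);
  if (waitpid (pid, &status, 0) != pid)
    return -1;

  return status;
}
```

A status of 0 passes wait_status_is_success(), while the status of a SIGKILLed child does not, so a status that was never filled in reads as a clean exit.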

Another way to observe it is to run the exitsnoop BPF program, and see that Xwayland will have an error exit status.

I tried to write a test case that:

  1. Spawns a GSubprocess
  2. Calls fork()
  3. Runs sleep(2); kill(subprocess_pid, SIGKILL); exit(0); in the fork
  4. Uses ptrace()/waitpid() etc. to imitate catch

The problem with this is that waitpid() "consumes" the exit status, meaning waitid() in g_child_watch_check() fails, instead of succeeding while setting info.si_pid to 0.
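The "consuming" effect can be shown in isolation with a small sketch. It is a simplification: no ptrace() or GSubprocess is involved, the same process calls both waitpid() and waitid(), and the helper name is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno of a waitid() call issued after
 * waitpid() has already reaped the child. Returns 0 if waitid()
 * unexpectedly succeeded and -1 on setup failure. */
static int
waitid_errno_after_waitpid (void)
{
  siginfo_t info;
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      pause ();   /* Child: block until a signal arrives. */
      _exit (0);
    }

  kill (pid, SIGKILL);

  /* The side imitating catch reaps the child first. */
  if (waitpid (pid, &status, 0) != pid)
    return -1;

  /* A later waitid() has nothing left to collect. */
  memset (&info, 0, sizeof info);
  if (waitid (P_PID, (id_t) pid, &info, WEXITED | WNOHANG) == 0)
    return 0;

  return errno;
}
```

On Linux the second call fails with ECHILD, which is a different outcome from the "succeeds with si_pid == 0" case the reproducer triggers.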

Also, I can't really find any documentation about what si_pid being 0 should mean, nor whether POLLIN with an "empty" info really means the process exited or not.
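For reference, the Linux waitid(2) man page describes one case that yields an "empty" info: with WNOHANG and no child in a waitable state, waitid() returns 0, and it suggests zeroing si_pid before the call and checking for a nonzero value afterwards. Whether that covers the ptrace() case above is unclear. Below is a minimal sketch of that check, assuming P_PID in place of the pidfd that glib uses; the helper names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that blocks until it is signalled.
 * Returns the child's PID in the parent, or -1 on failure. */
static pid_t
spawn_paused_child (void)
{
  pid_t pid = fork ();

  if (pid == 0)
    {
      pause ();
      _exit (0);
    }

  return pid;
}

/* Hypothetical helper: returns 1 if the child has exited (info is then
 * valid), 0 if it is still running, and -1 if waitid() failed. */
static int
check_child_exited (pid_t pid, siginfo_t *info)
{
  /* Zero si_pid first so an untouched siginfo can be told apart from a
   * filled-in one. */
  memset (info, 0, sizeof *info);

  if (waitid (P_PID, (id_t) pid, info, WEXITED | WNOHANG) < 0)
    return -1;

  /* si_pid == 0 means no exit was reported; do not treat it as an exit. */
  return info->si_pid != 0;
}
```

Only a nonzero si_pid is treated as an exit here, so info.si_code and info.si_status are read only once they are known to be valid.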

Edited by Jonas Ådahl
