We might repeatedly get `si_pid == 0` for a child that hasn't exited, meaning we won't get a correct exit status. This seems to happen when the glib application watches a child process that is being traced with ptrace(). The correct exit status of the process can be observed using e.g. a BPF program, which shows that glib appears to get it wrong.
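For reference, here is a minimal sketch of the non-blocking `waitid()` check that produces `si_pid == 0`, assuming Linux/glibc. It waits by PID (`P_PID`), which is not necessarily what glib does internally, and `child_waitable()` / `demo_running_child()` are hypothetical names. As waitid(2) recommends for `WNOHANG`, `si_pid` is zeroed before the call, so a value that is still 0 afterwards means "no state change was reported", not "the child has exited".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking check whether `pid` has exited.
 * Returns 1 and fills *info if the child was reaped, 0 if it is not
 * (yet) waitable (si_pid stays 0), or -1 on error (errno set). */
static int child_waitable(pid_t pid, siginfo_t *info)
{
    assert(info != NULL);
    /* waitid(2): with WNOHANG and nothing waitable, the siginfo_t contents
     * are unspecified, so zero si_pid up front and test it afterwards. */
    memset(info, 0, sizeof *info);
    if (waitid(P_PID, (id_t) pid, info, WEXITED | WNOHANG) == -1)
        return -1;
    return info->si_pid != 0;
}

/* Hypothetical demo: a child that is still running reports si_pid == 0.
 * Returns the result of the non-blocking check (expected 0), or -1 if
 * fork() failed. */
static int demo_running_child(void)
{
    siginfo_t info;
    int first;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        pause();                       /* child: block until killed */
        _exit(0);
    }

    first = child_waitable(pid, &info);   /* child alive: expect 0 */

    /* Clean up: kill the child and reap it with a blocking waitpid(). */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return first;
}
```

With an untraced child, `si_pid` becomes non-zero once the child has exited and been reported to its parent; the behaviour described here suggests that for a ptrace()d child this can lag behind.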
I've tried to write a test case for this, but the reproducer I have is somewhat hard to translate to a test case. This is what seems to happen:
- Run `mutter` inside `catch` (a small utility that ptrace()s all child processes (and subprocesses) and generates backtraces for every
- Make something spawn Xwayland
- `kill -SIGKILL` that Xwayland
- Observe that mutter gets the wrong result from
Concretely:

- Run `dbus-run-session -- catch mutter --nested weston-terminal`
- In weston-terminal, run
- In another terminal, use `ps ax | grep Xwayland` to find the correct Xwayland process ID, and run `kill -SIGKILL <PID>`.
- In the terminal where mutter was run, one should see `X Wayland crashed; attempting to recover`, but here, 2 out of 3 times, it isn't shown, meaning
Another way to observe it is to run the `exitsnoop` BPF program (https://github.com/iovisor/bcc/blob/master/tools/exitsnoop.py) and see that Xwayland has an error exit status.
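What the parent should see after `kill -SIGKILL` can be checked with the standard `waitpid()` status macros. The following is a minimal sketch assuming POSIX `waitpid()`. It is not glib's or mutter's code, and `classify_status()` / `demo_sigkill()` are hypothetical names. A SIGKILLed child must decode as terminated by `SIGKILL`, never as a clean exit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical classification of a waitpid() status. */
enum child_end { CHILD_EXITED_OK, CHILD_EXITED_ERROR, CHILD_KILLED, CHILD_OTHER };

/* Decode `status`; *detail receives the exit code or the terminating signal. */
static enum child_end classify_status(int status, int *detail)
{
    assert(detail != NULL);
    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return *detail == 0 ? CHILD_EXITED_OK : CHILD_EXITED_ERROR;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return CHILD_KILLED;
    }
    *detail = 0;
    return CHILD_OTHER;   /* stopped/continued: only with WUNTRACED/WCONTINUED */
}

/* Hypothetical demo: SIGKILL a child, reap it, and classify the result. */
static enum child_end demo_sigkill(int *detail)
{
    int status;
    pid_t pid = fork();

    if (pid == -1) {
        *detail = 0;
        return CHILD_OTHER;
    }
    if (pid == 0) {
        pause();          /* child: wait to be killed */
        _exit(0);
    }

    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) == -1) {
        *detail = 0;
        return CHILD_OTHER;
    }
    return classify_status(status, detail);
}
```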
I tried to write a test case that:
- Spawns a
- Runs the
- `sleep(2); kill(subprocess_pid, SIGKILL); exit(0);` in the fork
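The test setup described above can be sketched without glib, assuming plain `fork()`/`waitpid()`. `kill()` takes the PID first and the signal second. A forked child should leave with `_exit()` so it doesn't flush stdio buffers inherited from the parent. `run_kill_scenario()` is a hypothetical name, and `pause()` stands in for the real subprocess.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch of the described test setup, without glib:
 * spawn a long-running subprocess, then fork a helper that sleeps,
 * SIGKILLs the subprocess and exits. Returns the subprocess's wait
 * status, or -1 on error. */
static int run_kill_scenario(unsigned int delay_seconds)
{
    int status, helper_status;
    pid_t helper;
    pid_t subprocess = fork();

    if (subprocess == -1)
        return -1;
    if (subprocess == 0) {
        pause();                       /* stands in for the real subprocess */
        _exit(0);
    }

    helper = fork();
    if (helper == -1) {
        kill(subprocess, SIGKILL);
        waitpid(subprocess, NULL, 0);
        return -1;
    }
    if (helper == 0) {
        sleep(delay_seconds);
        kill(subprocess, SIGKILL);     /* kill(pid, sig): PID first */
        _exit(0);                      /* _exit(): skip inherited stdio buffers */
    }

    /* The parent, not the helper, reaps both children. */
    if (waitpid(subprocess, &status, 0) == -1)
        status = -1;
    if (waitpid(helper, &helper_status, 0) != -1)
        assert(WIFEXITED(helper_status) && WEXITSTATUS(helper_status) == 0);
    return status;
}
```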
The problem with this is that `waitpid()` "consumes" the exit status, meaning `g_child_watch_check()` fails instead of succeeding while setting
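The "consuming" behaviour is plain POSIX semantics rather than anything glib-specific. A child's status can be collected only once, so after one `waitpid()` reaps it, any further wait on that PID fails with `ECHILD`. Here is a minimal sketch (`second_wait_errno()` is a hypothetical name):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: reap a child with waitpid(), then try to wait for it
 * again. Returns the errno of the second wait (expected ECHILD), 0 if the
 * second wait unexpectedly succeeded, or -1 on fork failure. */
static int second_wait_errno(void)
{
    siginfo_t info;
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(7);

    /* The first wait collects (and thereby consumes) the exit status. */
    if (waitpid(pid, &status, 0) == pid)
        assert(WIFEXITED(status) && WEXITSTATUS(status) == 7);

    /* A second wait on the same PID has nothing left to report. */
    memset(&info, 0, sizeof info);
    if (waitid(P_PID, (id_t) pid, &info, WEXITED | WNOHANG) == 0)
        return 0;
    return errno;
}
```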
I also can't really find any documentation about what a `si_pid` of `0` should mean, nor whether `POLLIN` with an "empty" `info` really means the process exited or not.
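The Linux waitid(2) man page does cover part of this. With `WNOHANG` and no child in a waitable state, `waitid()` returns 0 and the `siginfo_t` contents are unspecified, so callers should zero `si_pid` first and treat a non-zero value as the sign that a child was reaped. The sketch below probes the `POLLIN` half, assuming the watch is backed by a Linux pidfd: `pidfd_open()` needs Linux 5.3 and `waitid(P_PIDFD, ...)` needs 5.4, and the `P_PIDFD` fallback value is taken from `<linux/wait.h>`. `pidfd_pollin_then_waitid()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef P_PIDFD
#define P_PIDFD 3 /* value from <linux/wait.h>; older libcs may not define it */
#endif

/* Hypothetical probe: SIGKILL a child, wait for POLLIN on its pidfd, then
 * call waitid(P_PIDFD, ..., WEXITED | WNOHANG) with si_pid zeroed first.
 * Returns 1 if si_pid was non-zero, 0 if it was still 0 ("empty" info),
 * 2 if pidfds are unavailable on this system, -1 on other errors. */
static int pidfd_pollin_then_waitid(void)
{
#ifdef SYS_pidfd_open
    siginfo_t info;
    struct pollfd pfd;
    int pidfd, err, ret;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        pause();                       /* child: wait to be killed */
        _exit(0);
    }

    pidfd = (int) syscall(SYS_pidfd_open, pid, 0);
    if (pidfd == -1) {
        err = errno;                   /* save before kill()/waitpid() */
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return (err == ENOSYS || err == EPERM) ? 2 : -1;
    }

    kill(pid, SIGKILL);

    pfd.fd = pidfd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 5000) != 1 || !(pfd.revents & POLLIN)) {
        waitpid(pid, NULL, 0);
        close(pidfd);
        return -1;
    }

    memset(&info, 0, sizeof info);
    if (waitid(P_PIDFD, (id_t) pidfd, &info, WEXITED | WNOHANG) == -1) {
        err = errno;
        waitpid(pid, NULL, 0);         /* reap by PID instead */
        close(pidfd);
        return err == EINVAL ? 2 : -1; /* EINVAL: kernel lacks P_PIDFD */
    }

    /* If anything was reported, it must be our child. */
    assert(info.si_pid == 0 || info.si_pid == pid);
    ret = info.si_pid != 0;
    if (!ret)
        waitpid(pid, NULL, 0);         /* not reported yet: blocking reap */
    close(pidfd);
    return ret;
#else
    return 2;                          /* headers don't know pidfd_open */
#endif
}
```

On an untraced child this is expected to return 1, or 2 where pidfds are unavailable. Running the same probe on a child that another process is ptrace()ing would be one way to check whether `POLLIN` can arrive before the parent is able to reap the child.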