gnome-shell moves applications between cgroups while they are running
While testing the Ubuntu 23.04 development releases we've noticed intermittent failures to launch snap applications from gnome-shell, as documented in LP bug #2011806.
The snap run launcher that sets up the sandbox for the application moves itself to a new cgroup via StartTransientUnit in order to restrict device access for the sandboxed app. In parallel to this, gnome-shell issues its own StartTransientUnit call (implemented in !863) at some unpredictable point in time. If it happens at the wrong point, it causes some sanity check assertions to fail and the app launch is aborted. As far as I can tell, this race has existed for a while: I'm not yet sure why we started triggering it so often. Maybe we just got unlucky.
I looked into whether there was anything we could do to solve this from the snap side, but there isn't really any way to synchronise with what gnome-shell is doing. I think a better approach would be for gnome-shell to make sure its cgroup transition completes before the application starts running. That would be after the fork but before the exec system call. And like systemd-run --scope, it should wait for the cgroup transition to complete by watching for a JobRemoved signal matching the StartTransientUnit call.
I can see two ways this could be accomplished:
1. have the GSpawnChildSetupFunc perform the cgroup transition on itself in a blocking fashion. This is complicated by the fact that none of the open D-Bus connections are in a usable state in this environment.
2. have the GSpawnChildSetupFunc block until the parent completes the cgroup transition. This requires some way for the parent and child to synchronise.
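To make the second option concrete, here is a minimal sketch of one conventional way for a parent and child to synchronise between fork and exec: a pipe that the child blocks on. It uses plain fork() rather than GLib's spawn API, so how it would map onto a GSpawnChildSetupFunc is left open. The names spawn_after_cgroup_move, wait_for_parent and move_child_to_new_cgroup are hypothetical, and the last of these is an empty placeholder for the StartTransientUnit call plus the wait for JobRemoved.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical placeholder for the parent's work: the StartTransientUnit
 * call and the wait for the matching JobRemoved signal would go here. */
static void move_child_to_new_cgroup(pid_t child) { (void)child; }

/* Child side: block until the parent writes one byte. Only
 * async-signal-safe calls are used, as is required after fork(). */
static int wait_for_parent(int read_fd)
{
    char byte;
    ssize_t n;
    do {
        n = read(read_fd, &byte, 1);
    } while (n < 0 && errno == EINTR);
    close(read_fd);
    return n == 1 ? 0 : -1;   /* EOF means the parent gave up */
}

/* Returns the child's exit status, or -1 on error. */
int spawn_after_cgroup_move(const char *path, char *const argv[])
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[1]);                 /* child only reads */
        if (wait_for_parent(fds[0]) < 0)
            _exit(126);
        execv(path, argv);             /* runs only after the parent is done */
        _exit(127);
    }

    close(fds[0]);                     /* parent only writes */
    move_child_to_new_cgroup(pid);
    ssize_t written = write(fds[1], "x", 1);   /* release the child */
    close(fds[1]);

    int status;
    if (waitpid(pid, &status, 0) < 0 || written != 1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the byte stays buffered in the pipe until the child reads it, the ordering of the parent's write and the child's read does not matter, and a parent that exits early unblocks the child with an EOF instead of leaving it stuck.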
One idea I had for implementing (2) was for the child setup func to do a kill(getpid(), SIGSTOP), and have the parent send a SIGCONT when it is done. That still runs the risk of deadlock if the parent finishes its work before the child executes its SIGSTOP: the SIGCONT would arrive first and have no effect, and the child would then stop with nothing left to resume it.
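As an illustration only, the early-SIGCONT hazard can be avoided in a plain fork() setting if the parent first confirms that the child has actually stopped, using waitpid() with WUNTRACED, and only then sends SIGCONT. The sketch below assumes the parent is free to call waitpid() on the child before exec, which may not hold when the process is created through GLib's spawn functions. The function name spawn_stopped_then_continue is hypothetical, and the cgroup transition itself is represented by a comment.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the child's exit status, or -1 on error. */
int spawn_stopped_then_continue(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: stop ourselves before exec. kill() and getpid() are
         * async-signal-safe, so this is allowed after fork(). */
        kill(getpid(), SIGSTOP);
        execv(path, argv);
        _exit(127);
    }

    int status;
    /* Wait until the child is really stopped, so that the SIGCONT sent
     * below cannot be delivered before the SIGSTOP. */
    if (waitpid(pid, &status, WUNTRACED) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Hypothetical placeholder: the parent's cgroup transition
     * (StartTransientUnit, then wait for JobRemoved) would run here. */

    if (kill(pid, SIGCONT) < 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)   /* now wait for termination */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The cost of this ordering is that the parent blocks until the child reaches its SIGSTOP, so it trades the deadlock for a short synchronous wait.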