
tests: Don't assume that sh optimizes simple commands into exec

Simon McVittie requested to merge wip/smcv/issue3157 into main

Depending on the operating system, /bin/sh might be bash (for example on Fedora or Arch), dash (for example on Debian or Ubuntu), or some other POSIX shell.

When bash is asked to run a simple command with no shell keywords or metacharacters, like the sleep(1) invocation in this test, it replaces itself with the program via execve(). dash does not have that optimization and treats the command like any other program invocation in a larger script: it forks, execs the program in the child, and waits for the child in the parent.
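For illustration only, here is a rough Python sketch of the fork/exec/wait sequence that a shell without the optimization performs. The function name is hypothetical, and the calling Python process merely stands in for the sh(1) process; this is not how dash is implemented internally.

```python
import os


def run_without_exec_optimization(argv):
    """Fork, exec argv in the child, and wait for it in the parent."""
    shell_pid = os.getpid()  # stands in for the sh(1) process
    child_pid = os.fork()
    if child_pid == 0:
        try:
            # The child replaces itself with the program.
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)
    # The parent (the "shell") stays alive until the child exits.
    _, status = os.waitpid(child_pid, 0)
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
    else:
        code = -os.WTERMSIG(status)
    return shell_pid, child_pid, code
```

The point is that two distinct process IDs exist while the command runs: the one the caller knows about (the shell) and the one actually running the program.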

This appears to conflict with sleep_and_kill(), which assumes that it can use the subprocess's process ID as the sleep(1) process ID. Specifically, if it sends SIGKILL, the signal goes to the sh(1) process and not to the sleep(1) child, which could result in the sh(1) process being terminated and its sleep(1) child being leaked.

To get the bash-like behaviour portably, explicitly use the exec builtin to instruct the shell to replace itself with sleep(1), so that the process ID previously used for the shell becomes the process ID of the sleep process.
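A minimal sketch of why the explicit exec helps, written in Python rather than the C used by the test. The function name is hypothetical, and a second /bin/sh that prints its own PID stands in for sleep(1) so that the result can be observed; the assumption is that /bin/sh is a POSIX shell.

```python
import subprocess


def pids_seen_with_exec():
    # The outer shell prints its PID, then uses exec to replace itself with
    # an inner program (here another sh) that prints its own PID.
    script = "echo $$; exec /bin/sh -c 'echo $$'"
    proc = subprocess.Popen(
        ["/bin/sh", "-c", script],
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate()
    before, after = (int(line) for line in out.split())
    # proc.pid is the PID the caller would send signals to.
    return proc.pid, before, after
```

With exec, all three values should be the same, so a signal sent to the subprocess's PID reaches the program itself and not a shell waiting on it.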

This appears to resolve an intermittent hang and test timeout on Debian machines (especially slower ones), although I'm not 100% clear on the mechanics of how it happens.

Resolves: #3157 (closed)


/cc @jadahl @pwithnall

If this is successful, I'd appreciate it if it could be included in 2.78.2.
