Cancelling g_file_replace_contents_async() may leave target file empty, or leave temporary file litter
I have an application which, in response to state change, writes out a file using g_file_replace_contents_bytes_async(), without waiting for the result. On shutdown, it cancels a GCancellable which was passed to all previous calls, makes one final call to g_file_replace_contents_bytes_async(), and exits when that call completes.
In testing, I have found that although the final call completes successfully, the target file is occasionally empty, even though it existed and was non-empty before the application started, and none of the calls to overwrite it should make it zero-length. In even rarer cases, the target file has the expected contents, but the parent directory contains a .goutputstream-XXXXXX temporary file.
I reproduced these in a small program which does the following:
- Creates a temporary directory
- Creates a non-empty file within that directory, to rule out special cases like those described in #761 (for this API) and #1302 (closed) (for g_file_set_contents()) that skip an fsync() when the target file doesn't exist or is empty
- Calls g_file_replace_contents_bytes_async() 100 times, passing the same cancellable each time, with data "0" through "99" (all non-empty strings of length 1 or 2)
- Cancels the cancellable
- Calls g_file_replace_contents_async() one last time with distinctive content
- Waits for all 101 calls to complete, asserting that the last one succeeds and that the other 100 either succeed or fail with CANCELLED
It then performs some checks:
- the file is non-empty
- the file contains the final, distinctive content
- the file can be unlinked
- the target directory can be removed (implicitly testing that it is empty)
- if the target directory can't be removed, checks that this isn't because the target file has reappeared
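For illustration, the five checks above could be sketched in plain Python roughly as follows. This is not the attached script: the function name check_result, its parameters, and the exact warning strings are assumptions made for this sketch, and it uses only the os module rather than GIO.

```python
import os


def check_result(directory, target_name, expected):
    """Run the five checks in order and return a list of warning strings.

    Hypothetical helper: an empty list means every check passed.
    """
    warnings = []
    target = os.path.join(directory, target_name)

    with open(target, "rb") as f:
        contents = f.read()

    if not contents:
        # Check 1: the file is non-empty.
        warnings.append("empty file")
    elif contents != expected:
        # Check 2: the file contains the final, distinctive content.
        warnings.append("unexpected contents: %r" % (contents,))

    # Check 3: the file can be unlinked (raises OSError if it cannot).
    os.unlink(target)

    # Check 4: the directory can be removed, which only works if it is empty.
    try:
        os.rmdir(directory)
    except OSError as e:
        warnings.append("%s %s" % (directory, e.strerror))
        # Check 5: make sure the failure is not caused by the target
        # file having reappeared after it was unlinked.
        if os.path.exists(target):
            with open(target, "rb") as f:
                data = f.read()
            warnings.append(
                "%s reappeared. contents %d bytes: %r" % (target, len(data), data)
            )

    return warnings
```

Returning the warnings instead of aborting on the first failure lets a single iteration report both an empty file and leftover temporary files.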
In practice, the first check fails relatively regularly, and the fourth fails occasionally. I've attached a wrapper script which runs it in a loop until both failures are seen.
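A wrapper of that kind might look roughly like the sketch below. It is not the attached script: run_once is a hypothetical callable that performs one iteration of the test and returns its warning strings, and the string matching is an assumption based on the messages shown in the output.

```python
def run_until_both_seen(run_once):
    """Call run_once() until both failure kinds have been observed.

    run_once is a hypothetical callable: it runs one iteration of the
    test and returns a list of warning strings (empty if all checks pass).
    Returns the number of iterations that were needed.
    """
    seen_empty = False
    seen_litter = False
    iterations = 0

    while not (seen_empty and seen_litter):
        iterations += 1
        for warning in run_once():
            print(warning)
            if warning == "empty file":
                seen_empty = True
            elif warning.endswith("Directory not empty"):
                seen_litter = True

    print("took %d iterations" % iterations)
    return iterations
```

Because the loop only stops once both kinds of failure have shown up, the iteration count is governed by the rarer of the two.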
The temporary-file litter (4th check) is just a bit annoying. The empty-file case seems very bad: at no point did we write an empty file.
I was about to claim that, if I wait for the 100 cancelled writes to finish before starting the final write, all checks pass. But that's not true: it just takes more iterations to catch it failing. The 5th check also very occasionally fails: the target file gets recreated with empty contents even though I waited for all operations to finish!
I've attached the version that waits before starting the final write. Here's the output, lightly edited:
$ time ./replace_contents.py
** (process:7266): WARNING **: 12:00:47.870: /tmp/replace_contents.XM5GQZ Directory not empty
** (process:11340): WARNING **: 12:00:52.706: /tmp/replace_contents.KJ4PQZ Directory not empty
** (process:13160): WARNING **: 12:00:54.960: empty file
** (process:17641): WARNING **: 12:01:42.002: /tmp/replace_contents.YFYJQZ Directory not empty
[16 "Directory not empty"s removed]
** (process:32491): WARNING **: 12:08:14.490: /tmp/replace_contents.BMD8PZ Directory not empty
** (process:7328): WARNING **: 12:10:22.315: empty file
** (process:10951): WARNING **: 12:10:26.608: /tmp/replace_contents.3UOMQZ Directory not empty
** (process:10951): WARNING **: 12:10:26.609: /tmp/replace_contents.3UOMQZ/target reappeared. contents 0 bytes: ''
took 34572 iterations
real 9m47.555s
user 3m10.619s
sys 3m10.818s
So, it misbehaves on 24 out of 34,572 iterations…