gthread USE_NATIVE_MUTEX is completely wrong (racy) and should be removed
Today a user reported a problem with g_cond_wait_until on musl 1.2.x which turned out to be due to raw use of the futex syscall with mismatched time types. But looking into how to correct this, I ended up reading the USE_NATIVE_MUTEX code paths in gthread-posix.c and they're just completely wrong. Here is the cond var wait implementation:
The wait is not tied to the mutex in any way; the function just unlocks the mutex and then performs a futex wait. This defeats the whole purpose of having the condvar bound to a mutex, which is that the mutex unlock is atomic with (or, implementation-wise, happens-after) joining the condvar wait queue, so that signals cannot be missed. As a result, the obvious race exists. If, between lines 1574 and 1575, another thread acquires the mutex, makes changes to the mutex-protected state, and signals, the signal will be missed and the waiting thread will hang until there is another signal.
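To make the interleaving concrete, here is a toy model in C. It is not GLib's code: the names toy_cond, toy_signal and toy_signal_is_lost are hypothetical, and it assumes, as described above, a signal that only wakes threads already asleep and leaves no trace otherwise. It replays the steps in a single thread, so the outcome is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not GLib code): a "condvar" whose signal
 * only reaches threads that are already on the wait queue. */
struct toy_cond {
    int sleepers;  /* threads currently on the wait queue */
    int wakeups;   /* wakeups actually delivered */
};

void toy_signal(struct toy_cond *c)
{
    assert(c->sleepers >= 0);
    if (c->sleepers > 0) {
        c->sleepers--;
        c->wakeups++;
    }
    /* Otherwise nobody is queued and the signal is simply dropped. */
}

/* Replays one waiter and one signaler step by step.
 * Returns true if the waiter is left asleep although the protected
 * state has already been updated, i.e. the signal was missed. */
bool toy_signal_is_lost(bool enqueue_before_unlock)
{
    struct toy_cond c = {0, 0};
    bool ready = false;  /* the mutex-protected state */

    if (enqueue_before_unlock)
        c.sleepers++;    /* waiter joins the queue while holding the mutex */

    /* Waiter unlocks the mutex at this point. */

    /* Signaler runs in the gap: lock, update state, signal, unlock. */
    ready = true;
    toy_signal(&c);

    if (!enqueue_before_unlock)
        c.sleepers++;    /* waiter goes to sleep only now, after the signal */

    return ready && c.sleepers > 0;
}
```

In this model, toy_signal_is_lost(false) corresponds to unlocking first and sleeping afterwards, and toy_signal_is_lost(true) corresponds to joining the wait queue before the unlock, which is the ordering a real condvar has to provide.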
By and large, the USE_NATIVE_MUTEX code is just wrong and would have to be completely rewritten with a real condvar in order to be workable. Until that happens, it should just be disabled or removed in favor of the underlying POSIX threads API, which takes responsibility for actually implementing condvar semantics.
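For reference, a minimal sketch of what relying on the POSIX threads API looks like. The mailbox struct and the producer and mailbox_demo functions are hypothetical names made up for illustration. Only the pthread_* calls are real API. The waiter checks the predicate in a loop while holding the mutex, and pthread_cond_wait takes care of releasing the mutex and blocking so that a signal sent in between is not lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical example state: a flag guarded by a mutex and condvar. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
};

/* Signaler: updates the protected state and signals under the mutex. */
static void *producer(void *arg)
{
    struct mailbox *m = arg;

    pthread_mutex_lock(&m->lock);
    m->ready = true;
    pthread_cond_signal(&m->cond);
    pthread_mutex_unlock(&m->lock);
    return NULL;
}

/* Waiter: returns true once the producer's update has been observed. */
bool mailbox_demo(void)
{
    struct mailbox m;
    pthread_t tid;
    bool seen;

    m.ready = false;
    if (pthread_mutex_init(&m.lock, NULL) != 0)
        return false;
    if (pthread_cond_init(&m.cond, NULL) != 0) {
        pthread_mutex_destroy(&m.lock);
        return false;
    }
    if (pthread_create(&tid, NULL, producer, &m) != 0) {
        pthread_cond_destroy(&m.cond);
        pthread_mutex_destroy(&m.lock);
        return false;
    }

    pthread_mutex_lock(&m.lock);
    while (!m.ready)                        /* loop guards against spurious wakeups */
        pthread_cond_wait(&m.cond, &m.lock);
    seen = m.ready;
    pthread_mutex_unlock(&m.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&m.cond);
    pthread_mutex_destroy(&m.lock);
    return seen;
}
```

Whether the producer runs before or after the waiter takes the lock, the waiter either sees ready already set or is woken by the signal, so mailbox_demo returns instead of hanging.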