gthread: Use C11-style memory consistency to speed up g_once()

The g_once() function exists to call a callback function exactly once, and to block multiple contending threads on its completion, then to return its return value to all of them (so they all see the same value).

The full implementation of g_once() (in g_once_impl()) uses a mutex and condition variable to achieve this, and is needed in the contended case, where multiple threads need to be blocked on completion of the callback.

However, in most calls to g_once() the callback has already run, so the call only needs to establish that fact and return the stored return value.

Previously, a fast path was used if we knew that memory barriers were not needed on the current architecture to safely access two dependent global variables in the presence of multi-threaded access. This is true of all sequentially consistent architectures.

Checking whether we could use this fast path (if G_ATOMIC_OP_MEMORY_BARRIER_NEEDED was not defined) was a bit of a pain, though, as it required GLib to know the memory consistency model of every architecture. This kind of knowledge is traditionally a compiler’s domain.

So, simplify the fast path by using the compiler-provided atomic intrinsics, and acquire-release memory consistency semantics, if they are available. If they’re not available, fall back to always locking as before.
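The shape of that fast path, with the locking fallback, can be sketched roughly as follows. This is a minimal illustration and not GLib's code: every `sketch_*` / `SKETCH_*` name is hypothetical, POSIX threads stand in for GLib's own mutex and condition variable, and the `#if` test on `__ATOMIC_ACQUIRE` is only an assumed way of detecting the GCC/Clang `__atomic` builtins.

```c
#include <pthread.h>
#include <stddef.h>

/* All names here are hypothetical; this is not GLib's implementation. */
enum { SKETCH_NOTCALLED, SKETCH_PROGRESS, SKETCH_READY };

typedef struct {
  int status;    /* written only while sketch_mutex is held */
  void *retval;  /* written before status becomes SKETCH_READY */
} sketch_once;

#define SKETCH_ONCE_INIT { SKETCH_NOTCALLED, NULL }

#if defined(__ATOMIC_ACQUIRE) && defined(__ATOMIC_RELEASE)
/* Compiler atomics available: lock-free check with acquire semantics. */
#define SKETCH_STORE_STATUS(once, v) \
  __atomic_store_n (&(once)->status, (v), __ATOMIC_RELEASE)
#define SKETCH_IS_READY(once) \
  (__atomic_load_n (&(once)->status, __ATOMIC_ACQUIRE) == SKETCH_READY)
#else
/* No atomics: never take the fast path, so every call locks. */
#define SKETCH_STORE_STATUS(once, v) ((once)->status = (v))
#define SKETCH_IS_READY(once) 0
#endif

static pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sketch_cond = PTHREAD_COND_INITIALIZER;

/* Slow path: block contending threads until the callback has finished. */
static void *
sketch_once_slow (sketch_once *once, void *(*func) (void *), void *arg)
{
  void *ret;

  pthread_mutex_lock (&sketch_mutex);
  while (once->status == SKETCH_PROGRESS)
    pthread_cond_wait (&sketch_cond, &sketch_mutex);

  if (once->status != SKETCH_READY)
    {
      SKETCH_STORE_STATUS (once, SKETCH_PROGRESS);
      pthread_mutex_unlock (&sketch_mutex);

      ret = func (arg);  /* run the callback without holding the lock */

      pthread_mutex_lock (&sketch_mutex);
      once->retval = ret;                        /* publish the value first */
      SKETCH_STORE_STATUS (once, SKETCH_READY);  /* then the ready flag */
      pthread_cond_broadcast (&sketch_cond);
    }

  ret = once->retval;
  pthread_mutex_unlock (&sketch_mutex);
  return ret;
}

/* Fast path: a single acquire load when the callback has already run. */
static inline void *
sketch_once_call (sketch_once *once, void *(*func) (void *), void *arg)
{
  if (SKETCH_IS_READY (once))
    return once->retval;
  return sketch_once_slow (once, func, arg);
}

/* Example callback: counts how often it runs and returns its argument. */
static void *
sketch_bump (void *arg)
{
  int *counter = arg;
  (*counter)++;
  return arg;
}
```

Calling `sketch_once_call (&once, sketch_bump, &counter)` twice on the same `sketch_once` should run `sketch_bump` only on the first call; the second call returns the stored pointer from the fast path without touching the mutex.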

We definitely need to use __ATOMIC_ACQUIRE in the macro implementation of g_once(). We don’t actually need to make the __ATOMIC_RELEASE changes in gthread.c though, since locking and unlocking a mutex is guaranteed to insert a full compiler and hardware memory barrier (enforcing sequential consistency). So the __ATOMIC_RELEASE changes are only in there to make it obvious which stores are logically meant to match up with the __ATOMIC_ACQUIRE loads in gthread.h.
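Stripped of the locking, the acquire/release pairing described above reduces to something like the sketch below. The `sketch_slot` type and both functions are hypothetical names used only for illustration; in the actual change the release stores sit in gthread.c under the mutex and the acquire load sits in the g_once() macro.

```c
#include <stddef.h>

/* Hypothetical type: a value guarded by a "ready" flag. */
typedef struct {
  int ready;
  void *value;
} sketch_slot;

/* Writer side: store the payload, then publish the flag with release
 * semantics so the payload store cannot be reordered after it. */
static void
sketch_publish (sketch_slot *slot, void *value)
{
  slot->value = value;
  __atomic_store_n (&slot->ready, 1, __ATOMIC_RELEASE);
}

/* Reader side: an acquire load of the flag; if it observes the release
 * store, the payload written before that store is visible too. */
static void *
sketch_try_read (sketch_slot *slot)
{
  if (__atomic_load_n (&slot->ready, __ATOMIC_ACQUIRE))
    return slot->value;
  return NULL;  /* not published yet */
}
```

The reader only touches `value` after the acquire load has seen the flag set, which is the same dependency between two variables that the old architecture-specific fast path relied on.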

Signed-off-by: Philip Withnall <withnall@endlessm.com>

Fixes: #1323 (closed)
