
Add g_main_context_new_with_flags() and ownerless polling option

This merge request fixes a race. g_source_attach() had the following check to ensure that a loop blocked on poll() would wake up.

if (do_wakeup && context->owner && context->owner != G_THREAD_SELF)
  g_wakeup_signal (context->wakeup);

However, it does not account for an implementation where poll()ing is a non-blocking operation that is scheduled while the thread is released to perform other tasks. This scenario opens up several ways for the condition to fail to hold. I ran into two such races.

The first race pertains to a single-threaded application. Keep in mind that integrating GLib into a foreign event loop makes GLib subordinate to that event loop. When you post a new work unit to execute in the thread managed by the foreign event loop, you don't use g_main_context_invoke(). In fact, the only reason to integrate GMainContext into a foreign event loop is to make the two of them communicate. So from time to time, the foreign event loop will execute callbacks that manipulate the GMainContext loop. An illustration follows.

// In this callback we translate an event from the foreign event loop
// to an event in the GMainContext event loop (which runs in the same
// thread).
static void my_event_loop_callback(void* data)
{
  GMainContext* ctx = /* ... */;
  // ...
  g_source_attach(source, ctx);
}

int main()
{
  // ...
  my_event_loop_invoke(my_event_loop_callback, data);
  // ...

  // this function has all mechanisms in place to run the foreign
  // event loop and the hooks to call
  // g_main_context_{prepare,query,check,dispatch}
}

In this case, you would have the following series of calls:

  1. g_main_context_prepare()
  2. g_main_context_query()
  3. A callback is registered with my_event_loop to run when any fd in the set becomes ready or the timeout is reached.
  4. The thread is released to perform other tasks.
  5. One of the tasks executed wishes to communicate with my_event_loop and enters my_event_loop_callback.
  6. g_source_attach() is called.
  7. g_source_attach() detects do_wakeup=TRUE, context->owner != NULL, and context->owner == G_THREAD_SELF so g_wakeup_signal() is skipped.
  8. None of the fds in the GLib poll() set becomes ready and the GLib timeout does not expire, so the my_event_loop callback that would call g_main_context_check() is never executed. Deadlock.

A shallow analysis will miss the race here, because the sequence above looks like it fails deterministically with a deadlock every time. However, keep in mind that my_event_loop_callback could be invoked before or after g_main_context_prepare(), so there is an event race. Furthermore, some GLib libraries such as GDBus initialize objects from extra threads (the GAsyncInitable interface) and deliver the result on the original thread when it is ready (g_source_attach() will eventually be called). That brings the scenario closer to standard race examples.

The other scenario where a race manifests is a multi-threaded application whose concurrency design resembles the actor model. No actor executes in two threads simultaneously, but it is not guaranteed to always wake up in the same thread. It would perform steps 1-4 just as in the previous example, but before thread control is returned to the pool, it would call g_main_context_release(). Now g_source_attach() would skip g_wakeup_signal() for a different reason:

  1. g_source_attach() detects do_wakeup=TRUE, context->owner == NULL so g_wakeup_signal() is skipped.
  2. Same as step 8 above: the callback that would call g_main_context_check() is never executed.

Certainly there are other concurrency designs where this optimization would cause a deadlock, but all of them originate from the same place: the optimization assumes that the poll() implementation is a blocking operation and that the thread will never be released to perform other tasks (possibly involving GLib calls) while the result is not ready. They share not only the same problem, but also the same solution: do not make assumptions and just call g_wakeup_signal().
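The two failure cases can be condensed into a small model of the check. This is only a sketch for illustration: model_thread, model_context and the model_* functions are hypothetical stand-ins, not GLib types or GLib code, and the context is reduced to the single owner field that the quoted condition reads.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not GLib types: a thread identity and the one
 * field of the context that the check reads. */
typedef struct { int id; } model_thread;
typedef struct { const model_thread *owner; } model_context;

/* Same shape as the condition quoted from g_source_attach():
 * signal only if some other thread currently owns the context. */
static bool model_would_signal(const model_context *ctx,
                               const model_thread *self,
                               bool do_wakeup)
{
    return do_wakeup && ctx->owner != NULL && ctx->owner != self;
}

/* First race: the attaching callback runs on the owner thread. */
static bool model_single_threaded_case(void)
{
    model_thread loop_thread = { 1 };
    model_context ctx = { &loop_thread };
    return model_would_signal(&ctx, &loop_thread, true);
}

/* Second race: the context was released before the thread went back
 * to the pool, so it has no owner at all. */
static bool model_released_case(void)
{
    model_thread worker = { 2 };
    model_context ctx = { NULL };
    return model_would_signal(&ctx, &worker, true);
}

/* The case the optimization was written for: a different thread owns
 * the context and is assumed to be blocked in poll(). */
static bool model_blocking_owner_case(void)
{
    model_thread owner = { 1 };
    model_thread attacher = { 2 };
    model_context ctx = { &owner };
    return model_would_signal(&ctx, &attacher, true);
}
```

In this model, both race scenarios return false (no wakeup is signalled), and only the blocking-owner case returns true.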

This patch implements this solution by introducing a new constructor for GMainContext objects that accepts a flags argument. One of the flags is G_MAIN_CONTEXT_OWNERLESS_POLLING. This flag forces a call to g_wakeup_signal() and fixes the race on foreign event loops. The mechanism is introduced in such a way that you can only set this option on new event loops, never after creation. The reason to prevent changing this option after creation is to avoid other races that would lead to event loss. Construction is the only proper time to set this flag.

The implementation design means we do not change any semantics for currently working code. The old constructor is basically the same. If you don't set the new flag, the code will not take any different branches and current behavior is not affected. The patch is also small and easy to follow.
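As a rough sketch of how a construction-time flag can fold into the wakeup decision, the model below bypasses the owner test when the flag is set and leaves the old branch untouched otherwise. It is an assumption about the general shape, not code copied from the patch: model_flags, model_context, model_context_new_with_flags() and model_would_signal() are hypothetical names that only mirror the ones in this description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag set, mirroring the names used in the description. */
typedef enum {
    MODEL_FLAGS_NONE = 0,
    MODEL_FLAGS_OWNERLESS_POLLING = 1
} model_flags;

typedef struct { int id; } model_thread;

typedef struct {
    model_flags flags;          /* set once at construction, no setter */
    const model_thread *owner;  /* NULL while nobody has acquired it */
} model_context;

/* The flag can only be chosen here, when the context is created. */
static model_context model_context_new_with_flags(model_flags flags)
{
    model_context ctx = { flags, NULL };
    return ctx;
}

static bool model_would_signal(const model_context *ctx,
                               const model_thread *self,
                               bool do_wakeup)
{
    if (!do_wakeup)
        return false;
    /* Ownerless polling: make no assumption about who is polling. */
    if (ctx->flags & MODEL_FLAGS_OWNERLESS_POLLING)
        return true;
    /* Unflagged contexts keep the original owner-based check. */
    return ctx->owner != NULL && ctx->owner != self;
}
```

With the flag set, the model signals a wakeup both when the context has no owner and when the attaching thread is the owner; without it, the result matches the original condition.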

Edited by Vinícius dos Santos Oliveira
