refcounting issue/crash via g_source_iter_next
Apply the attached debugging patch to glib.git, then run the attached script like this:
LD_LIBRARY_PATH=/path/to/glib/build python3 glib-unref-reproducer.py
You'll get a crash. Here's what's happening:
- Add 2 IO watch sources.
- Run the mainloop in a thread and let it block.
- Name the sources 'source1' and 'source2'.
- Wake up the mainloop.
- In the mainloop thread, g_main_context_prepare uses g_source_iter_next. It unlocks the context to call source1's check function. The glib debugging patch inserts a sleep after the unlock.
- During the sleep, the script calls g_source_remove(source1).
- The mainloop finishes sleeping and calls g_source_iter_next. At this point next_source == source2 and iter->source == source1.
- g_source_iter_next calls g_source_unref_internal on source1. This is the last reference, so source1 is freed. Freeing involves running some callbacks and unlocking the context lock. The debugging patch inserts another sleep after that unlock.
- During that sleep, the script removes source2, and source2 is freed.
- The mainloop wakes up. next_source still == source2 in g_source_iter_next, but source2 has been freed, so this is a use-after-free. The script crashes soon after, when trying to g_source_ref it.
The sleeps are inserted only after the locks are dropped, so they do not create the problem; they just make an existing threading race easier to trigger. We are hitting this in the libvirt test suite after replacing some custom code with the glib main context.
I think the solution is to add a ref on next_source before we invoke g_source_unref_internal. I will send a patch shortly.
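
To illustrate the ordering I have in mind, here is a minimal toy model in plain C. It is not glib code and not the actual patch: ToySource, toy_ref, toy_unref and toy_next_survives are hypothetical names made up for this sketch, and the racing g_source_remove(source2) from the other thread is simulated by a plain call at the point where the context lock would be dropped. The freed flag is set instead of calling free(), so the outcome can be inspected safely.

```c
/* Hypothetical stand-in for a refcounted GSource. */
typedef struct ToySource {
    int ref_count;  /* outstanding references */
    int freed;      /* set to 1 where the real code would free the source */
} ToySource;

static void toy_ref(ToySource *s)
{
    s->ref_count++;
}

static void toy_unref(ToySource *s)
{
    if (--s->ref_count == 0)
        s->freed = 1;
}

/*
 * Models one iterator step from source1 to source2.
 * pin_next != 0: take a ref on the next source before dropping the current one.
 * pin_next == 0: the ordering described in the steps above.
 * Returns 1 if source2 is still alive when the iterator resumes, 0 otherwise.
 */
static int toy_next_survives(int pin_next)
{
    /* source1 was already removed by the script, so only the iterator's ref is left. */
    ToySource source1 = { 1, 0 };
    /* source2 is still attached: only the context's ref. */
    ToySource source2 = { 1, 0 };

    if (pin_next)
        toy_ref(&source2);   /* the proposed extra reference */

    toy_unref(&source1);     /* last ref: source1 is finalized, lock is dropped here */
    toy_unref(&source2);     /* simulated racing g_source_remove(source2) */

    /* The iterator would now ref and return source2. */
    return !source2.freed;
}
```

With the extra reference, toy_next_survives(1) reports that source2 is still valid when the iterator resumes. Without it, toy_next_survives(0) reports that source2 is already gone, which corresponds to the use-after-free above. In the real code the extra reference would of course have to be dropped again once the iterator has taken its own.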