GQuark leaks and fuzzing
I've been trying to set up a fuzzer for one of GLib's clients, and discovered that GQuark intentionally leaks memory here: https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/gquark.c#L298
This is problematic for fuzzing: I'm trying to write a fuzzer that can run in the oss-fuzz environment, which has fairly strict requirements:
- oss-fuzz uses libFuzzer, which performs leak checking after every test case. libFuzzer lets you disable leak checking when running locally (via `-detect_leaks=0`), but you can't do that in the oss-fuzz environment.
- It's also possible to suppress specific leaks when running locally, but that isn't allowed in oss-fuzz either (see the suppression sketch just after this list).
- The app that I'm testing is also a heavy user of quarks, but that's something I can fix within the client app.
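To illustrate the local-only workaround: LeakSanitizer lets a binary ship its own suppression list via the `__lsan_default_suppressions()` hook. This is a minimal sketch; the `quark_new` pattern is just an illustrative guess at the allocation site, and oss-fuzz rejects this kind of suppression anyway.

```c
#include <sanitizer/lsan_interface.h>

/* Local-only: bake LeakSanitizer suppressions into the fuzzer binary.
 * The "quark_new" pattern is illustrative; oss-fuzz does not allow
 * shipping suppressions like this, so it only helps for local runs. */
const char *
__lsan_default_suppressions (void)
{
  return "leak:quark_new\n";
}
```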
I'm not entirely sure how to solve this problem yet; the options seem to be:
- Provide a locking but leak-free implementation that can be used for fuzzing, probably behind a define (a rough sketch follows this list)? We most likely need to recompile GLib for oss-fuzz anyway, so a fuzzing-only define would be fine. (We may need to do this regardless, to disable the slab allocator: it also confuses leak checking, and similarly we can't just set G_SLICE=always-malloc in the oss-fuzz environment.) But that's expensive to maintain.
- Bypass GQuark inside the client application when producing a fuzzer build. That's also annoying and expensive to maintain, and it doesn't guarantee we won't leak (dependencies could be using GQuark).
- Change GQuark to always be leak-free... but is that possible without regressing performance?
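To make the first option concrete, here's a rough sketch of what the define could look like around the leaky resize path. `GLIB_FUZZING` is a name I've invented, and the snippet is a simplification of what gquark.c actually does, not the real code:

```c
#include <string.h>
#include <glib.h>

/* Simplified sketch of the quark array resize. The real gquark.c
 * intentionally leaks the old array so that lockless readers can keep
 * dereferencing it; under a hypothetical GLIB_FUZZING define (where
 * reads would take a lock too) the old array could be freed instead. */
static gchar **quarks = NULL;
static gint    quark_seq_id = 0;

static void
quark_grow (gint new_size)
{
  gchar **new_quarks = g_new0 (gchar *, new_size);

  if (quarks != NULL)
    memcpy (new_quarks, quarks, sizeof (gchar *) * quark_seq_id);

#ifdef GLIB_FUZZING
  g_free (quarks);  /* leak-free: only safe if readers also lock */
#endif

  quarks = new_quarks;
}
```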
I decided to look more closely at why this GQuark leak was introduced in the first place. The goal was to improve read performance: 7ae5e9c2
Further details are in the bug: https://bugzilla.gnome.org/show_bug.cgi?id=650458
That bug includes a benchmark (attached to this comment), which shows that the impact on multi-threaded reads was being measured: https://bugzilla.gnome.org/show_bug.cgi?id=650458#c15
Looking at the implementation from before these changes, the problem was that a single G_LOCK was used to protect both reads and writes; in other words, even parallel reads would block each other.
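The old approach was shaped roughly like this (a simplified sketch for illustration, not the exact historical code):

```c
#include <glib.h>

/* Pre-7ae5e9c2 shape: one global lock serialises both lookups and
 * insertions, so even read-only callers block each other. */
G_LOCK_DEFINE_STATIC (quark_global);

static GHashTable *quark_ht = NULL;  /* string -> GUINT_TO_POINTER (quark) */

static GQuark
quark_try_string_locked (const gchar *string)
{
  GQuark quark;

  G_LOCK (quark_global);
  quark = GPOINTER_TO_UINT (g_hash_table_lookup (quark_ht, string));
  G_UNLOCK (quark_global);

  return quark;
}
```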
GLib now offers GRWLock (which I don't think existed when the leak was introduced), which would let parallel reads proceed without blocking each other while still taking a lock, so I'd like to investigate whether that offers reasonable performance. It will obviously be a little more expensive than the current lockless implementation, but if we care primarily about optimising reads, it might be good enough. (I've prepared an initial prototype which doesn't show any performance regression, but the numbers aren't likely to be meaningful because it's an unoptimised ASAN build; I'll try to do a more realistic comparison in future.)
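For reference, the prototype is shaped roughly like this (a sketch with a made-up helper name, not the actual patch): lookups take the shared reader lock so they can run in parallel, and only the slow path of interning a new string takes the exclusive writer lock.

```c
#include <glib.h>

/* GRWLock sketch: readers proceed in parallel; writers (interning a
 * new string) are exclusive. quark_new_locked() is a hypothetical
 * helper standing in for the real insertion code. */
static GRWLock     quark_rw_lock;  /* statically allocated, no init needed */
static GHashTable *quark_ht = NULL;

static GQuark quark_new_locked (gchar *string);  /* hypothetical */

GQuark
quark_from_string_rw (const gchar *string)
{
  GQuark quark;

  g_rw_lock_reader_lock (&quark_rw_lock);
  quark = GPOINTER_TO_UINT (g_hash_table_lookup (quark_ht, string));
  g_rw_lock_reader_unlock (&quark_rw_lock);

  if (quark == 0)
    {
      g_rw_lock_writer_lock (&quark_rw_lock);
      /* Re-check: another thread may have interned the string between
       * our reader unlock and writer lock. */
      quark = GPOINTER_TO_UINT (g_hash_table_lookup (quark_ht, string));
      if (quark == 0)
        quark = quark_new_locked (g_strdup (string));
      g_rw_lock_writer_unlock (&quark_rw_lock);
    }

  return quark;
}
```

The re-check under the writer lock is what keeps this correct when two threads race to intern the same string.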
But I'm also curious whether anyone has better ideas, so I've filed this issue to share my thoughts while I experiment with a read-write lock.