gthread: Move thread _impl functions to static inlines for speed
The changes made in commit bc59e28b (issue #3399) fixed introspection of
the GThread API. However, they introduced a trampoline in every
threading function. So with those changes applied, the disassembly of
g_mutex_lock() (for example) was:
0x7ffff7f038b0 <g_mutex_lock> jmp 0x7ffff7f2f440 <g_mutex_lock_impl>
0x7ffff7f038b5 data16 cs nopw 0x0(%rax,%rax,1)
i.e. it jumps straight to the _impl function, even with an optimised
build. Since g_mutex_lock() (and various other GThread functions) are
frequently run hot paths, this additional jmp to a function which has
ended up in a different code page is a slowdown which we’d rather avoid.
So, this commit reworks things to define all the _impl functions as
G_ALWAYS_INLINE static inline (which typically expands to
__attribute__((__always_inline__)) static inline), and to move them
into the same compilation unit as gthread.c so that they can be
inlined without the need for link-time optimisation to be enabled.
It makes the code a little less readable, but not much worse than what
commit bc59e28b already did. And perhaps the addition of the inline
decorations to all the _impl functions will make it a bit clearer what
their intended purpose is (platform-specific implementations).
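As a rough illustration of the pattern described above (not the actual
GLib code), the sketch below defines an _impl function as an
always-inline static function in the same compilation unit as the public
wrapper that calls it. DemoMutex, DEMO_ALWAYS_INLINE and the demo_*
functions are hypothetical names made up for this example;
DEMO_ALWAYS_INLINE only stands in for G_ALWAYS_INLINE and is not GLib's
definition of it.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for G_ALWAYS_INLINE (assumption: GCC or Clang
 * provide the always_inline attribute; elsewhere it expands to nothing). */
#if defined(__GNUC__)
#define DEMO_ALWAYS_INLINE __attribute__((__always_inline__))
#else
#define DEMO_ALWAYS_INLINE
#endif

/* Hypothetical toy lock: 0 means unlocked, 1 means locked. */
typedef struct
{
  atomic_int state;
} DemoMutex;

/* Platform-specific implementations. Being static inline and visible in
 * this compilation unit, they can be inlined into their callers without
 * link-time optimisation. */
DEMO_ALWAYS_INLINE static inline int
demo_mutex_trylock_impl (DemoMutex *mutex)
{
  int expected = 0;

  /* Succeeds only if the state was 0, atomically setting it to 1. */
  return atomic_compare_exchange_strong (&mutex->state, &expected, 1);
}

DEMO_ALWAYS_INLINE static inline void
demo_mutex_unlock_impl (DemoMutex *mutex)
{
  atomic_store (&mutex->state, 0);
}

/* Public entry points: thin wrappers whose bodies the compiler can
 * replace with the inlined implementations. */
int
demo_mutex_trylock (DemoMutex *mutex)
{
  return demo_mutex_trylock_impl (mutex);
}

void
demo_mutex_unlock (DemoMutex *mutex)
{
  demo_mutex_unlock_impl (mutex);
}
```

With this layout the public function no longer needs to jump to a
separate _impl symbol, since the implementation is visible to the
compiler at the call site.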
After applying this commit, the disassembly of g_mutex_lock()
successfully contains the inlining for me:
=> 0x00007ffff7f03d80 <+0>: xor %eax,%eax
0x00007ffff7f03d82 <+2>: mov $0x1,%edx
0x00007ffff7f03d87 <+7>: lock cmpxchg %edx,(%rdi)
0x00007ffff7f03d8b <+11>: jne 0x7ffff7f03d8e <g_mutex_lock+14>
0x00007ffff7f03d8d <+13>: ret
0x00007ffff7f03d8e <+14>: jmp 0x7ffff7f03610 <g_mutex_lock_slowpath>
I considered making a similar change to the other APIs touched in #3399
(GContentType, GAppInfo, GSpawn), but they are all much less performance
critical, so it’s probably not worth making their code more complex for
that sake.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Fixes: #3417
Closes #3417