clutter-stage-cogl: Reduce output latency and reduce missed frames too [performance]

If an update (new frame) had already been scheduled before
_clutter_stage_cogl_presented was called, then it was scheduled for the
wrong time, because last_presentation_time has changed since then. Using
an update_time based on an outdated presentation time results in
scheduling frames too early, which fills the buffer queue (triple
buffering or worse) and causes high visual latency.

So if we do receive a presentation event when an update is already
scheduled, remember to reschedule the update based on the newer
last_presentation_time. This way we avoid overfilling the buffer queue
and limit ourselves to double buffering for less visible lag.

last_presentation_time is usually a little in the past, although
sometimes in the future. When it's over 2ms (sync_delay) in the past
that would trigger the while loop to count up so that the next
update_time is in the future.

The problem with that is for common values of last_presentation_time
which are only a few milliseconds ago, incrementing update_time by
refresh_interval also means counting past the next physical frame that
we haven't rendered yet. And so mutter would skip a frame.

Proof by example

With the old loop condition:

    last_presentation_time = now - 3ms
    sync_delay = 2ms
    update_time = last_presentation_time + sync_delay
    update_time = now - 1ms

    while (update_time < now)
      update_time = now - 1ms + 16ms
    update_time = now + 15ms

But you can calculate that:

    next_presentation_time = last_presentation_time + 16ms
    next_presentation_time = now - 3ms + 16ms
    next_presentation_time = now + 13ms

    update_time > next_presentation_time
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

So we are definitely going to miss that next frame.

With the new loop condition:

    last_presentation_time = now - 3ms
    sync_delay = 2ms
    update_time = last_presentation_time + sync_delay
    update_time = now - 1ms

    while (update_time < (now - 8ms))
      # loop is never entered
    update_time = now - 1ms

But you can calculate that:

    next_presentation_time = last_presentation_time + 16ms
    next_presentation_time = now - 3ms + 16ms
    next_presentation_time = now + 13ms

So we wake up immediately with 13ms of render time available.
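
The arithmetic in the example above can be checked with a small standalone function. This is only a sketch: next_update_time and its allowed_lateness parameter are hypothetical names introduced here, where a lateness of 0 models the old loop condition and refresh_interval / 2 models the new one. Times are plain integers in milliseconds to match the example.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the next update time from the last presentation time.
 * allowed_lateness is a hypothetical knob for this sketch:
 *   0                    models the old condition (update_time < now)
 *   refresh_interval / 2 models the new condition */
int64_t
next_update_time (int64_t last_presentation_time,
                  int64_t sync_delay,
                  int64_t refresh_interval,
                  int64_t now,
                  int64_t allowed_lateness)
{
  int64_t update_time = last_presentation_time + sync_delay;

  assert (refresh_interval > 0);

  /* Count up in whole frames until update_time is no more than
   * allowed_lateness in the past. */
  while (update_time < now - allowed_lateness)
    update_time += refresh_interval;

  return update_time;
}
```

Taking now = 1000, last_presentation_time = 997, sync_delay = 2 and refresh_interval = 16, a lateness of 0 yields 1015, which is after the next presentation at 1013. A lateness of 8 yields 999, which is already in the past and so means waking up immediately.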

The reason nobody noticed these missed frames very often is that mutter
had three accidental workarounds built-in:

 1. Prior to 3.32, the offending code was only reachable in Xorg
    sessions. It was never reached in Wayland sessions because it hadn't
    been implemented yet (till e9e4b2b7).
 2. For Xorg sessions, we are accidentally triple buffering (#334). This
    is a good way to avoid the missed frames, but is also an accident.
 3. sync_delay is presently just high enough (2ms by coincidence is very
    close to common values of now - last_presentation_time) to push the
    update_time into the future in some cases, which avoids entering the
    while loop. So the same skipped frames problem was also noticed when
    experimenting with sync_delay = 0.

Now we modify the while loop to accept an update_time that's slightly in
the past. So provided there's still at least half a frame (minus
sync_delay) of render time, we try to start the next frame immediately
instead of giving up and skipping it.

The master clock already supports update times in the past just fine as it treats them the same as being told to wake up immediately.

The reason why we use now - refresh_interval/2 instead of
now - refresh_interval is because the former guarantees a minimum
render time that we are unlikely to overshoot. Using the latter would
risk having very little render time available and result in a large
two-frame stutter: the next frame is missed and is displayed one frame
late, putting it both spatially and temporally out of position when it
is finally displayed. Only the third frame then would look right. So
that's why we use now - refresh_interval/2, to guarantee a safe minimum
render time for catch-ups.
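
The difference between the two thresholds can be illustrated with a rough model of the render time left after waking up. This is a sketch under stated assumptions, not mutter code: render_time_available and worst_case_render_time are hypothetical helpers, and the model assumes the frame started at update_time targets the presentation at update_time - sync_delay + refresh_interval, as in the worked example earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Render time left between waking up and the targeted presentation.
 * allowed_lateness is the hypothetical loop threshold: how far in the
 * past update_time may be before a whole frame is skipped. A negative
 * result means the targeted presentation has already been missed. */
int64_t
render_time_available (int64_t last_presentation_time,
                       int64_t sync_delay,
                       int64_t refresh_interval,
                       int64_t now,
                       int64_t allowed_lateness)
{
  int64_t update_time = last_presentation_time + sync_delay;
  int64_t wake_time;
  int64_t target_presentation_time;

  assert (refresh_interval > 0);

  while (update_time < now - allowed_lateness)
    update_time += refresh_interval;

  /* An update_time in the past means waking up immediately. */
  wake_time = update_time > now ? update_time : now;
  target_presentation_time = update_time - sync_delay + refresh_interval;

  return target_presentation_time - wake_time;
}

/* Lower bound of the value above in this model, reached when
 * update_time is exactly allowed_lateness in the past. */
int64_t
worst_case_render_time (int64_t sync_delay,
                        int64_t refresh_interval,
                        int64_t allowed_lateness)
{
  return refresh_interval - sync_delay - allowed_lateness;
}
```

With refresh_interval = 16000 and sync_delay = 2000 (microseconds), a threshold of refresh_interval / 2 leaves at least 6000 of render time in this model, that is half a frame minus sync_delay. A threshold of refresh_interval has a worst case of -2000, meaning the targeted frame can already be lost at wake-up.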