WIP: clutter-frame-clock: Scale up to triple buffering when the GPU is behind
If presentation events start arriving late (by whole frame intervals!) because the GPU is overwhelmed, then it's already too late to fix it, and we would get stuck in a negative feedback loop of dropping frames and staying at a low GPU frequency: frame rate decimation increases GPU idle time, which makes the driver think the current frequency is adequate.
To avoid that problem we now plan ahead what the ideal next dispatch time after the current one would be. And we stick to that time (the native refresh rate) regardless of whether the previous frame is finished. This triple buffering allows mutter to reach 100% GPU utilization (if required) which then encourages the GPU to scale up frequency (if required).
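A minimal sketch of the scheduling idea described above, not the actual mutter code. The class `FrameClockSketch`, its fields and its methods are hypothetical names chosen for illustration, and the limit of two frames in flight is an assumption about what "triple buffering" means here (two queued frames plus the one on screen).

```python
from dataclasses import dataclass


@dataclass
class FrameClockSketch:
    """Toy frame clock; every name here is hypothetical, not mutter API."""

    refresh_interval_us: int
    max_frames_in_flight: int = 2  # assumption: 2 queued + 1 on screen
    last_dispatch_us: int = 0
    frames_in_flight: int = 0

    def next_dispatch_time(self, now_us: int) -> int:
        """Return the next dispatch slot on the native refresh grid.

        The target depends only on the previous dispatch time, so a late
        presentation event cannot push it back.
        """
        target = self.last_dispatch_us + self.refresh_interval_us
        if target < now_us:
            # We fell behind: skip whole intervals to stay on the grid.
            behind = now_us - target
            missed = -(-behind // self.refresh_interval_us)  # ceiling division
            target += missed * self.refresh_interval_us
        return target

    def try_dispatch(self, now_us: int) -> bool:
        """Start a frame unless the queue is already full."""
        if self.frames_in_flight >= self.max_frames_in_flight:
            return False
        self.frames_in_flight += 1
        self.last_dispatch_us = now_us
        return True

    def notify_presented(self) -> None:
        """Called when the display reports that a frame was presented."""
        self.frames_in_flight = max(0, self.frames_in_flight - 1)
```

With `max_frames_in_flight=1` the same sketch degenerates to waiting for each presentation before starting the next frame, which is the behaviour the description says leads to frame rate decimation.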
Performance measurements at 4K 60Hz, on an i7-7700 with GPU base frequency of 350MHz, max frequency 1150MHz:
Test | FPS Before | MHz Before | FPS After | MHz After |
---|---|---|---|---|
1 small window | 60 | 350 | 60 | 350 |
1 big window | 59 | 364 | 60 | 364 |
1 big window overview | 30 | 363 | 60 | 474 |
2 big windows overview | 30 | 930 | 60 | 1150 |
4 big windows overview | 20 | 1150 | 30 | 1150 |
This shows frame rates now only drop after the maximum GPU frequency is reached.
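The pattern in the table can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a reproduction of the measurements: it assumes a fixed render time per frame, a 60 Hz display, and that GPU utilization is simply busy time divided by frame period. The function names are hypothetical.

```python
import math

REFRESH_INTERVAL_MS = 1000 / 60  # 60 Hz display


def double_buffered(render_ms):
    """Each frame waits for the previous one to be presented.

    A frame that misses a vblank occupies a whole number of refresh
    intervals, and the GPU idles for the remainder.
    Returns (frames per second, GPU utilization between 0 and 1).
    """
    intervals = math.ceil(render_ms / REFRESH_INTERVAL_MS)
    period_ms = intervals * REFRESH_INTERVAL_MS
    return 1000 / period_ms, render_ms / period_ms


def triple_buffered(render_ms):
    """The next frame may start before the previous one is presented.

    The GPU only idles when it renders faster than the display refreshes.
    Returns (frames per second, GPU utilization between 0 and 1).
    """
    period_ms = max(render_ms, REFRESH_INTERVAL_MS)
    return 1000 / period_ms, render_ms / period_ms


if __name__ == "__main__":
    for render_ms in (10.0, 20.0, 40.0):
        print(render_ms, double_buffered(render_ms), triple_buffered(render_ms))
```

In this model a 20 ms frame leaves the GPU 40% idle when each frame waits for the previous one, which a frequency governor could read as "fast enough", whereas the triple-buffered case keeps the GPU fully busy and so gives the driver a reason to raise the frequency.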
Optional TODOs:
- More test cases.
- Add support in the native backend, since the feature is only enabled on Xorg right now.
Activity
mentioned in merge request !1378 (closed)
- Resolved by Daniel van Vugt
> [...] stuck in a negative feedback loop of dropping frames and staying at a low GPU frequency.

That could indicate a Mesa / kernel / BIOS issue. See e.g. my similar issue https://gitlab.freedesktop.org/drm/amd/-/issues/1146, which turned out to be a BIOS issue.
mutter shouldn't need to do anything special for the clocks to be scaled up appropriately.
- Resolved by Daniel van Vugt
Many Android devices use explicit notifications to signal interactivity to the kernel and adjust frequency scaling. Maybe this approach would be suitable for desktop Linux, too? Even if GPU and CPU power management reacts fairly quickly, an explicit notification will always be quicker.
That said, this issue seems to point to a bad flushing strategy in gnome-shell. Ideally, you want to flush as early as possible (to have your rendering finished ASAP) and as rarely as possible (to reduce overhead), and that's quite hard to do, as these goals kind of contradict each other. :)
Edited by Grigori Goronzy
added 8 commits
- 9e3ed715...c7d14244 - 4 commits from branch GNOME:master
- 1d26350b - clutter-frame-clock: Remove ClutterFrameClockState
- d1c27daf - clutter-frame-clock: Remember the refresh interval
- d936e887 - cogl: Advertise triple buffering support
- 5b3f0a95 - clutter-frame-clock: Scale up to triple buffering when the GPU is behind
added 1 commit
- 36f4ff24 - clutter-frame-clock: Scale up to triple buffering when the GPU is behind
mentioned in commit vanvugt/mutter@60ecc4af
mentioned in commit vanvugt/mutter@6b9ad520
mentioned in commit vanvugt/mutter@40f53945
mentioned in commit vanvugt/mutter@2a48f2a5
mentioned in commit vanvugt/mutter@e1367ea2
added 1 commit
- e1367ea2 - clutter-frame-clock: Scale up to triple buffering when the GPU is behind
mentioned in issue gnome-shell#2697 (closed)
mentioned in commit vanvugt/mutter@445740f0
mentioned in commit vanvugt/mutter@f1877028
- Resolved by Daniel van Vugt
The e1367ea2 commit log claims:
> Latency is only ever suboptimal in cases that were already skipping frames, which are cases where latency was already suboptimal. So this should not increase latency, it should only increase frame rates.
While that's true for the cases which achieve full frame-rate with triple buffering, let's look at latency for the last case:
Test | FPS Before | FPS After |
---|---|---|
4 big windows overview | 20 | 30 |

The specified latency is approximately how much time passes between when mutter works on a frame and when the GPU display hardware starts scanning it out.

Latency:
- Before this MR: 48 ms (render time 14.6 ms + 2 missed refresh cycles 33.3 ms)
- After this MR: 65 ms (render time 14.6 ms + 1 refresh cycle until previous frame is presented 16.6 ms + 2 refresh cycles until this frame is presented 33.3 ms)

With more sophisticated frame scheduling, it might be possible to prevent higher latency while achieving higher frame-rates. But as-is that's not always the case.
Edited by Michel Dänzer
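The latency figures in the comment above can be re-derived from the numbers it quotes. The sketch below only repeats that arithmetic for a 60 Hz display; the variable names are illustrative.

```python
REFRESH_MS = 1000 / 60  # one refresh cycle at 60 Hz
RENDER_MS = 14.6        # render time quoted in the comment

# Before: render time plus 2 missed refresh cycles.
before_ms = RENDER_MS + 2 * REFRESH_MS

# After: render time, plus 1 cycle until the previous frame is presented,
# plus 2 cycles until this frame is presented.
after_ms = RENDER_MS + 1 * REFRESH_MS + 2 * REFRESH_MS

print(round(before_ms), round(after_ms))  # prints: 48 65
```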