Add GLBuffer implementation w/ persistent mapping
If glBufferStorage() is available, we can replace our usage of glBufferSubData() with persistently mapped storage via glMapBufferRange().
This has 1 disadvantage:
- It's not supported everywhere: it requires GL 4.4 or GL_EXT_buffer_storage. Every GPU of the last 10 years should implement it, but we still check for it and keep the old code as a fallback. The old code can also be forced via GDK_GL_DISABLE=buffer-storage.
But it has 2 advantages:
- It is what Vulkan does, so it unifies the two renderers' buffer handling.
- It is a significant performance boost in use cases with large vertex buffers. Those are pretty rare, but do happen with lots of text at a small font size. An example would be a small font in a maximized VTE terminal or the overview in gnome-text-editor.
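As a rough illustration of the availability check and the GDK_GL_DISABLE override described above, here is a minimal sketch. It is not the code from this MR: `has_token()` and `use_persistent_mapping()` are hypothetical helpers, and it assumes the extension list is space-separated and the GDK_GL_DISABLE value is comma-separated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `token` appears as a complete
 * entry in `list`, where entries are separated by any character in
 * `seps`. A NULL list contains nothing. */
static bool
has_token (const char *list, const char *token, const char *seps)
{
  size_t token_len = strlen (token);

  if (list == NULL || token_len == 0)
    return false;

  while (*list != '\0')
    {
      size_t entry_len = strcspn (list, seps);

      if (entry_len == token_len && strncmp (list, token, token_len) == 0)
        return true;

      list += entry_len;
      list += strspn (list, seps); /* skip separators */
    }

  return false;
}

/* Hypothetical decision function: use persistently mapped buffers only
 * if the feature is available (GL 4.4 or GL_EXT_buffer_storage) and the
 * user has not disabled it. Otherwise the caller keeps using the old
 * glBufferSubData() path.
 *
 * Assumptions: `gl_extensions` is a space-separated extension list and
 * `disable_env` is the comma-separated value of GDK_GL_DISABLE. */
static bool
use_persistent_mapping (int         gl_major,
                        int         gl_minor,
                        const char *gl_extensions,
                        const char *disable_env)
{
  if (has_token (disable_env, "buffer-storage", ","))
    return false;

  if (gl_major > 4 || (gl_major == 4 && gl_minor >= 4))
    return true;

  return has_token (gl_extensions, "GL_EXT_buffer_storage", " ");
}
```

A caller would pass in the version and extensions of the current GL context together with `getenv ("GDK_GL_DISABLE")`, and pick the buffer implementation based on the result.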
A custom benchmark tailored for this problem can be created with:
    tests/rendernode-create-tests 1000000 text.node
This creates a node file called "text.node" that draws 1 million text
nodes.
(Creating that test takes a minute or so. A smaller number may be useful
on less powerful hardware than my Intel Tigerlake laptop.)
The difference can then be compared via:
    tools/gtk4-rendernode-tool benchmark --runs=20 text.node
and
    GDK_GL_DISABLE=buffer-storage tools/gtk4-rendernode-tool benchmark --runs=20 text.node
Here are a few benchmark numbers from my machines:
computer | size | GL before | GL after | Vulkan before | Vulkan after |
---|---|---|---|---|---|
TigerLake | 1M | 1.1s | 0.8s | 1.0s | 1.0s |
Radeon RX6550XT | 1M | 1.6s | 0.7s | 2.5s | 0.9s |
Radeon RX6950XT | 1M | 0.36s | 0.3s | 1.55s | 0.6s |
Radeon integrated | 1M | 1.7s | 1.2s | 2.8s | 1.1s |
RPi 4 | 100k | 2.0s | 1.9s | | |
And here's the difference in a flamegraph (top is after, bottom is before):
Related: !7021 (closed)