macos: modernize rendering with CALayer and IOSurface

Christian Hergert requested to merge wip/chergert/macos-iosurface into main

Currently, our OpenGL backend on macOS has two major issues.

  1. Every frame fully swaps buffer contents to the compositor, even if we use a scissor rect, because there is no equivalent of the EGL "swap buffer with damage" extension on Apple GL.
  2. If the scissor rect is large, we'd still be bringing in a lot of damaged pixels even if issue 1 were fixed.

This merge request modernizes this significantly by using a tiling of CALayer (each tile up to 128x128 pixels), each of which can be connected to an IOSurface (a GPU-backed buffer). That IOSurface can also be bound to a texture/framebuffer in OpenGL or wrapped in a Cairo image surface for software rendering.

The tiles allow us to display the same IOSurface across multiple tiles at different offsets, so only the tiles overlapping the frame's damage region are re-composited by the display server.


This is a major shift in how we draw, both with accelerated OpenGL and with software rendering in Cairo. In short, it uses tiles of Core Animation's CALayer to display contents from an OpenGL or Cairo rendering so that the window can provide partial damage updates. Partial damage is not generally available when using OpenGL, as the whole buffer is flipped even if you only submitted a small change using a scissor rect.

Thankfully, this speeds up Cairo rendering a bit too by using IOSurface to upload contents to the display server. We use the same tiling system for Cairo as we do for OpenGL, which reduces overall complexity and the differences between them.

A New Buffer

GdkMacosBuffer is a wrapper around an IOSurfaceRef. The term buffer was used because 1) surface is already used and 2) it loosely maps to a front/back buffer semantic.

However, it appears that IOSurfaceRef contents are being retained in some fashion (likely in the compositor result) so we can update the same IOSurfaceRef without flipping as long as we're fast. This appears to be what Chromium does as well, but Firefox uses two IOSurfaceRef and flips between them. We would like to avoid two surfaces because it doubles the GPU VRAM requirements of the application.

Changes to Windows

Previously, the NSWindow would dynamically change between different types of NSView based on the renderer being used. This is no longer necessary as we now have a single NSView type, GdkMacosView, which inherits from GdkMacosBaseView just to keep the tedious stuff separate from the machinery of GdkMacosView. We can merge those someday if we are okay with that.

Changes to Views

GdkMacosCairoView, GdkMacosCairoSubView, GdkMacosGLView have all been removed and replaced with GdkMacosView. This new view has a single CALayer (GdkMacosLayer) attached to it which itself has sublayers.

The contents of the CALayer are populated with an IOSurfaceRef which we allocate with the GdkMacosSurface. The IOSurfaceRef is replaced when the NSWindow resizes.

Changes to Layers

We now have a dedicated GdkMacosLayer which contains sublayers of GdkMacosTile. The tile has a maximum size of 128x128 pixels in device units.

The GdkMacosTile sublayers are produced by partitioning first the transparent region (window bounds minus opaque area) and then the opaque area.
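As a rough illustration of the 128x128 limit, the sketch below splits a single rectangle into tiles no larger than that size. It is not the code from this merge request: TileRect, TILE_MAX and split_into_tiles() are hypothetical names, and the real backend works on regions (transparent and opaque) rather than one rectangle.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_MAX 128 /* maximum tile edge in device pixels (from the text above) */

/* Hypothetical stand-in for a CGRect in integer device pixels. */
typedef struct { int x, y, width, height; } TileRect;

/* Split `area` into row-major tiles no larger than TILE_MAX x TILE_MAX.
 * At most `max_tiles` rectangles are written to `out`; the return value is
 * the number of tiles the area needs, which may exceed `max_tiles`. */
size_t split_into_tiles(TileRect area, TileRect *out, size_t max_tiles)
{
  size_t n = 0;

  assert(out != NULL || max_tiles == 0);

  for (int y = 0; y < area.height; y += TILE_MAX) {
    for (int x = 0; x < area.width; x += TILE_MAX) {
      if (n < max_tiles) {
        /* Edge tiles are clipped to whatever remains of the area. */
        int w = area.width - x < TILE_MAX ? area.width - x : TILE_MAX;
        int h = area.height - y < TILE_MAX ? area.height - y : TILE_MAX;
        out[n] = (TileRect){ area.x + x, area.y + y, w, h };
      }
      n++;
    }
  }
  return n;
}
```

For a 300x130 area this yields a 3x2 grid, with the right-hand column 44 pixels wide and the bottom row 2 pixels tall.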

A tile has either translucent contents (and therefore is not opaque) or has opaque contents (and therefore is opaque). An opaque tile never contains transparent contents. As such, the opaque tiles contain a black background so that Core Animation will consider the tile's bounds as opaque. This can be verified with "Quartz Debug -> Show opaque regions".

Changes to Cairo

GTK 4 cannot currently use cairo-quartz because of how CSS borders are rendered. It simply causes errors in the cairo_quartz_surface_t backend.

Since we are restricted to using cairo_image_surface_t (which happens to be faster anyway), we can use IOSurfaceGetBaseAddress() to obtain a mapping of the IOSurfaceRef in user-space. The buffer always uses 32-bit BGRA with an alpha channel, even if we will discard the alpha channel, as that is necessary to hit the fast paths in other parts of the platform. Note that while Cairo says CAIRO_FORMAT_ARGB32, it is really 32-bit BGRA in memory on little-endian, as we expect.
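The byte-order remark can be checked without Cairo or IOSurface. CAIRO_FORMAT_ARGB32 is defined as a native-endian 32-bit word with alpha in the top 8 bits and premultiplied colour, so on a little-endian host the bytes land in memory as B, G, R, A. The helpers below (pack_argb32, pixel_bytes, host_is_little_endian) are hypothetical names for a minimal sketch of that layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack a premultiplied colour into one native-endian ARGB word, matching
 * the documented layout of CAIRO_FORMAT_ARGB32. */
uint32_t pack_argb32(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
  /* Premultiplied alpha means no colour channel can exceed alpha. */
  assert(r <= a && g <= a && b <= a);
  return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
         ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Copy the pixel's in-memory bytes, lowest address first, into out[4]. */
void pixel_bytes(uint32_t pixel, uint8_t out[4])
{
  memcpy(out, &pixel, 4);
}

/* Returns 1 when the lowest-addressed byte holds the least significant bits. */
int host_is_little_endian(void)
{
  const uint16_t probe = 1;
  uint8_t first;
  memcpy(&first, &probe, 1);
  return first == 1;
}
```

On a little-endian machine, pixel_bytes() applied to pack_argb32(0xFF, 0x11, 0x22, 0x33) gives the bytes 0x33, 0x22, 0x11, 0xFF, which is blue, green, red, alpha.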

OpenGL will render flipped (Quartz native co-ordinates) while Cairo renders with 0,0 in the top-left. We could use cairo_translate() and cairo_scale() to reverse this, but it looks like some Cairo things may not look quite right if we do so. To reduce the chances of off-by-one bugs, this continues to draw as Cairo would normally, but instead uses a CGAffineTransform in the tiles and some CGRect translation when swapping buffers to get the same effect.
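The rectangle translation between a top-left origin and a bottom-left origin comes down to one subtraction. The sketch below shows that arithmetic only; FlipRect and flip_rect_y() are hypothetical names, and the actual change relies on CGAffineTransform and CGRect rather than this helper.

```c
#include <assert.h>

/* Hypothetical stand-in for a CGRect. */
typedef struct { double x, y, width, height; } FlipRect;

/* Convert a rectangle between a top-left-origin space (as Cairo draws) and a
 * bottom-left-origin space (Quartz native) for a surface that is
 * `surface_height` units tall. The mapping is its own inverse. */
FlipRect flip_rect_y(FlipRect r, double surface_height)
{
  FlipRect out = r;

  assert(surface_height >= 0.0);

  /* The far edge of the rect in one space becomes its origin in the other. */
  out.y = surface_height - (r.y + r.height);
  return out;
}
```

Because the same function maps in both directions, applying it twice returns the original rectangle.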

Changes to OpenGL

To simplify things, all NSOpenGL*-related components have been removed and we strictly use the Core GL (CGL*) API. This probably should have been done long ago anyway.

Most examples of using IOSurfaceRef with OpenGL found in the browsers use legacy GL, and there is still work underway to make this fit in with the rest of how the GSK GL renderer works.

Since an IOSurfaceRef bound to a texture/framebuffer will not have a default framebuffer ID of 0, we needed to add a default framebuffer ID to the GdkGLContext. GskGLRenderer can use this to set up the command queue in such a way that our IOSurface destination has been bound with glBindFramebuffer() as if it were the default drawable.

This stuff is pretty sleight-of-hand, so where things are and what needs flushing when and where has been a bit of an experiment to see what actually works to get synchronization across subsystems.

Efficient Damages

After we draw with Cairo, we unlock the IOSurfaceRef and the contents are uploaded to the GPU. To make the contents visible to the app, we must clear the tile's contents with layer.contents=nil; and then re-apply the IOSurfaceRef. Since the buffer has likely not changed, we only do this if the tile overlaps the damage region.

This gives the effect of having more tightly controlled damage regions even though updating the layer would damage the whole window (as it is with OpenGL/Metal today with the exception of scissor-rect).

This too can be verified using "Quartz Debug -> Flash screen updates".
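The per-tile decision described above is a rectangle intersection test against the damage region. The sketch below marks which tiles would need their contents re-applied. DamageRect, rects_intersect() and mark_damaged_tiles() are hypothetical names, the damage region is assumed to be a plain array of rectangles, and the actual layer.contents update is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CGRect in integer device pixels. */
typedef struct { int x, y, width, height; } DamageRect;

/* True when two non-empty rectangles share at least one pixel. */
bool rects_intersect(DamageRect a, DamageRect b)
{
  if (a.width <= 0 || a.height <= 0 || b.width <= 0 || b.height <= 0)
    return false;
  return a.x < b.x + b.width  && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

/* Set needs_refresh[i] for every tile overlapping any damage rectangle and
 * return how many tiles were marked. Unmarked tiles keep their contents. */
size_t mark_damaged_tiles(const DamageRect *tiles, size_t n_tiles,
                          const DamageRect *damage, size_t n_damage,
                          bool *needs_refresh)
{
  size_t count = 0;

  assert(tiles != NULL && needs_refresh != NULL);
  assert(damage != NULL || n_damage == 0);

  for (size_t i = 0; i < n_tiles; i++) {
    needs_refresh[i] = false;
    for (size_t j = 0; j < n_damage; j++) {
      if (rects_intersect(tiles[i], damage[j])) {
        needs_refresh[i] = true;
        count++;
        break; /* one overlap is enough to refresh this tile */
      }
    }
  }
  return count;
}
```

With a 2x2 grid of 128-pixel tiles and a single damage rectangle straddling the boundary between the top two tiles, only those two tiles are marked.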

Frame Synchronized Resize

In GTK 4, we have the ability to perform sizing changes from compute-size during the layout phase. Since the macOS backend already tracks window resizes manually, we can avoid doing the setFrame: immediately and instead do it within the frame clock's layout phase.

Doing so gives us a vastly better resize experience as we're more likely to get the size change and updated contents in the same frame on screen. It makes things feel "connected" in a way they weren't before.

Some additional effort to tweak gravity during the process is also necessary but we were already doing that in the GTK 4 backend.

Backporting

The design here has made an attempt to make it possible to backport by keeping GdkMacosBuffer, GdkMacosLayer, and GdkMacosTile fairly independent. There may be an opportunity to integrate this into GTK 3's quartz backend with a fair bit of work. Doing so could improve the situation for applications which are damage-rich such as The GIMP.