Try to avoid converting twice in Cogl read-pixels
These patches are the result of trying to make the CPU copy path faster for GPU-less secondary GPUs (DisplayLink devices). !278 (merged) makes the CPU copy path use Cogl read-pixels instead of a simple glReadPixels, and this MR tries to make Cogl read-pixels faster. There is no practical dependency on !278 (merged), though: this MR could benefit other Cogl read-pixels users too, although I have no benchmarks for those.
The main goal here is to get as close as possible to a pure memcpy in the actual glReadPixels call, on the assumption that this will be the fastest. That can never be completely achieved if the read buffer has an X-channel instead of an alpha channel, because there does not seem to be a way to tell glReadPixels that it can leave garbage in the A/X-channel. However, I think this MR is still a worthwhile step towards that goal even if it doesn't have immediate effects.
If one can arrange the read buffer format and the Cogl read-pixels destination buffer format to match, then the read-pixels function as a whole should be a simple non-converting copy. This was thwarted by two things, fixed by the two patches: an old workaround for GL_BGRA reads, and a hardcoded intermediate pixel format that may not match either the read or destination formats. Both things lead to the same result: glReadPixels does one pixel format conversion, and then Cogl read-pixels does another conversion on the CPU.
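To illustrate why matching formats matter, here is a minimal sketch in plain C. It is not the Cogl code: the `sketch_format` enum and `sketch_copy_pixels()` are hypothetical names invented for this example, and only two byte orders are modelled. It shows the two cases described above: when source and destination formats match, each row is a plain memcpy; when they differ, every pixel has to be touched on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical format tags for this sketch; not Cogl or DRM identifiers. */
enum sketch_format {
    SKETCH_RGBA8888, /* bytes in memory: R, G, B, A */
    SKETCH_BGRA8888  /* bytes in memory: B, G, R, A */
};

/*
 * Copy `height` rows of `width` 4-byte pixels from src to dst.
 * Returns the number of pixels that needed a per-pixel conversion;
 * 0 means the whole transfer was done with memcpy.
 */
size_t
sketch_copy_pixels(uint8_t *dst, enum sketch_format dst_format, size_t dst_stride,
                   const uint8_t *src, enum sketch_format src_format, size_t src_stride,
                   size_t width, size_t height)
{
    size_t row_bytes = width * 4;
    size_t converted = 0;

    assert(dst_stride >= row_bytes && src_stride >= row_bytes);

    for (size_t y = 0; y < height; y++) {
        uint8_t *d = dst + y * dst_stride;
        const uint8_t *s = src + y * src_stride;

        if (src_format == dst_format) {
            /* Formats match: non-converting copy of the whole row. */
            memcpy(d, s, row_bytes);
            continue;
        }

        /* Formats differ: swap the R and B bytes of every pixel. */
        for (size_t x = 0; x < width; x++) {
            d[4 * x + 0] = s[4 * x + 2];
            d[4 * x + 1] = s[4 * x + 1];
            d[4 * x + 2] = s[4 * x + 0];
            d[4 * x + 3] = s[4 * x + 3];
            converted++;
        }
    }

    return converted;
}
```

If both glReadPixels and the CPU step each perform such a conversion, the frame is swizzled twice, which is the situation the two patches try to avoid.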
On an Intel Haswell desktop machine, doing a read of a 1080p frame, read buffer is
- destination format DRM_FORMAT_XBGR8888, before and after: 5.8 ms
- destination format DRM_FORMAT_XRGB8888, before: 12 ms, after: 9.0 ms
The destination format is currently hardcoded to DRM_FORMAT_XBGR8888 in Mutter, but I intend to change that by looking at the IN_FORMATS DRM property instead, so this optimization will likely become relevant in the CPU copy path.
If I hack the read buffer format to be DRM_FORMAT_ARGB8888 and the destination format is DRM_FORMAT_XRGB8888, the copy takes just 4.0 ms.
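To put the timings in perspective, they can be turned into rough throughput figures. This is a back-of-the-envelope sketch only: it assumes "1080p" means 1920x1080 and relies on the 8888 formats being 4 bytes per pixel, and the helper names are hypothetical. Under those assumptions one frame is 8294400 bytes, so 12 ms, 9.0 ms, 5.8 ms and 4.0 ms correspond to roughly 0.69, 0.92, 1.43 and 2.07 GB/s.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of one frame of 32-bit (4 bytes per pixel) pixels. */
size_t
sketch_frame_bytes(size_t width, size_t height)
{
    return width * height * 4;
}

/* Throughput in MB/s (1 MB = 10^6 bytes) for copying `bytes` in `milliseconds`. */
double
sketch_throughput_mb_per_s(size_t bytes, double milliseconds)
{
    assert(milliseconds > 0.0);
    return ((double) bytes / 1e6) / (milliseconds / 1e3);
}
```

These numbers only describe the read-pixels call as measured here; they say nothing about what a true memcpy on the same machine would reach.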
However, the hack broke the display of primary GPU outputs by never showing anything but a frozen image of fbcon, and I do not yet know why. I also do not know if a GBM bo could be used in scanout and as a source for the CPU copy path at the same time, so I decided to stop here with the read-pixels optimizations for now and look for other alternatives. I am still interested in the read-pixels path, because it will be the fallback path if other future options fail.