Try to avoid converting twice in Cogl read-pixels
These patches are a result of trying to make the CPU copy path faster for GPU-less secondary GPUs (DisplayLink devices). !278 (merged) makes the CPU copy path use Cogl read-pixels instead of a simple glReadPixels, and this MR tries to make Cogl read-pixels faster. There is no practical dependency on !278 (merged), though; this MR could benefit other Cogl read-pixels users too, although I have no benchmarks of those.
The main goal here is to get as close to a pure memcpy in the actual glReadPixels call as possible, assuming that will be the fastest. That can never be completely achieved if the read buffer has an X-channel instead of an alpha channel, because there does not seem to be a way to tell glReadPixels that it can leave garbage in the A/X-channel. However, I think this MR is still a worthwhile step towards that goal even if it doesn't have immediate effects.
If one can arrange the read buffer format and the Cogl read-pixels destination buffer format to match, then the read-pixels function as a whole should be a simple non-converting copy. This was thwarted by two things, fixed by the two patches: an old workaround for GL_BGRA reads, and a hardcoded intermediate pixel format that may not match either the read or destination formats. Both things lead to the same result: glReadPixels does one pixel format conversion, and then Cogl read-pixels does another conversion on the CPU.
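To illustrate the difference between the two cases, here is a minimal sketch in plain C. It is not Cogl code: the helper names copy_rows and swap_red_blue_rows are hypothetical, and the sketch assumes 32-bit pixels laid out as in the DRM_FORMAT_* definitions (little-endian words). When the formats match, each row is a plain memcpy; when they differ by red/blue order, every pixel has to be touched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: formats match, so each row is a straight copy.
 * Strides are in bytes and may be larger than width * 4. */
static void
copy_rows (uint8_t *dst, size_t dst_stride,
           const uint8_t *src, size_t src_stride,
           size_t width, size_t height)
{
  size_t row_bytes = width * 4;

  for (size_t y = 0; y < height; y++)
    memcpy (dst + y * dst_stride, src + y * src_stride, row_bytes);
}

/* Hypothetical helper: converts between XRGB8888 and XBGR8888 (either
 * direction) by swapping memory bytes 0 and 2 of every pixel. Working on
 * bytes keeps this independent of host endianness. */
static void
swap_red_blue_rows (uint8_t *dst, size_t dst_stride,
                    const uint8_t *src, size_t src_stride,
                    size_t width, size_t height)
{
  for (size_t y = 0; y < height; y++)
    {
      const uint8_t *s = src + y * src_stride;
      uint8_t *d = dst + y * dst_stride;

      for (size_t x = 0; x < width; x++, s += 4, d += 4)
        {
          d[0] = s[2];
          d[1] = s[1];
          d[2] = s[0];
          d[3] = s[3]; /* X/A byte is carried over unchanged */
        }
    }
}
```

If both glReadPixels and the CPU side run something like swap_red_blue_rows, the frame is traversed and rewritten twice; the aim of the two patches is to end up with at most one such pass, and ideally only the copy_rows case.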
On an Intel Haswell desktop machine, reading a 1080p frame from a DRM_FORMAT_XRGB8888 read buffer:
- destination format DRM_FORMAT_XBGR8888, before and after: 5.8 ms
- destination format DRM_FORMAT_XRGB8888, before: 12 ms, after: 9.0 ms
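To put these timings in perspective, they can be turned into rough throughput figures. The sketch below assumes that 1080p means 1920x1080 and that the measured time covers the whole frame at 4 bytes per pixel; the function names are made up for this illustration.

```c
#include <stddef.h>

/* Bytes in one frame, ignoring any stride padding (an assumption). */
static size_t
frame_bytes (size_t width, size_t height, size_t bytes_per_pixel)
{
  return width * height * bytes_per_pixel;
}

/* Throughput in MB/s (1 MB = 10^6 bytes) for copying `bytes` in `ms`
 * milliseconds. */
static double
throughput_mb_per_s (size_t bytes, double ms)
{
  return (double) bytes / (ms / 1000.0) / 1e6;
}
```

Under those assumptions a frame is 8,294,400 bytes, so 12 ms corresponds to roughly 690 MB/s, 9.0 ms to roughly 920 MB/s, and 5.8 ms to roughly 1430 MB/s.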
The destination format is currently hardcoded to DRM_FORMAT_XBGR8888 in Mutter, but I intend to change that by looking at the IN_FORMATS DRM property instead, hence this optimization will likely become relevant in the CPU copy path.
If I hack the read buffer format to be DRM_FORMAT_ARGB8888 and the destination format is DRM_FORMAT_XRGB8888, the copy will take just 4.0 ms.
However, the hack broke the display of primary GPU outputs by never showing anything but a frozen image of fbcon, and I do not yet know why. I also do not know if a GBM bo could be used in scanout and as a source for the CPU copy path at the same time, so I decided to stop here with the read-pixels optimizations for now and look for other alternatives. I am still interested in the read-pixels path, because it will be the fallback path if other future options fail.