Related to #1789 (closed).
I've been trying to get GTK 4.0.x Debian packages to pass tests, and I'm having trouble with reftest failures, some of which are architecture-specific and some of which might even be intermittent. Unfortunately, some of them are reproducible on our official autobuilders (on which non-sysadmins can't debug interactively) but not reproducible during interactive testing, so it's a slow process.
This MR makes failing reftests log how many pixels were different, and by how much. Here's an example of one of the tests that is expected to fail at the moment:
    # random seed: R02Sa436276c12f68d5df662f24427d91e46
    # GLib-GIO-DEBUG: _g_io_module_get_default: Found default implementation local (GLocalVfs) for ‘gio-vfs’
    1..1
    # Start of home tests
    # Start of smcv tests
    # Start of gtk4-i386 tests
    # Start of testsuite tests
    # Start of reftests tests
    # Attention: globally setting default text direction to LTR
    # Attention: globally setting default text direction to LTR
    # Storing test result image at /home/smcv/gtk4-i386/debian/build/deb/testsuite/reftests/output/x11/label-sizing.out.png
    # Storing test result image at /home/smcv/gtk4-i386/debian/build/deb/testsuite/reftests/output/x11/label-sizing.ref.png
    # 1566 (out of 447066) pixels differ from reference by up to 200 levels
    # Storing test result image at /home/smcv/gtk4-i386/debian/build/deb/testsuite/reftests/output/x11/label-sizing.diff.png
    not ok 1 /home/smcv/gtk4-i386/testsuite/reftests/label-sizing.ui
    # End of reftests tests
    # End of testsuite tests
    # End of gtk4-i386 tests
    # End of smcv tests
    # End of home tests
and one that fails on i386 but probably only because of rounding errors:
    # random seed: R02S78d4d46282d84f1df4a9361968c73f8e
    # GLib-GIO-DEBUG: _g_io_module_get_default: Found default implementation local (GLocalVfs) for ‘gio-vfs’
    1..1
    # Start of home tests
    # Start of smcv tests
    # Start of gtk4-i386 tests
    # Start of testsuite tests
    # Start of reftests tests
    # Storing test result image at /home/smcv/gtk4-i386/debian/build/deb/testsuite/reftests/output/x11/label-attribute-preference.out.png
    # Storing test result image at /home/smcv/gtk4-i386/debian/build/deb/testsuite/reftests/output/x11/label-attribute-preference.ref.png
    # 32 (out of 1760) pixels differ from reference by up to 1 levels
    # Storing test result image at /home/smcv/gtk4-i386/debian/build/deb/testsuite/reftests/output/x11/label-attribute-preference.diff.png
    not ok 1 /home/smcv/gtk4-i386/testsuite/reftests/label-attribute-preference.ui
    # End of reftests tests
    # End of testsuite tests
    # End of gtk4-i386 tests
    # End of smcv tests
    # End of home tests
It also adds a mechanism to allow a small amount of "fuzz" in the comparisons, to account for things like the i387 FPU's extended precision giving slightly different answers, by creating a GKeyFile that accompanies the reftest (a .keyfile file, as described in the commit message below).
!3203 (merged) (see that MR for details)
reftest_compare_surfaces: Report how much the images differ
Some of the reftests don't produce identical results on all architectures, but do produce results that are visually indistinguishable. Report how many pixels differ and by how much, so we can get an idea of what's a rounding error and what's a serious problem.
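The kind of summary described above can be sketched in plain C as follows. This is only an illustration, not the code from this MR: the `DiffSummary` type and the `summarize_diff()` function are hypothetical names, and the sketch assumes both images are already available as raw buffers with four 8-bit channels per pixel and the same stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical summary type, not taken from the GTK sources. */
typedef struct {
  unsigned int pixels_changed; /* pixels with at least one differing channel */
  unsigned int max_diff;       /* largest per-channel difference, 0..255 */
} DiffSummary;

/* Compare two buffers holding 4 channels of 8 bits per pixel.
 * stride is the number of bytes per row and may exceed width * 4. */
static DiffSummary
summarize_diff (const uint8_t *a,
                const uint8_t *b,
                int            width,
                int            height,
                size_t         stride)
{
  DiffSummary s = { 0, 0 };

  assert (width >= 0 && height >= 0);
  assert (stride >= (size_t) width * 4);

  for (int y = 0; y < height; y++)
    {
      const uint8_t *row_a = a + (size_t) y * stride;
      const uint8_t *row_b = b + (size_t) y * stride;

      for (int x = 0; x < width; x++)
        {
          int differs = 0;

          for (int c = 0; c < 4; c++)
            {
              unsigned int d = (unsigned int) abs ((int) row_a[x * 4 + c] -
                                                   (int) row_b[x * 4 + c]);

              if (d > 0)
                differs = 1;
              if (d > s.max_diff)
                s.max_diff = d;
            }

          if (differs)
            s.pixels_changed++;
        }
    }

  return s;
}
```

A caller could then log the two fields in the same form as the messages shown earlier, for example "32 (out of 1760) pixels differ from reference by up to 1 levels".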
reftests: Allow minor differences to be tolerated
Based on an earlier patch by Michael Biebl, as used in Debian's GTK 3 packaging, with additional inspiration from librsvg's reftests.
Each .ui or .node reftest can have an accompanying .keyfile file like this:
    [reftest]
    accepted-diff-level=2
    accepted-diff-pixels=100
    tolerated-diff-level=20
    tolerated-diff-pixels=1000
If the number of pixels that differ from the reference is no more than accepted-diff-pixels, and each channel in each of those pixels differs from the reference by no more than accepted-diff-level, then we consider that to be a full success, and don't even save the .diff.png for analysis.
If that check fails, but the number of pixels that differ is no more than tolerated-diff-pixels and the differences are no more than tolerated-diff-level, then we treat it as a success with warnings, save the .diff.png for analysis, and use g_test_incomplete() to record the test-case as "TODO".
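The two-tier decision described above could be sketched roughly like this. It is a minimal illustration under assumptions, not the implementation in this MR: `ReftestTolerance`, `ReftestVerdict` and `classify_diff()` are hypothetical names, the four fields mirror the keys in the `[reftest]` group, and a test without a .keyfile is assumed to behave as if all four values were 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of comparing a reftest against its reference. */
typedef enum {
  REFTEST_PASS,      /* full success, no .diff.png needed */
  REFTEST_TOLERATED, /* success with warnings, keep the .diff.png */
  REFTEST_FAIL       /* outside both limits */
} ReftestVerdict;

/* Hypothetical mirror of the keys in the [reftest] group.
 * All zeros means any difference at all is a failure. */
typedef struct {
  unsigned int accepted_diff_level;
  unsigned int accepted_diff_pixels;
  unsigned int tolerated_diff_level;
  unsigned int tolerated_diff_pixels;
} ReftestTolerance;

/* pixels_changed: number of pixels that differ from the reference.
 * max_diff: largest per-channel difference among those pixels. */
static ReftestVerdict
classify_diff (const ReftestTolerance *t,
               unsigned int            pixels_changed,
               unsigned int            max_diff)
{
  assert (t != NULL);

  /* First tier: few enough pixels, each differing only slightly. */
  if (pixels_changed <= t->accepted_diff_pixels &&
      max_diff <= t->accepted_diff_level)
    return REFTEST_PASS;

  /* Second tier: larger differences that are still tolerated. */
  if (pixels_changed <= t->tolerated_diff_pixels &&
      max_diff <= t->tolerated_diff_level)
    return REFTEST_TOLERATED;

  return REFTEST_FAIL;
}
```

With the example values above, the label-attribute-preference result (32 pixels, up to 1 level) would count as a full success, while the label-sizing result (1566 pixels, up to 200 levels) would still fail. In the tolerated case the real test would go on to save the .diff.png and call g_test_incomplete().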