Show how much reftests differ from reference, and allow ignoring small differences

Simon McVittie requested to merge wip/smcv/reftest-tolerance into master

Related to #1789 (closed).

I've been trying to get the GTK 4.0.x Debian packages to pass their tests, and I'm having trouble with reftest failures, some of which are architecture-specific and some of which might even be intermittent. Unfortunately, some of them are reproducible on our official autobuilders (on which non-sysadmins can't debug interactively) but not reproducible during interactive testing, so it's a slow process.

This MR makes failing reftests log how many pixels were different, and by how much. Here's an example from one of the tests that is expected to fail at the moment:

# random seed: R02Sa436276c12f68d5df662f24427d91e46
# GLib-GIO-DEBUG: _g_io_module_get_default: Found default implementation local (GLocalVfs) for ‘gio-vfs’
1..1
# Start of home tests
# Start of smcv tests
# Start of gtk4-i386 tests
# Start of testsuite tests
# Start of reftests tests
# Attention: globally setting default text direction to LTR
# Attention: globally setting default text direction to LTR
# Storing test result image at /home/smcv/gtk4-i386/debian/build/deb/testsuite/reftests/output/x11/label-sizing.out.png
# Storing test result image at /home/smcv/gtk4-i386/debian/build/deb/testsuite/reftests/output/x11/label-sizing.ref.png
# 1566 (out of 447066) pixels differ from reference by up to 200 levels
# Storing test result image at /home/smcv/gtk4-i386/debian/build/deb/testsuite/reftests/output/x11/label-sizing.diff.png
not ok 1 /home/smcv/gtk4-i386/testsuite/reftests/label-sizing.ui
# End of reftests tests
# End of testsuite tests
# End of gtk4-i386 tests
# End of smcv tests
# End of home tests

and one that fails on i386 but probably only because of rounding errors:

# random seed: R02S78d4d46282d84f1df4a9361968c73f8e
# GLib-GIO-DEBUG: _g_io_module_get_default: Found default implementation local (GLocalVfs) for ‘gio-vfs’
1..1
# Start of home tests
# Start of smcv tests
# Start of gtk4-i386 tests
# Start of testsuite tests
# Start of reftests tests
# Storing test result image at /home/smcv/gtk4-i386/debian/build/deb/testsuite/reftests/output/x11/label-attribute-preference.out.png
# Storing test result image at /home/smcv/gtk4-i386/debian/build/deb/testsuite/reftests/output/x11/label-attribute-preference.ref.png
# 32 (out of 1760) pixels differ from reference by up to 1 levels
# Storing test result image at /home/smcv/gtk4-i386/debian/build/deb/testsuite/reftests/output/x11/label-attribute-preference.diff.png
not ok 1 /home/smcv/gtk4-i386/testsuite/reftests/label-attribute-preference.ui
# End of reftests tests
# End of testsuite tests
# End of gtk4-i386 tests
# End of smcv tests
# End of home tests

It also adds a mechanism to allow a small amount of "fuzz" in the comparisons, to account for things like the i387 FPU's extended precision giving slightly different answers. To opt a test in, create a GKeyFile next to it with a name like testsuite/reftests/label-attribute-preference.keyfile or testsuite/gsk/compare/repeat-texture.keyfile.


  • !3203 (merged) (see that MR for details)

  • reftest_compare_surfaces: Report how much the images differ

    Some of the reftests don't produce identical results on all architectures, but do produce results that are visually indistinguishable. Report how many pixels differ and by how much, so we can get an idea of what's a rounding error and what's a serious problem.

  • reftests: Allow minor differences to be tolerated

    Based on an earlier patch by Michael Biebl, as used in Debian's GTK 3 packaging, with additional inspiration from librsvg's reftests.

    Each .ui or .node reftest can have an accompanying .keyfile file like this:

      [reftest]
      accepted-diff-level=2
      accepted-diff-pixels=100
      tolerated-diff-level=20
      tolerated-diff-pixels=1000

    If the number of pixels that differ from the reference is no more than accepted-diff-pixels, and each channel in each of those pixels differs from the reference by no more than accepted-diff-level, then we consider that to be a full success, and don't even save the .diff.png for analysis.

    If that check fails, but the number of pixels that differ is no more than tolerated-diff-pixels and the differences are no more than tolerated-diff-level, then we treat it as a success with warnings, save the .diff.png for analysis, and use g_test_incomplete() to record the test-case as "TODO".
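
To make the two-tier check above concrete, here is a minimal C sketch of the logic as described, not the code in this MR. It assumes both images are equally sized buffers with four 8-bit channels per pixel. The names Tolerance, Result, count_differences and classify are hypothetical and exist only for illustration; only the four keyfile keys come from the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for the four keys in the [reftest] group.
 * All-zero values mean "images must be identical". */
typedef struct {
  unsigned int accepted_diff_level;
  unsigned int accepted_diff_pixels;
  unsigned int tolerated_diff_level;
  unsigned int tolerated_diff_pixels;
} Tolerance;

/* Hypothetical outcome: full success, success with warnings, or failure. */
typedef enum { RESULT_PASS, RESULT_TOLERATED, RESULT_FAIL } Result;

/* Count the pixels that differ between two equally sized 4-channel
 * buffers, and record the largest difference seen in any channel. */
static void
count_differences (const uint8_t *out,
                   const uint8_t *ref,
                   size_t         n_pixels,
                   unsigned int  *pixels_changed,
                   unsigned int  *max_diff)
{
  *pixels_changed = 0;
  *max_diff = 0;

  for (size_t i = 0; i < n_pixels; i++)
    {
      int differs = 0;

      for (size_t c = 0; c < 4; c++)
        {
          unsigned int d = (unsigned int) abs ((int) out[4 * i + c]
                                               - (int) ref[4 * i + c]);

          if (d > 0)
            differs = 1;
          if (d > *max_diff)
            *max_diff = d;
        }

      if (differs)
        (*pixels_changed)++;
    }
}

/* Apply the accepted thresholds first, then the tolerated ones. */
static Result
classify (unsigned int     pixels_changed,
          unsigned int     max_diff,
          const Tolerance *t)
{
  if (pixels_changed <= t->accepted_diff_pixels
      && max_diff <= t->accepted_diff_level)
    return RESULT_PASS;

  if (pixels_changed <= t->tolerated_diff_pixels
      && max_diff <= t->tolerated_diff_level)
    return RESULT_TOLERATED;

  return RESULT_FAIL;
}
```

With the example keyfile above, the label-attribute-preference result (32 pixels, up to 1 level) would fall under the accepted thresholds, while the label-sizing result (1566 pixels, up to 200 levels) would still fail because it exceeds both tolerated limits.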

Edited by Simon McVittie
