[GTK 3] testsuite: Avoid using should_fail (!5249)

Simon McVittie requested to merge wip/smcv/xfail-gtk3 into gtk-3-24 Nov 23, 2022

testsuite: Avoid using should_fail

There are two possible interpretations of "expected failure": either the test must fail (exactly the inverse of an ordinary test, with success becoming failure and failure becoming success), or the test may fail (with success intended, but failure possible in some environments). Autotools had the second interpretation, which seems more useful in practice, but Meson has the first.

In GTK 3.24.35, if the environment is such that the label-sizing.ui reftest happens to be successful, the overall result of the test suite is failure. This seems unlikely to have been the intention.
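The difference between the two interpretations can be made concrete with a small Python model. This is only an illustrative sketch: the `Policy` enum and the `counts_as_success` function are hypothetical names invented here, not part of Meson, Autotools or GTK.

```python
from enum import Enum


class Policy(Enum):
    """How a test marked as an "expected failure" is judged (hypothetical model)."""
    MUST_FAIL = "must fail"  # first interpretation: the result is inverted
    MAY_FAIL = "may fail"    # second interpretation: failure is tolerated


def counts_as_success(test_passed: bool, policy: Policy) -> bool:
    """Return True if the test's outcome should not fail the overall suite."""
    if policy is Policy.MUST_FAIL:
        # Success becomes failure and failure becomes success.
        return not test_passed
    # MAY_FAIL: success is intended, but a failure is not held against the suite.
    return True


# The situation described above: a test marked as an expected failure
# happens to pass in this environment.
unexpected_pass = True
print(counts_as_success(unexpected_pass, Policy.MUST_FAIL))  # False
print(counts_as_success(unexpected_pass, Policy.MAY_FAIL))   # True
```

Under the "must fail" reading, the unexpected pass is what turns the overall result into a failure.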

Instead of using should_fail, put the tests in one of two new suites: "flaky" is intended for tests that succeed or fail unpredictably according to the test environment or chance, while "failing" is for tests that ought to succeed but currently never do as a result of a bug or missing functionality. With a sufficiently new version of Meson, the flaky and failing tests are not run by default, but can be requested with a command like:
```
  meson test --setup=unstable_tests --suite=flaky --suite=failing
```
This arrangement is inspired by glib!2987 (merged), which was contributed by Marco Trevisan.

testsuite: Try enabling a11y tests, other than those known to be unstable

At least some of the tests implemented via the accessibility-dump executable are known to be unstable, but the tests based on separate executables (tree-performance.c, etc.) have been passing reasonably consistently on ci.debian.net for several years. Hopefully they are also reliable enough for upstream CI, so that we don't need to mark them as flaky.

testsuite: Don't create .test files for flaky or failing tests

These tests can be run manually, but are not suitable for use as an acceptance test, so let's not make frameworks like Debian's autopkgtest run these when they run ginsttest-runner in the most obvious way.

a11ytests.test doesn't seem to be reliable enough to be used as a QA acceptance criterion, and has been disabled as a build-time test in both GitLab CI and Debian since 2019. a11ystate.test is not set up to be run at build time at all, and has been marked as flaky on ci.debian.net since 2018.

The rest of the testsuite/a11y directory seems to have been reliable in practice, at least on ci.debian.net, so try leaving them enabled as installed-tests.

In principle this could be made finer-grained by having a separate .test file and a separate Meson test() for each .ui file, but that would require more active maintenance of GTK 3.
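As a rough illustration of that finer-grained layout, the following Python sketch produces one installed-test .test file per .ui file and skips those listed as unstable. It is not how GTK's build generates these files. The function name, the .ui file names, the install path and the `Exec` command line are all hypothetical; only the `[Test]` group with `Type` and `Exec` keys is meant to reflect the installed-tests key-file convention that ginsttest-runner reads.

```python
from pathlib import Path


def render_test_files(ui_files, unstable, exec_template):
    """Return {filename: contents} for every .ui file not listed as unstable.

    exec_template is a hypothetical command line containing "{ui}".
    """
    result = {}
    for ui in ui_files:
        if ui in unstable:
            # Flaky or failing: no .test file is produced for it, so a
            # runner that scans for .test files will not pick it up.
            continue
        name = Path(ui).stem + ".test"
        result[name] = (
            "[Test]\n"
            "Type=session\n"
            f"Exec={exec_template.format(ui=ui)}\n"
        )
    return result


# Hypothetical inputs, for illustration only.
files = render_test_files(
    ui_files=["about.ui", "buttons.ui"],
    unstable={"buttons.ui"},
    exec_template="/usr/libexec/installed-tests/example/a11y-dump-test {ui}",
)
```

The cost noted above shows up in the `unstable` set: someone has to keep it up to date as individual .ui cases start or stop passing.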

GTK 3 version of !5248 (merged). I hope this will resolve #5357, but I haven't verified that yet.

Edited Nov 24, 2022 by Simon McVittie
