Skip to content

Avoid quadratic checking of identity-constraints

susematz requested to merge susematz/libxml2:faster-idc into master

key/unique/keyref schema attributes currently use qudratic loops to check their various constraints (that keys are unique and that keyrefs refer to existing keys). That becomes extremely slow if there are many elements with keys. This happens in the wild with e.g. the OVAL XML descriptions of security patches. You need the openscap schemata, and then an example xml file:

% zypper in openscap-utils % wget ftp://ftp.suse.com/pub/projects/security/oval/opensuse.leap.15.1.xml % time xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates

real 16m59,857s user 16m55,787s sys 0m1,060s

This patch makes libxml use a hash table to avoid the quadratic behaviour. The existing hash table only accepts strings as keys, so we're mostly reusing the canonical representation of key values to derive such strings (with the caveat given in a comment). The alternative would be to rework the hash table code to accept either numbers or free functions as hash workers, but the code is fast enough as is.

With the patch we have this then:

% time LD_LIBRARY_PATH=./libxml2/.libs/ ./libxml2/.libs/xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates

real 0m3,531s user 0m3,427s sys 0m0,103s

So, a ~300x speedup. This patch survives 'make check' and 'make tests'.

Merge request reports