libxml2 regexp error when parsing an XML schema that uses subtracted character classes
libxml2-2.9.12 fails to compile an XML schema whose pattern facet uses character class subtraction (the [...-[...]] form) when the subtraction directly follows a single character rather than a range.
Sample XML schema: example.xsd
Sample XML document: example.xml
xmllint reports several regexp errors for the schema:
% xmllint -noout -schema example.xsd example.xml
regexp error : failed to compile: Expecting the end of a char range
regexp error : failed to compile: xmlFAParseCharClass: ']' expected
regexp error : failed to compile: xmlFAParseRegExp: extra characters
example.xsd:23: element pattern: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}pattern': The value '[\p{IsBasicLatin}\p{IsLatin-1Supplement}\p{IsLatinExtended-A}€ȘșȚț-[\p{C}]]+' of the facet 'pattern' is not a valid regular expression.
WXS schema example.xsd failed to compile
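A workaround is to put a range directly before the subtraction. Adding A-Z does not change the matched set, since those characters are already covered by \p{IsBasicLatin}. The pattern then becomes [\p{IsBasicLatin}\p{IsLatin-1Supplement}\p{IsLatinExtended-A}€ȘșȚțA-Z-[\p{C}]]+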
Suggested patch
--- libxml2-2.9.12-orig/xmlregexp.c 2021-05-13 14:53:51.000000000 +0200
+++ libxml2-2.9.12/xmlregexp.c 2022-04-21 18:42:34.000000000 +0200
@@ -5100,7 +5100,7 @@
     }
     NEXTL(len);
     cur = CUR;
-    if ((cur != '-') || (NXT(1) == ']')) {
+    if ((cur != '-') || (NXT(1) == '[')) {
         xmlRegAtomAddRange(ctxt, ctxt->atom, ctxt->neg,
                            XML_REGEXP_CHARVAL, start, end, NULL);
         return;
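
The condition changed by the patch decides what a '-' means when it follows a character inside a class. Below is a minimal, self-contained sketch of that lookahead, independent of libxml2. The names hyphen_kind and classify_hyphen are hypothetical and do not exist in xmlregexp.c. The sketch also keeps the original ']' check alongside the new '[' check, on the assumption that a trailing literal hyphen such as [a-] should still be accepted. The patch above replaces that check rather than extending it, so this case may be worth testing.

```c
/* Illustrative only: hypothetical names, not libxml2 API. */
typedef enum {
    HYPHEN_NONE,        /* no hyphen follows: a single character      */
    HYPHEN_RANGE,       /* "a-z": hyphen introduces the range end     */
    HYPHEN_SUBTRACTION, /* "a-[...]": hyphen starts a subtraction     */
    HYPHEN_LITERAL      /* "a-]": hyphen is a literal closing the set */
} hyphen_kind;

/*
 * Classify what follows a class character that has just been parsed.
 * `rest` points at the input immediately after that character and
 * must be NUL-terminated.
 */
hyphen_kind classify_hyphen(const char *rest)
{
    if (rest[0] != '-')
        return HYPHEN_NONE;
    /* rest[0] is '-', so rest[1] is at worst the terminating NUL. */
    if (rest[1] == '[')
        return HYPHEN_SUBTRACTION;
    if (rest[1] == ']')
        return HYPHEN_LITERAL;
    return HYPHEN_RANGE;
}
```

For the failing pattern, the input after ț is -[\p{C}]]. The sketch classifies this as a subtraction, so the preceding character is added on its own instead of '[' being read as the end of a range.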