Regex interoperability with other XML implementations?
The Wine project implements MSXML on top of libxml2 and needs a very high degree of compatibility with Microsoft's implementation. Some features, such as the extended regex grammar used by MSXML, cannot be implemented on Wine's side; they have to be supported within libxml2 itself.
There is also the broader problem of incompatibilities between XML implementations. For example, what XSD pattern do you write to match a literal "$" in the input?
Implementation | "\$" | "$" |
---|---|---|
libxml2 | Error | Validates |
Apache Xerces-c | Error | Validates |
.NET Core 3 | Validates | Error |
Mono | Validates | Error |
Java | Validates | Validates |
MSXML 6 | Validates | Validates |
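Neither spelling is accepted everywhere: "\$" fails on libxml2 and Xerces-C, while "$" fails on .NET Core 3 and Mono. So there is no way to write an XSD schema that will match a "$" on every implementation.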
It would therefore help if libxml2 supported extensions and compatibility options that callers could turn on as needed, so that it can process XML written for other implementations. Are you willing to accept patches along those lines? How should such an API be designed?
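To make the request more concrete, here is a rough sketch of what a lenient-escape mode would have to do with the three escape forms from the reports listed under "Further reading" ("\$", "\/" and "\uNNNN"). It is written in Python only for brevity; the real change would live in libxml2's regex parser. The function name `to_strict_xsd` and the rewrite-before-compile approach are assumptions for illustration, not an existing libxml2 API. It also assumes, per the table above, that libxml2 accepts an unescaped "$" as a literal, and it ignores the special rules inside character classes.

```python
import re

# Characters that XSD 1.0 permits after a backslash as single-character
# escapes; a decoded \uNNNN that lands on one of these must stay escaped.
XSD_METACHARS = set("\\|.?*+(){}-[]^")

# A backslash followed by either "uNNNN" or any single character.
# Matching "\\" as one unit keeps an escaped backslash from being misread.
_ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4}|.)", re.DOTALL)


def to_strict_xsd(pattern: str) -> str:
    """Rewrite "\\$", "\\/" and "\\uNNNN" into forms strict XSD accepts.

    Hypothetical helper: illustrates the intended semantics only.
    """

    def rewrite(match):
        body = match.group(1)
        if len(body) == 5 and body[0] == "u":
            # \uNNNN: substitute the code point, re-escaping metacharacters.
            char = chr(int(body[1:], 16))
            return "\\" + char if char in XSD_METACHARS else char
        if body in "$/":
            # Non-standard escapes of characters that need no escaping.
            return body
        # Every other escape (\d, \., \\, ...) is passed through unchanged.
        return match.group(0)

    return _ESCAPE.sub(rewrite, pattern)


if __name__ == "__main__":
    for sample in (r"\$\d+", r"a\/b", r"\u0041\u002E"):
        print(sample, "->", to_strict_xsd(sample))
```

An opt-in flag at regex compile time would let callers such as Wine request this behaviour without changing the default, strict parsing for everyone else.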
Further reading:
- XML file uses a non-standard "\/" escape sequence: #152 (closed)
- ecmangen tool needs "\$" escape sequences: https://bugs.winehq.org/show_bug.cgi?id=29685
- Microsoft Office 2013 installer needs "\uNNNN" sequences: https://bugs.winehq.org/show_bug.cgi?id=43581