Closed
Opened Oct 12, 2020 by lafiona (@lafiona)

XPath evaluator (xmlXPathEvalExpression) errors when passed expressions containing Unicode characters supported by XML 1.0 Fifth Edition

  1. The XML parser (xmlParseFile) accepts every name character permitted by the XML 1.0 Fifth Edition Specification, while the XPath evaluator (xmlXPathEvalExpression) only accepts the characters permitted by the XML 1.0 Fourth Edition Specification. XML parser support for Fifth Edition characters was added in this commit.

  2. The XPath 1.0 Specification defines the characters allowed in names through NCName, from the Namespaces in XML 1.0 Third Edition Specification, which in turn is defined in terms of Name, from the XML 1.0 Fifth Edition Specification. The XML 1.0 Fifth Edition Name production states:

Almost all characters are permitted in names, except those which either are or reasonably could be used as delimiters. The intention is to be inclusive rather than exclusive, so that writing systems not yet encoded in Unicode can be used in XML names.

The specification cites the Unicode 5.0 specification. This seems to imply that a compliant XPath evaluator should support characters at least up to those contained in Unicode 5.0.
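To make the character range concrete, here is a minimal sketch of a Fifth Edition name-start check. The ranges are the NameStartChar production from the XML 1.0 Fifth Edition Specification with ':' left out, since NCName excludes the colon. The function name is_ncname_start_char and the table layout are made up for illustration; this is not how libxml2 implements the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of Unicode code points. */
struct cp_range { uint32_t lo, hi; };

/* NameStartChar from XML 1.0 Fifth Edition, without ':' (not allowed in NCName). */
static const struct cp_range ncname_start_ranges[] = {
    {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
    {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF},
    {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
    {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
    {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF},
};

/* Hypothetical helper: returns true if cp may start an NCName (Fifth Edition rules). */
bool is_ncname_start_char(uint32_t cp) {
    size_t n = sizeof ncname_start_ranges / sizeof ncname_start_ranges[0];
    for (size_t i = 0; i < n; i++) {
        if (cp >= ncname_start_ranges[i].lo && cp <= ncname_start_ranges[i].hi)
            return true;
    }
    return false;
}
```

Under these ranges, is_ncname_start_char(0x13B2) is true because U+13B2 falls inside 0x37F to 0x1FFF, so an evaluator following the Fifth Edition would be expected to accept it at the start of a name test.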

It would be helpful to have consistent Unicode support between the XML parser and the XPath evaluator.

Reproduction

The character Ꮂ (U+13B2 CHEROKEE LETTER HV) was introduced in Unicode 3.0. The XML parser successfully reads a file that contains the Ꮂ character, but the XPath evaluator reports an error for an expression that contains it.
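For reference, in a UTF-8 encoded C source file the character Ꮂ is stored as the three bytes E1 8E B2. The sketch below decodes one UTF-8 sequence to its code point, to show what value a name check has to classify. The function utf8_decode is hypothetical and simplified (it does not reject overlong forms or surrogates); it is not a libxml2 API.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence starting at s (at most len bytes available).
 * On success, stores the code point in *cp and returns the bytes consumed.
 * Returns 0 if the sequence is truncated or malformed.
 */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp) {
    size_t need;
    uint32_t value;

    if (len == 0)
        return 0;

    /* The lead byte determines the sequence length and the first payload bits. */
    if (s[0] < 0x80)                { need = 1; value = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; value = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; value = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; value = s[0] & 0x07; }
    else return 0;

    if (len < need)
        return 0;

    /* Each continuation byte has the form 10xxxxxx and adds six bits. */
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        value = (value << 6) | (s[i] & 0x3F);
    }

    *cp = value;
    return need;
}
```

Decoding the bytes E1 8E B2 with this function consumes three bytes and yields 0x13B2.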

The following program demonstrates that xmlXPathEvalExpression fails when passed an XPath expression containing the character.

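#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/xpath.h>

int main(void) {
    /* XPath expression containing U+13B2 (source file must be UTF-8). */
    const xmlChar *xpathExpression = BAD_CAST "/Ꮂ";
    const char *filename = "unicode.xml";

    xmlInitParser();

    /* Parsing succeeds: the parser accepts Fifth Edition name characters. */
    xmlDocPtr doc = xmlParseFile(filename);
    if (doc == NULL) {
        fprintf(stderr, "failed to parse %s\n", filename);
        return 1;
    }

    xmlXPathContextPtr xpathContext = xmlXPathNewContext(doc);

    /* This call reports "XPath error : Invalid expression" and returns NULL. */
    xmlXPathObjectPtr xpathObj = xmlXPathEvalExpression(xpathExpression, xpathContext);

    /* Release resources; xmlXPathFreeObject accepts NULL. */
    xmlXPathFreeObject(xpathObj);
    xmlXPathFreeContext(xpathContext);
    xmlFreeDoc(doc);
    xmlCleanupParser();

    return 0;
}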

This results in the following error:

XPath error : Invalid expression
/Ꮂ[1]
 ^

Operating system: Debian 10 Buster
libxml2 version: 2.9.8

Reference: GNOME/libxml2#194