XML parser (xmllint) fails to detect the invalid XML character 0x0
The following demonstrates the problem when parsing XML that contains the character 0x0, which is invalid in XML:
- Trailing 0x0 (after last closed element):
$ hexdump -C test_0_trailing.xml
00000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 |<?xml version="1|
00000010 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 75 74 |.0" encoding="ut|
00000020 66 2d 38 22 3f 3e 0a 3c 72 6f 6f 74 3e 3c 2f 72 |f-8"?>.<root></r|
00000030 6f 6f 74 3e 00 |oot>.|
00000035
$ xmllint --noout test_0_trailing.xml
$ echo $?
0
- 0x0 is inside the element:
$ hexdump -C test_0_inside.xml
00000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 |<?xml version="1|
00000010 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 75 74 |.0" encoding="ut|
00000020 66 2d 38 22 3f 3e 0a 3c 72 6f 6f 74 3e 00 3c 2f |f-8"?>.<root>.</|
00000030 72 6f 6f 74 3e |root>|
00000035
$ xmllint --noout test_0_inside.xml
test_0_inside.xml:2: parser error : Premature end of data in tag root line 2
$ echo $?
1
The parser fails, but the error message is inadequate. It seems to treat the 0x0 character as the end of the string buffer rather than as an invalid character, so it never reaches the closing element </root>.
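To reproduce the two cases above, the test files can be regenerated as in the following sketch. It assumes a POSIX shell with printf and od; the xmllint step is skipped if xmllint is not on the PATH. The file names match the ones used in this report.

```shell
# Write the two 53-byte test files byte for byte.
# \000 in a printf format string emits a single NUL (0x0) byte.
printf '<?xml version="1.0" encoding="utf-8"?>\n<root></root>\000' > test_0_trailing.xml
printf '<?xml version="1.0" encoding="utf-8"?>\n<root>\000</root>' > test_0_inside.xml

# Print the byte at the offset where each hexdump shows the NUL
# (0x34 = 52 for the trailing case, 0x2d = 45 for the inside case).
od -An -tx1 -j52 -N1 test_0_trailing.xml
od -An -tx1 -j45 -N1 test_0_inside.xml

# Run the parser on both files and show its exit status, if it is installed.
if command -v xmllint >/dev/null 2>&1; then
    for f in test_0_trailing.xml test_0_inside.xml; do
        xmllint --noout "$f"
        echo "$f: exit status $?"
    done
fi
```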
For comparison, here are the same inputs run through the Saxon query parser, which detects the error better (though for test_0_trailing.xml it should complain about an invalid character rather than about trailing content):
- 0x0 is inside the element (test_0_inside.xml):
$ export CLASSPATH=./saxon9ee/saxon9ee.jar:./resolver-1.2/resolver.jar; /usr/bin/java net.sf.saxon.Query -qs:. -s:test_0_inside.xml
Error on line 2 column 7 of test_0_inside.xml:
SXXP0003: Error reported by XML parser: An invalid XML character (Unicode: 0x0) was found
in the element content of the document.
Query processing failed: org.xml.sax.SAXParseException; systemId: file:/pmc/tmp/martin/test_0_inside.xml; lineNumber: 2; columnNumber: 7; An invalid XML character (Unicode: 0x0) was found in the element content of the document.
$ echo $?
2
- Trailing 0x0, after the last closed element (test_0_trailing.xml):
$ export CLASSPATH=/pmc/JAVA/saxon9ee/saxon9ee.jar:/pmc/JAVA/xml-commons-resolver-1.2/resolver.jar; /usr/bin/java net.sf.saxon.Query -qs:. -s:test_0_trailing.xml
Error on line 2 column 14 of test_0_trailing.xml:
SXXP0003: Error reported by XML parser: Content is not allowed in trailing section.
Query processing failed: org.xml.sax.SAXParseException; systemId: file:/pmc/tmp/martin/test_0_trailing.xml; lineNumber: 2; columnNumber: 14; Content is not allowed in trailing section.
$ echo $?
2
- For comparison, xmllint correctly reports the invalid XML character 0x1 inside the element:
$ hexdump -C test_0_inside.xml
00000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 |<?xml version="1|
00000010 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 75 74 |.0" encoding="ut|
00000020 66 2d 38 22 3f 3e 0a 3c 72 6f 6f 74 3e 01 3c 2f |f-8"?>.<root>.</|
00000030 72 6f 6f 74 3e |root>|
00000035
$ xmllint --noout test_0_inside.xml
test_0_inside.xml:2: parser error : PCDATA invalid Char value 1
<root></root>
$ echo $?
1
- The trailing variant with 0x1 is also rejected:
$ hexdump -C test_0_trailing.xml
00000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 |<?xml version="1|
00000010 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 75 74 |.0" encoding="ut|
00000020 66 2d 38 22 3f 3e 0a 3c 72 6f 6f 74 3e 3c 2f 72 |f-8"?>.<root></r|
00000030 6f 6f 74 3e 01 |oot>.|
00000035
$ xmllint --noout test_0_trailing.xml
test_0_trailing.xml:2: parser error : Extra content at the end of the document
<root></root>
^
$ echo $?
1
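Independently of the parser, a 0x0 byte can be detected before parsing by comparing the file size with and without NUL bytes. The following is only a sketch of such a pre-check, not a fix for xmllint; the function name has_nul and the sample file names are hypothetical.

```shell
# has_nul FILE: succeed (exit 0) if FILE contains at least one 0x0 byte.
# (has_nul is a hypothetical helper, not part of xmllint or libxml2.)
has_nul() {
    total=$(wc -c < "$1")
    # tr -d '\000' drops every NUL byte; a smaller count means NULs were present.
    stripped=$(tr -d '\000' < "$1" | wc -c)
    [ $((total - stripped)) -gt 0 ]
}

# Two small sample files: one with a NUL inside the element, one without.
printf '<root>\000</root>' > nul_sample.xml
printf '<root></root>' > clean_sample.xml

for f in nul_sample.xml clean_sample.xml; do
    if has_nul "$f"; then
        echo "$f: contains 0x0"
    else
        echo "$f: no 0x0 found"
    fi
done
```

Such a check only covers 0x0; other characters that are invalid in XML would still have to be caught by the parser.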
Edited by kolotev