In some cases the HTML parser produces two root element nodes
Hi,
I'm currently reading the specs as a hobby. Sometimes libxml2 creates a DOM with two html root element nodes (when it fixes up a malformed document). This looks like a bug: if I read the specs correctly, a DOM document has only one root element node,[1][2][3][4] and for HTML that root is the html element node.
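For comparison, here is a small sketch of how a DOM implementation that enforces the quoted rule reacts when code tries to add a second root element. It uses Python's standard-library xml.dom.minidom, which is not libxml2 and is chosen only because it is easy to run; the helper names count_root_elements and try_second_root are hypothetical. As far as I can tell, minidom refuses the second element with HierarchyRequestErr, so the document keeps exactly one root.

```python
# Sketch only: Python's standard-library xml.dom.minidom (NOT libxml2), used to
# illustrate the "at most one root element per document" rule from the DOM specs.
# The helper names below are hypothetical.
import xml.dom
from xml.dom.minidom import Document, getDOMImplementation


def count_root_elements(doc: Document) -> int:
    """Number of element children of the document node (the DOM allows at most one)."""
    return sum(1 for node in doc.childNodes if node.nodeType == xml.dom.Node.ELEMENT_NODE)


def try_second_root() -> tuple[bool, int]:
    """Try to attach a second <html> directly under the document node.

    Returns (rejected, number_of_root_elements_afterwards).
    """
    impl = getDOMImplementation()
    # createDocument(namespaceURI, qualifiedName, doctype) builds a document
    # that already has one root element, <html>.
    doc = impl.createDocument(None, "html", None)
    second = doc.createElement("html")
    try:
        # This would make <html> a second child element of the document node.
        doc.appendChild(second)
        rejected = False
    except xml.dom.HierarchyRequestErr:
        rejected = True
    return rejected, count_root_elements(doc)


if __name__ == "__main__":
    print(try_second_root())  # prints (True, 1)
```

Running it prints (True, 1): the second append is refused and only one root element remains, which is the invariant the libxml2 tree in the examples below does not keep.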
Here is an example. I caught it in the wild and reduced it to a minimal case (example_src_01.html):
```html
<!DOCTYPE HTML>
<html></html>
<link href="/example/uri" rel="stylesheet" type="text/css" />
```
Output of xmllint --html --debug example_src_01.html:

```
HTML DOCUMENT
URL=example_src_01.html
standalone=true
DTD(HTML)
ELEMENT html
ELEMENT html
ELEMENT head
ELEMENT link
ATTRIBUTE href
TEXT
content=/example/uri
ATTRIBUTE rel
TEXT
content=stylesheet
ATTRIBUTE type
TEXT
content=text/css
TEXT
content=
```
The output shows the two html elements at the same indentation level, i.e. both are root elements (direct children of the document node).
This is also reflected in XPath results. The following command returns an element (the link inside the second html element):

```shell
xmllint --html example_src_01.html --xpath '/html[2]//link'
```
Here is another example:

```html
<!DOCTYPE HTML>
<html></html>
example text node
```
Thank you!
EDIT: Removed a question.
EDIT: I also tested with the newer version from Homebrew; the bug remains.
/usr/local/opt/libxml2/bin/xmllint: using libxml version 21206
compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 HTTP DTDValid HTML C14N Catalog XPath XPointer XInclude Iconv ICU ISO8859X Unicode Regexps Automata Schemas Schematron Modules Debug Zlib

Footnotes
[1] W3C DOM Level 2 Core, https://www.w3.org/TR/DOM-Level-2-Core/#core-ID-1590626202: "Each document contains zero or one doctype nodes, one root element node, and zero or more comments or processing instructions;"
[2] WHATWG DOM Standard, https://dom.spec.whatwg.org/#node-trees: "Optionally one Element node."