Nesting behaviour of HTML_PARSE_NOIMPLIED
The documentation says: HTML_PARSE_NOIMPLIED = 1<<13, /* Do not add implied html/body... elements */
So I expect it to only remove the html, body, ... elements while keeping the rest of the structure.
However, when there are multiple root-level elements in the document, they get nested.
Consider the following C code:
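#include <stdio.h>
#include <string.h>
#include <libxml/HTMLtree.h>
#include <libxml/HTMLparser.h>

int main(void) {
    const char *html_content = "<p>foo</p><div>bar</div>";
    /* NOIMPLIED: no implied html/body elements; NODEFDTD: no default doctype */
    xmlDocPtr doc = htmlReadMemory(html_content, (int)strlen(html_content), NULL, NULL,
                                   HTML_PARSE_NOIMPLIED | HTML_PARSE_NODEFDTD);
    if (doc == NULL)
        return 1; /* parsing failed */
    htmlDocDump(stdout, doc); /* serialize the parsed tree to stdout */
    xmlFreeDoc(doc);
    return 0;
}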
This results in <p>foo<div>bar</div></p> but I expected <p>foo</p><div>bar</div>. It's even weirder in this particular example because it's not legal to put a <div> inside a <p>.
Is this the intentional behaviour?