more lenient html parsing: can libxml return "<(" literally?
Emacs parses HTML with libxml using:

    doc = htmlReadMemory ((char *)buftext,
                          iend_byte - istart_byte, burl, "utf-8",
                          HTML_PARSE_RECOVER|HTML_PARSE_NONET|
                          HTML_PARSE_NOWARNING|HTML_PARSE_NOERROR|
                          HTML_PARSE_NOBLANKS);

When given <pre>foo< </pre>bar this gives a pre containing the string "foo< ". The author should've written &lt; but "< " can't be a tag, so happily libxml is lenient and just returns the literal symbols.

Would it be possible to do this for more such cannot-be-tag usages of "<", e.g. "<("? Currently, <pre>foo<(</pre>bar gives a pre containing the string "foobar", i.e. it both removes the "<(" and doesn't notice the end-of-pre :-(
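
One possible workaround, sketched here only as an illustration, would be to pre-process the buffer before it reaches htmlReadMemory and turn every "<" that cannot start a tag into "&lt;". The helper name escape_stray_lt below is hypothetical (it exists in neither Emacs nor libxml2), and the rule it applies is an assumption: a "<" is kept only when the next byte is an ASCII letter, "/", "!" or "?".

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Emacs or libxml2.  Return a newly
   allocated, NUL-terminated copy of the LEN bytes at SRC in which
   every '<' that cannot begin a tag, end tag, comment, doctype or
   processing instruction is replaced by "&lt;".  The caller frees
   the result.  Return NULL if allocation fails.  */
static char *
escape_stray_lt (const char *src, size_t len)
{
  /* Worst case: every byte is '<' and grows to the 4 bytes of "&lt;".  */
  char *out = malloc (len * 4 + 1);
  size_t o = 0;

  if (!out)
    return NULL;

  for (size_t i = 0; i < len; i++)
    {
      /* Byte after the current one, or 0 at the end of the buffer.  */
      unsigned char next = (i + 1 < len) ? (unsigned char) src[i + 1] : 0;

      if (src[i] == '<'
          && !(isalpha (next) || next == '/' || next == '!' || next == '?'))
        {
          /* Assumed rule: this '<' cannot open markup, so escape it.  */
          memcpy (out + o, "&lt;", 4);
          o += 4;
        }
      else
        out[o++] = src[i];
    }

  out[o] = '\0';
  return out;
}
```

With this, <pre>foo<(</pre>bar becomes <pre>foo&lt;(</pre>bar, and the escaped copy (with its new length) would be what gets passed to htmlReadMemory. Note that this rewrites blindly: a "<" inside script, style or comment content would be escaped too, so a real fix probably belongs in the parser's recovery logic rather than in a pre-pass like this.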