Nonrecursive transformations in recent libxml trunk cause null pointer dereferences
I only have valgrind dumps at the moment, not full backtraces. The recent changes that make (at least) xmlsave.c:xmlNodeDumpOutput and HTMLtree.c:htmlNodeDumpFormatOutput nonrecursive cause coredumps in Python 2.7's lxml. I noticed this when my routine weekly fetch of the Economist via Calibre failed.
It's not immediately obvious to me why a NULL is turning up. What is clear is that the original code checked for NULL at the top of the function, while the replacements often only check that the current node isn't the root in the corresponding places. As far as I can see, the traversal should always climb down and then back up again, and so should always reach the root. But if peer nodes are somehow unchained from the root, the new code would crash where the old code did not.
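To make the suspected failure mode concrete, here is a minimal sketch of an iterative walk. It is not the libxml2 code: `Node` and `walk_count` are hypothetical stand-ins I made up for illustration, and the real xmlNode has many more fields. It only shows how a "stop when we are back at the root" check never fires if a node's parent chain does not lead back to the root, and where an extra NULL guard would catch that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree node: only the links the walk needs. */
typedef struct Node {
    struct Node *parent;
    struct Node *children;
    struct Node *next;
} Node;

/*
 * Iterative pre-order walk starting at root. Returns the number of nodes
 * visited. *broken is set to 1 if the climb back up hit a NULL parent
 * before reaching root, i.e. the node was not chained to the root.
 */
int walk_count(Node *root, int *broken)
{
    Node *cur = root;
    int visited = 0;

    assert(broken != NULL);
    *broken = 0;

    while (cur != NULL) {
        visited++;

        /* Descend first. */
        if (cur->children != NULL) {
            cur = cur->children;
            continue;
        }

        /* Then move sideways, or climb until a sibling or the root. */
        for (;;) {
            if (cur == root)
                return visited;
            if (cur->next != NULL) {
                cur = cur->next;
                break;
            }
            cur = cur->parent;
            /*
             * Without this guard, a node whose parent chain misses root
             * leads to cur->next being read through a NULL pointer on
             * the next pass of this inner loop.
             */
            if (cur == NULL) {
                *broken = 1;
                return visited;
            }
        }
    }
    return visited;
}
```

With a well-formed tree the guard is never taken. If a child's parent pointer is NULL (or points outside the subtree being dumped), the `cur == root` test alone is not enough to stop the climb.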
Examples (line numbers refer to current git trunk as of today). Both are clearly uses of NULL nodes, since the faulting address is 0x8:
1% Generating masthead...
Synthesizing mastheadImage
==449516== Invalid read of size 4
==449516== at 0xEAFFE9D: xmlNodeDumpOutputInternal (xmlsave.c:1063)
==449516== by 0xEB00B73: xmlNodeDumpOutput__internal_alias (xmlsave.c:2320)
==449516== by 0xEB00B73: xmlNodeDumpOutput (xmlsave.c:2284)
==449516== by 0xE7F4B95: __pyx_f_4lxml_5etree__writeNodeToBuffer (etree.c:140385)
==449516== by 0xE8E75E2: __pyx_f_4lxml_5etree__tostring (etree.c:139103)
==449516== by 0xE92EB25: __pyx_pf_4lxml_5etree_32tostring (etree.c:85657)
==449516== by 0xE92EB25: __pyx_pw_4lxml_5etree_33tostring (etree.c:84950)
==449516== by 0x4EA49C1: PyObject_Call (abstract.c:2544)
==449516== by 0x4E87528: UnknownInlinedFun (ceval.c:4593)
==449516== by 0x4E87528: UnknownInlinedFun (ceval.c:4398)
==449516== by 0x4E87528: PyEval_EvalFrameEx (ceval.c:3013)
==449516== by 0x4EB4185: PyEval_EvalCodeEx (ceval.c:3608)
==449516== by 0x4E890DB: UnknownInlinedFun (ceval.c:4471)
==449516== by 0x4E890DB: UnknownInlinedFun (ceval.c:4396)
==449516== by 0x4E890DB: PyEval_EvalFrameEx (ceval.c:3013)
==449516== by 0x4EB4185: PyEval_EvalCodeEx (ceval.c:3608)
==449516== by 0x4E890DB: UnknownInlinedFun (ceval.c:4471)
==449516== by 0x4E890DB: UnknownInlinedFun (ceval.c:4396)
==449516== by 0x4E890DB: PyEval_EvalFrameEx (ceval.c:3013)
==449516== by 0x4EB4185: PyEval_EvalCodeEx (ceval.c:3608)
==449516== Address 0x8 is not stack'd, malloc'd or (recently) free'd
... and if you guard against NULL here, the corresponding HTML bug emerges:
1% Starting download in a single thread...
==477795== Thread 19:
==477795== Invalid read of size 4
==477795== at 0xEA7114D: htmlNodeDumpFormatOutput__internal_alias.part.0 (HTMLtree.c:908)
==477795== by 0xE7F4C96: __pyx_f_4lxml_5etree__writeNodeToBuffer (etree.c:140357)
==477795== by 0xE8E75E2: __pyx_f_4lxml_5etree__tostring (etree.c:139103)
==477795== by 0xE92EB25: __pyx_pf_4lxml_5etree_32tostring (etree.c:85657)
==477795== by 0xE92EB25: __pyx_pw_4lxml_5etree_33tostring (etree.c:84950)
==477795== by 0x4EA49C1: PyObject_Call (abstract.c:2544)
==477795== by 0x4E87528: UnknownInlinedFun (ceval.c:4593)
==477795== by 0x4E87528: UnknownInlinedFun (ceval.c:4398)
==477795== by 0x4E87528: PyEval_EvalFrameEx (ceval.c:3013)
==477795== by 0x4EB4185: PyEval_EvalCodeEx (ceval.c:3608)
==477795== by 0x4E8812F: UnknownInlinedFun (ceval.c:4471)
==477795== by 0x4E8812F: UnknownInlinedFun (ceval.c:4396)
==477795== by 0x4E8812F: PyEval_EvalFrameEx (ceval.c:3013)
==477795== by 0x4EB4185: PyEval_EvalCodeEx (ceval.c:3608)
==477795== by 0x4E890DB: UnknownInlinedFun (ceval.c:4471)
==477795== by 0x4E890DB: UnknownInlinedFun (ceval.c:4396)
==477795== by 0x4E890DB: PyEval_EvalFrameEx (ceval.c:3013)
==477795== by 0x4EB4185: PyEval_EvalCodeEx (ceval.c:3608)
==477795== by 0x4E890DB: UnknownInlinedFun (ceval.c:4471)
==477795== by 0x4E890DB: UnknownInlinedFun (ceval.c:4396)
==477795== by 0x4E890DB: PyEval_EvalFrameEx (ceval.c:3013)
==477795== Address 0x8 is not stack'd, malloc'd or (recently) free'd