Empty namespace definition leaks after parsing incorrect input
I was made aware of a data leak between parsed documents in lxml that occurs when doing the following:
- parse the incomplete document `<anot xmlns="1">` and get a parser error (as expected)
- reuse the same parser (i.e. the same libxml2 parser context, after resetting it) to parse the correct document `<root></root>`
- access the supposedly empty namespace mapping of the parsed (second) root element
The expected result is an empty dictionary (`{}`), but if (and only if) the incorrect document has been parsed before, the result is the (senseless) `{None: None}`, meaning that the root element got an `nsDef` with two `NULL` values from somewhere.
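The steps above can be sketched in Python roughly as follows. This is a minimal sketch, assuming lxml's `etree.XMLParser`, `etree.fromstring` and the element's `nsmap` property; the function name `nsmap_after_failed_parse` is made up for illustration, and whether the leak shows up depends on the libxml2 version that lxml runs against.

```python
from lxml import etree


def nsmap_after_failed_parse():
    """Parse a broken document, then a valid one, with the same parser.

    Returns the namespace mapping of the second (valid) root element.
    """
    # A single parser object, so that both parse runs share one
    # libxml2 parser context (which gets reset between the runs).
    parser = etree.XMLParser()

    # First run: incomplete document, expected to fail.
    try:
        etree.fromstring('<anot xmlns="1">', parser)
    except etree.XMLSyntaxError:
        pass
    else:
        raise AssertionError("the incomplete document should not parse")

    # Second run: well-formed document without any namespace declarations.
    root = etree.fromstring('<root></root>', parser)
    return root.nsmap


if __name__ == "__main__":
    # Expected: {}. On an affected setup, the report above
    # describes {None: None} instead.
    print(nsmap_after_failed_parse())
```

Creating a fresh parser object for the second document should avoid the shared parser context and can serve as a comparison case.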
Apparently, the first run had already set up the namespace information by the time parsing failed, and the normal reset of the parser context did not clean it up, so the second parse passed it on to the second document.
I can reproduce this with libxml2 2.9.14 and 2.9.10, but not with 2.9.9.
Edited by scoder