Change in entity parsing behavior in v2.13.5
Sorry for the vague description, but I don't know how else to describe what I'm seeing. The issue was originally reported as a segfault in the libdom package on Gentoo, at https://bugs.gentoo.org/946980
libdom is the DOM library used by the NetSurf browser. On Gentoo, we build it with the libxml2 backend. Basically, libdom uses libxml2 to parse a document and create an XML tree. libdom then creates its own tree made up of its own custom node/element/attr etc. types, each of which is linked to the corresponding node/element/attr in the libxml2 tree. In essence, it wraps the libxml2 types with some more information.
So far so good. But libdom's test suite begins to fail when we upgrade from libxml2-2.12.9 to 2.13.5. The problem in libdom is that we wind up with a NULL where it isn't expected. One example of this is in the test suite, parsing staff.xml. The third employee, Roger Jones, has a gender that consists only of an entity reference:
<!ENTITY ent4 "<entElement domestic='Yes'>Element data</entElement><?PItarget PIdata?>">
...
<gender>&ent4;</gender>
With libxml2-2.12.9, parsing that works "as expected." The <entElement> is added as a child of the <gender> element. In libxml2-2.13.5, instead what I see is that the <entElement> is added with the document root as its parent. (This is what leads to the segfault in libdom.)
To determine this, I am just using printf statements at the beginning of the .startElementNs SAX handler.
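For anyone who wants to check the nesting a SAX consumer would normally expect, here is a minimal sketch of the same "print the parent on every start-element event" technique. It is only an illustration under stated assumptions: it uses Python's expat-backed xml.sax rather than libxml2, so it does not reproduce the 2.13.5 behavior, and the document is a cut-down stand-in for staff.xml (the DOCTYPE and the staff/employee wrapper are assumed for the example). The names DOC, ParentLogger and log_parents are hypothetical.

```python
import xml.sax

# Reduced stand-in for staff.xml: only the entity declaration and the
# <gender> element come from the report; the rest is assumed.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE staff [
<!ENTITY ent4 "<entElement domestic='Yes'>Element data</entElement><?PItarget PIdata?>">
]>
<staff><employee><gender>&ent4;</gender></employee></staff>
"""


class ParentLogger(xml.sax.ContentHandler):
    """Logs each start-element event together with the currently open parent."""

    def __init__(self):
        super().__init__()
        self.stack = []     # names of the elements that are currently open
        self.parents = {}   # element name -> parent name (None for the root)

    def startElement(self, name, attrs):
        parent = self.stack[-1] if self.stack else None
        self.parents[name] = parent
        print(f"startElement: {name} (parent: {parent})")
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()


def log_parents(data):
    """Parse `data` and return the recorded element -> parent mapping."""
    handler = ParentLogger()
    xml.sax.parseString(data, handler)
    return handler.parents


if __name__ == "__main__":
    log_parents(DOC)
```

With this parser the start-element event for entElement arrives while gender is still open, which matches the 2.12.9 result described above.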
Any insight you may have would be greatly appreciated.