Running a second transformation on an input document: first transformation has modified document
This may be a code bug or a documentation bug: i.e., if this is ‘works as designed’, then that may need to be made more explicit.
Consider the program below:
// The first unexpected case -- transform input.xml with identity-nospace.xslt
// _and_ with identity.xslt. The same tree is transformed with two
// different XSLT stylesheets, but is _changed_ by the presence of the
// <xsl:strip-space/> element in the first transform.
#include <stdio.h>
#include <libxml/parser.h>
#include <libxslt/xslt.h>
#include <libxslt/transform.h>
#include <libxslt/xsltutils.h>

int main(int argc, char** argv)
{
    // report the library versions in use
    printf("libxml: %s\nlibxslt: %d\n",
           LIBXML_DOTTED_VERSION,
           xsltLibxsltVersion);

    const xmlChar* transform_fn1 = (const xmlChar*)"identity-nospace.xslt";
    const xmlChar* transform_fn2 = (const xmlChar*)"identity.xslt";
    const char* input_fn = "input.xml";

    xmlSubstituteEntitiesDefault(1);
    xmlLoadExtDtdDefaultValue = 1;

    xmlDocPtr input = xmlParseFile(input_fn);

    // transform the input document with transform 1,
    // which includes a <xsl:strip-space/> element
    xsltStylesheetPtr xslt1 = xsltParseStylesheetFile(transform_fn1);
    xmlDocPtr result1 = xsltApplyStylesheet(xslt1, input, NULL);
    xsltSaveResultToFile(stdout, result1, xslt1);

    // now RE-transform it with a different stylesheet,
    // which doesn't include <xsl:strip-space/>
    xsltStylesheetPtr xslt2 = xsltParseStylesheetFile(transform_fn2);
    xmlDocPtr result2 = xsltApplyStylesheet(xslt2, input, NULL);
    xsltSaveResultToFile(stdout, result2, xslt2);

    xsltFreeStylesheet(xslt1);
    xsltFreeStylesheet(xslt2);
    xmlFreeDoc(result1);
    xmlFreeDoc(result2);
    xmlFreeDoc(input);

    xsltCleanupGlobals();
    xmlCleanupParser();

    return 0;
}
This transforms an input document twice, using the identity transform first with, and then without, <xsl:strip-space elements='*'/>. The output I get is:
libxml: 2.9.13
libxslt: 10135
<?xml version="1.0"?>
<a><b><c>Hello.</c></b></a>
<?xml version="1.0"?>
<a><b><c>Hello.</c></b></a>
(see the attachment for the mentioned files: retransform.tar.gz)
I would expect the second output to have the whitespace between elements that is present in the input document, and this works as expected if the two transforms are done in the opposite order (i.e., swapping transform_fn1 and transform_fn2).
That is, it mostly works, except that the second transform is working on a space-stripped input document. This is much the same as issue #14 (but without the potentially confounding factor of the wrapping Python layer). Specifically, the first transformation has removed the whitespace-only text nodes from the input tree (see also issue #54 (closed)) rather than merely skipping them during the transformation.
Now, as I say, it may be that this is the designed behaviour. But if so, that would be surprising, to me at least and (cf. issue #14) to others, since it violates the natural mental model of what's happening in the library: that applying a stylesheet reads the input document without changing it. Consequently, it would seem important to highlight that this is happening, but I can't see any warning of this in the FAQ, nor in the documentation of the xsltApplyStylesheet function.
I notice that in the Reader overview, there is a passing remark that ‘the XmlTextReader API is a forward only tree walking interface.’ I'm not sure if that's intended to mean that the tree can be walked/transformed only once, but if so, it's a very oblique way of saying so, and it might be worth emphasising this nearer to xsltApplyStylesheet.