Return of xmlDocDumpFormatMemoryEnc is truncated in libxml2 2.9.10
I'm using xmlDocDumpFormatMemoryEnc to dump a DOM tree that contains a string value longer than 5000 characters. The returned content is truncated. The problem only occurs in 2.9.10; 2.9.9 works correctly.
Note: this test case needs to be compiled with LIBXML_ICONV_ENABLED on. Code to reproduce the problem:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "libxml/tree.h"
#include "libxml/parser.h"

int main(void)
{
    xmlDocPtr doc;
    xmlChar *pxmlout = NULL;
    int xmllen = 0;
    const char *filename = "/home/test/test.xml";

    doc = xmlParseFile(filename);
    if (doc == NULL)
    {
        printf("xmlParseFile ERROR.\n");
        return 1;
    }
    /* Serialize the document as GBK with formatting enabled. */
    xmlDocDumpFormatMemoryEnc(doc, &pxmlout, &xmllen, "GBK", 1);
    if (pxmlout != NULL)
    {
        printf("xmllen=[%d],xmlout=[%s]\n", xmllen, pxmlout);
        xmlFree(pxmlout);
    }
    xmlFreeDoc(doc);
    return 0;
}
I'm compiling the test case with this command:
gcc -o testcase testcase.c -g -DLIBXML_ICONV_ENABLED -I/home/libxml2/include/ /home/libxml2/libxml2.so
The test.xml file is also attached.
I dug into the code a little and found that the problem is here: xmlOutputBufferWriteEscape in xmlIO.c has been changed slightly, from this in 2.9.9:
    /*
     * convert as much as possible to the output buffer.
     */
    ret = xmlCharEncOutput(out, 0);
    if ((ret < 0) && (ret != -3)) {
        xmlIOErr(XML_IO_ENCODER, NULL);
        out->error = XML_IO_ENCODER;
        return(-1);
    }
    nbchars = xmlBufUse(out->conv);
to this in 2.9.10:
    /*
     * convert as much as possible to the output buffer.
     */
    ret = xmlCharEncOutput(out, 0);
    if ((ret < 0) && (ret != -3)) {
        xmlIOErr(XML_IO_ENCODER, NULL);
        out->error = XML_IO_ENCODER;
        return(-1);
    }
    if (out->writecallback)
        nbchars = xmlBufUse(out->conv);
    else
        nbchars = ret;
nbchars may now be set to ret, and ret is the return value of xmlEncOutputChunk in encoding.c. ret can be a positive value if handler->output is called, but it can only be 0 or a negative value if xmlIconvWrapper or xmlUconvWrapper is called.
Here is my fix:
/* Returns -4 if no output function was found. */
static int
xmlEncOutputChunk(xmlCharEncodingHandler *handler, unsigned char *out,
                  int *outlen, const unsigned char *in, int *inlen) {
    int ret;

    if (handler->output != NULL) {
        ret = handler->output(out, outlen, in, inlen);
    }
#ifdef LIBXML_ICONV_ENABLED
    else if (handler->iconv_out != NULL) {
        ret = xmlIconvWrapper(handler->iconv_out, out, outlen, in, inlen);
        /* on success (ret >= 0), return the number of bytes written */
        if (ret >= 0) {
            ret = *outlen;
        }
    }
#endif /* LIBXML_ICONV_ENABLED */
#ifdef LIBXML_ICU_ENABLED
    else if (handler->uconv_out != NULL) {
        ret = xmlUconvWrapper(handler->uconv_out, 0, out, outlen, in, inlen,
                              TRUE);
        /* on success (ret >= 0), return the number of bytes written */
        if (ret >= 0) {
            ret = *outlen;
        }
    }
#endif /* LIBXML_ICU_ENABLED */
    else {
        *outlen = 0;
        *inlen = 0;
        ret = -4;
    }

    return(ret);
}
I fixed the problem by assigning *outlen to ret when the conversion succeeds. This works fine for me, but I'm not sure whether it is a good fix.