Huge node error not handled
Recently my application took a very long time to validate relatively small documents. Validating the same documents with xmllint is much faster and reports the error "xmlSAX2Characters: huge text node". As far as I can tell, the documents contain Unicode control characters, which somehow cause everything after them to be treated as a single text node.

I am using xmlSchemaValidateStream. This function processes text differently from the reader used by xmllint. The reader uses xmlSAX2Text, which checks the length of the node and raises an error if it is too large. xmlSchemaValidateStream uses xmlSchemaSAXHandleText, which performs no such check and calls xmlStrcat over and over, reallocating and copying memory each time, which is why validation takes so long.

Unfortunately, the approach used by xmllint does not suit me, because my application needs:

- validation of the XML document against a schema
- SAX parsing
- the ability to use custom handlers, for example when an opening tag is found

I haven't found a way to pass custom handlers to the reader.
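To illustrate why the repeated xmlStrcat appends mentioned above get slow, here is a rough cost model. It is not libxml2 code: it only assumes that each append touches the whole existing buffer again (to measure it or to copy it on reallocation) before adding the new chunk. The function names and the chunk size are made up for the illustration.

```c
#include <stddef.h>

/*
 * Simplified cost model (an assumption, not libxml2 code): every append
 * first walks the existing buffer, then copies the new chunk behind it.
 * Returns the total number of bytes touched for `chunks` appends of
 * `chunk_len` bytes each.
 */
static unsigned long long append_cost_rescan(size_t chunks, size_t chunk_len)
{
    unsigned long long total = 0;
    unsigned long long current_len = 0;

    for (size_t i = 0; i < chunks; i++) {
        total += current_len;   /* touch the text accumulated so far */
        total += chunk_len;     /* copy the new chunk */
        current_len += chunk_len;
    }
    return total;
}

/* Same workload when the current length is remembered between appends. */
static unsigned long long append_cost_tracked(size_t chunks, size_t chunk_len)
{
    return (unsigned long long)chunks * chunk_len;
}
```

Under this model the work grows quadratically with the number of chunks, so a single text node delivered in many small pieces becomes disproportionately expensive compared with an append that keeps track of the length.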
Right now I'm using a workaround. In my custom characters handler, I access the "private" fields of xmlSchemaValidCtxt and xmlSchemaNodeInfo to get the contents of the current node (inode) and check its length with xmlStrlen; if the length exceeds a certain value, I stop the parser.
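A variant of this workaround that avoids the private fields might keep its own byte counter in the SAX user data. The sketch below is only an idea, not tested against libxml2: TextGuard and its functions are hypothetical names, and the counter only approximates the validator's internal buffer (with mixed content, for example, the validator may keep accumulating text across child elements).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard kept in the SAX user data; not a libxml2 type. */
typedef struct {
    size_t limit;   /* maximum accepted length of one text run, in bytes */
    size_t seen;    /* bytes seen in the current text run */
} TextGuard;

static void text_guard_init(TextGuard *g, size_t limit)
{
    assert(g != NULL);
    g->limit = limit;
    g->seen = 0;
}

/* Intended to be called from the element start/end callbacks. */
static void text_guard_reset(TextGuard *g)
{
    assert(g != NULL);
    g->seen = 0;
}

/*
 * Intended to be called from the characters callback with the `len`
 * argument it received. Returns 1 when the current run exceeds the
 * limit (the caller would then stop the parser), 0 otherwise.
 */
static int text_guard_add(TextGuard *g, int len)
{
    assert(g != NULL);
    if (len <= 0)
        return 0;
    if ((size_t)len > g->limit - g->seen) {
        g->seen = g->limit;   /* saturate so later calls keep reporting it */
        return 1;
    }
    g->seen += (size_t)len;
    return 0;
}
```

The characters handler would call text_guard_add with its len argument and stop the parser when it returns 1, while the startElementNs and endElementNs handlers would call text_guard_reset.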
Obviously this is a bad solution, and I hope you can suggest a better one. Perhaps you could tell me how to protect against control characters, or how to pass custom handlers to the reader, or add handling for this situation to xmlSchemaValidateStream, or something else.
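For the control character option, one thing I could do on my side is scan the input before handing it to the validator. The sketch below assumes the offending characters are C0 control characters (below 0x20, other than tab, line feed and carriage return), which XML 1.0 does not allow, and that the input is UTF-8 or another ASCII-compatible encoding. I have not confirmed that these are the characters causing the problem, and find_forbidden_control is a hypothetical helper, not a libxml2 function.

```c
#include <stddef.h>

/*
 * Returns the offset of the first byte that is a C0 control character
 * other than tab (0x09), line feed (0x0A) or carriage return (0x0D),
 * or -1 if the buffer contains none.
 * Assumes an ASCII-compatible encoding such as UTF-8, where bytes below
 * 0x80 never occur inside a multi-byte sequence.
 */
static long find_forbidden_control(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D)
            return (long)i;
    }
    return -1;
}
```

A document for which this returns a non-negative offset could be rejected before validation starts, at the cost of one extra pass over the input.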
By the way, to stop the parser and exit with an error, I use code similar to this:

    static void errParserStop(xmlSchemaValidCtxtPtr ctx, int errCode)
    {
        xmlParserCtxtPtr pctx = xmlSchemaValidCtxtGetParserCtxt(ctx);
        pctx->errNo = errCode;   /* report the custom error code */
        pctx->instate = -1;      /* -1 is XML_PARSER_EOF: nothing more to parse */
        pctx->disableSAX = 1;    /* suppress further SAX callbacks */
    }

Perhaps you can suggest a better way to do this as well.