(on master) large CDATA sections result in infinite loop
As of c8469863, when the SAX parser encountered a large CDATA section, it raised the error "CData section too big found", let the CDATA content be consumed, and then terminated parsing.
But as of commit 3582b07b, the behavior has changed: the characters never get consumed, so the parser reports the same SAX.characters event over and over.
Given input XML like this:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<![CDATA[%s]]>
</root>
where the %s is a string larger than XML_MAX_TEXT_LENGTH (you can generate this via a script if you like:
#! /usr/bin/env ruby
# Writes foo.xml: a <root> element holding a single CDATA section of
# 10 MiB of "a" characters, which is larger than XML_MAX_TEXT_LENGTH.
template = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <root>
  <![CDATA[%s]]>
  </root>
XML
factor = 10 # size of the CDATA payload in MiB
huge_data = "a" * (1024 * 1024 * factor)
File.write("foo.xml", template % huge_data)
)
Previous xmllint behavior was:
$ ./xmllint --sax foo.xml
SAX.setDocumentLocator()
SAX.startDocument()
SAX.startElementNs(root, NULL, NULL, 0, 0, 0)
SAX.characters(
, 3)
SAX.error: CData section too big found
SAX.characters(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 3940)
SAX.error: internal error: detected an error in element content
SAX.endDocument()
But after 3582b07b, the behavior is:
$ ./xmllint --sax foo.xml
SAX.setDocumentLocator()
SAX.startDocument()
SAX.startElementNs(root, NULL, NULL, 0, 0, 0)
SAX.characters(
, 3)
SAX.error: CData section too big found
SAX.characters(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 3940)
SAX.characters(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 3940)
SAX.characters(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 3940)
SAX.characters(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 3940)
SAX.characters(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 3940)
SAX.characters(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 3940)
SAX.characters(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 3940)
SAX.characters(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 3940)
SAX.characters(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 3940)
... ad infinitum ...
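The two traces above differ only in how many SAX.characters events follow the "CData section too big found" error: one before, unbounded after. A minimal sketch of a helper that tells them apart from the `xmllint --sax` output is shown here. The name `characters_after_cdata_error` and the default `limit` of 100 are hypothetical choices for illustration, not part of libxml2 or xmllint.

```ruby
# Error line printed by xmllint --sax, as shown in the traces above.
ERROR_MARKER = "SAX.error: CData section too big found"

# Counts the SAX.characters events that appear after the CDATA size error.
# Reading stops once `limit` events have been seen, so an endless stream
# of events cannot keep this method running forever.
def characters_after_cdata_error(io, limit: 100)
  seen_error = false
  count = 0
  io.each_line do |line|
    if line.start_with?(ERROR_MARKER)
      seen_error = true
    elsif seen_error && line.start_with?("SAX.characters(")
      count += 1
      break if count >= limit
    end
  end
  count
end
```

With the earlier behavior the result is 1. A result equal to `limit` suggests the parser is stuck re-reporting the same data. One way to feed it live output would be `IO.popen(["./xmllint", "--sax", "foo.xml"], err: [:child, :out]) { |io| characters_after_cdata_error(io) }`. Note that in the looping case `IO.popen` still waits for the child to exit when the block ends, so the process would also need to be killed.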
This bug was discovered by the Nokogiri rubygem project, which runs a test pipeline against upstream libxml2 several times a week.
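Because the failure mode is a hang rather than a wrong result, any automated check for it needs a time budget. The following is a minimal sketch of such a watchdog. It is not taken from Nokogiri's test suite, and the helper name `terminates_within?` is hypothetical.

```ruby
require "timeout"

# Runs the command given as an argv array and reports whether it exited
# on its own within `seconds`. A command that is still running when the
# budget expires is killed and reaped, and false is returned.
def terminates_within?(argv, seconds)
  # Discard output: a looping parser would otherwise print without end.
  pid = Process.spawn(*argv, out: File::NULL, err: File::NULL)
  begin
    Timeout.timeout(seconds) { Process.wait(pid) }
    true
  rescue Timeout::Error
    Process.kill("KILL", pid)
    Process.wait(pid)
    false
  end
end
```

For the reproduction in this report, a call such as `terminates_within?(["./xmllint", "--sax", "foo.xml"], 60)` would be expected to return true at c8469863 and false after 3582b07b. The 60-second budget is an arbitrary assumption and would need tuning for the machine running it.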
Edited by Mike Dalessio