[CVE-2022-40304] Double-free when parsing default attributes
Test Case
<!DOCTYPE A SYSTEM ""
[
<!ENTITY ENT_A SYSTEM "" NDATA A>
<!ENTITY ENT_B "&ENT_A;&ENT_B;">
<!ATTLIST A C CDATA "">
<!ATTLIST A D CDATA "">
<!ATTLIST A E CDATA "">
<!ATTLIST A F CDATA "">
<!ATTLIST A G CDATA "&ENT_B;">
]>
Reproducing the Issue
Simply run xmllint
on the provided test case.
The bucket used in the hash function is affected by the random seed value used in hash randomization, therefore the test case may need to be run multiple times (by xmllint or any other parser using libxml2) to see the crash due to the system time being used to select the random seed.
Explanation
libxml2
uses a custom hash table implementation throughout its codebase. The hash table maintains a buffer where strings representing dictionary keys are stored allowing for a consistent ownership model. This functionality is exposed in some functions, for example, xmlDictLookup
.
/**
* xmlDictLookup:
* @dict: the dictionary
* @name: the name of the userdata
* @len: the length of the name, if -1 it is recomputed
*
* Add the @name to the dictionary @dict if not present.
*
* Returns the internal copy of the name or NULL in case of internal error
*/
const xmlChar *xmlDictLookup(xmlDictPtr dict, const xmlChar *name, int len) {
There's a problem when handling the nested entities while parsing the last attribute list:
void xmlParseAttributeListDecl(xmlParserCtxtPtr ctxt) {
const xmlChar *elemName;
const xmlChar *attrName;
xmlEnumerationPtr tree;
if (CMP9(CUR_PTR, '<', '!', 'A', 'T', 'T', 'L', 'I', 'S', 'T')) {
// ...
elemName = xmlParseName(ctxt); // [1]
if (elemName == NULL) {
xmlFatalErrMsg(ctxt, XML_ERR_NAME_REQUIRED,
"ATTLIST: no name for Element\n");
return;
}
while ((RAW != '>') && (ctxt->instate != XML_PARSER_EOF)) {
// ...
def = xmlParseDefaultDecl(ctxt, &defaultValue); // [2]
if (def <= 0) {
if (defaultValue != NULL)
xmlFree(defaultValue);
if (tree != NULL)
xmlFreeEnumeration(tree);
break;
}
if ((ctxt->sax2) && (defaultValue != NULL) &&
(def != XML_ATTRIBUTE_IMPLIED) && (def != XML_ATTRIBUTE_REQUIRED)) {
xmlAddDefAttrs(ctxt, elemName, attrName, defaultValue); // [3]
}
// ...
}
When parsing attribute declaration lists as used in the testcase, xmlParseAttributeListDecl
will parse attribute default declarations [2] (e.g. #REQUIRED
, #IMPLIED
, #FIXED
) before adding the defaulted attribute for the specified element [3].
The issue is that xmlParseName
[1] returns into elemName
the internal copy of the string for the parsed name, in this case A
. xmlParseDefaultDecl
[2] will try to resolve the nested entity references for the last ATTLIST
entry in the provided testcase, but due to a reference loop it will hit the depth limit and fail.
When failing, ent->content[0]
is set to 0, meaning the string pointed to by ent->content
is set to the empty string. ent->content
is set earlier in a call to xmlCreateEntity
. The pointer is set to the result of an xmlDictLookup
call. Recall that this returns the internal instance of the A
string.
xmlChar *xmlStringLenDecodeEntities(xmlParserCtxtPtr ctxt, const xmlChar *str,
int len, int what, xmlChar end,
xmlChar end2, xmlChar end3) {
// ...
} else if ((ent != NULL) && (ent->content != NULL)) {
ctxt->depth++;
rep = xmlStringDecodeEntities(ctxt, ent->content, what, 0, 0, 0);
ctxt->depth--;
if (rep == NULL) {
ent->content[0] = 0; // The pointer to the "A" string is cleared.
goto int_error;
}
Therefore we are now in a state where a hash table’s internal string representation for the empty string is duplicated. This leads to issues later on when xmlAddDefAttrs
is called [3].
For reference, you can see that the entity was initialized here and the content pointer was indeed set to an internal string reference.
static xmlEntityPtr xmlCreateEntity(xmlDictPtr dict, const xmlChar *name,
int type, const xmlChar *ExternalID,
const xmlChar *SystemID,
const xmlChar *content) {
// ...
if (content != NULL) {
ret->length = xmlStrlen(content);
if ((dict != NULL) && (ret->length < 5))
ret->content = (xmlChar *)xmlDictLookup(dict, content, ret->length);
Now when xmlAddDefAttrs
is called, elemName
which originally pointed to A
now points to the empty string without its pointer value having been modified. This becomes a memory corruption issue as shown below:
static void xmlAddDefAttrs(xmlParserCtxtPtr ctxt, const xmlChar *fullname,
const xmlChar *fullattr, const xmlChar *value) {
// ...
defaults = xmlHashLookup2(ctxt->attsDefault, name, prefix);
if (defaults == NULL) {
defaults = (xmlDefAttrsPtr)xmlMalloc(sizeof(xmlDefAttrs) +
(4 * 5) * sizeof(const xmlChar *));
if (defaults == NULL)
goto mem_error;
defaults->nbAttrs = 0;
defaults->maxAttrs = 4;
if (xmlHashUpdateEntry2(ctxt->attsDefault, name, prefix, defaults, NULL) <
0) {
xmlFree(defaults);
goto mem_error;
}
} else if (defaults->nbAttrs >= defaults->maxAttrs) {
xmlDefAttrsPtr temp;
temp = (xmlDefAttrsPtr)xmlRealloc(
defaults, sizeof(xmlDefAttrs) +
(2 * defaults->maxAttrs * 5) * sizeof(const xmlChar *));
if (temp == NULL)
goto mem_error;
defaults = temp;
defaults->maxAttrs *= 2;
if (xmlHashUpdateEntry2(ctxt->attsDefault, name, prefix, defaults, NULL) <
0) {
xmlFree(defaults);
goto mem_error;
}
}
xmlHashLookup2
will try to lookup the empty string entry first using the pointer value, and if it fails will use the string comparison function to look for the entry. xmlHashUpdateEntry2
on the other hand will only use the pointer value if the internal dict containing string copies is present. Because there are now two internal strings which contain the empty string, in the lookup case it will match if the lookup happens to land in the correct bucket and in the update case it will never match because the pointers don’t match. The bucket used in the hash function is affected by the random seed value used in hash randomization, therefore the test case may need to be run multiple times (by xmllint
or any other parser using libxml2
) to see the crash.
The existing default attributes list is sometimes found by the first lookup call and never found in the insert call. In the case where the lookup finds the existing default attributes, the realloc
call can free it (with more than 4 attributes as in this testcase) but will not remove the pointer during the update. In the case where the lookup does not find the default attributes, it will not be freed and instead will be leaked when the new allocation is made.
Issues
Clearing ent->content[0]
can break dictionary string semantics.
The behavior of Lookup/Insert is different for dictionaries containing internal dict references which is exacerbated by the corruption of an internal string noted above.
Recommended Fix
Either: (1) avoid modifying ent->content
by looking up the empty string in the dict if needed instead of modifying the existing value or (2) make the semantics consistent between lookup and insert. I believe (1) is a more fundamental fix for the problem as the lookup/insert semantics aren’t inherently a problem.
Additionally, we recommend fuzzing with the hash list randomization on but using a non time-based seed drawn from the test case.
Crash Log
Line numbers in this testcase may not match your version as I applied clang-format before my review/fuzzing efforts.
==675853==ERROR: AddressSanitizer: attempting double-free on 0x60f000000130 in thread T0:
#0 0x55623faa2117 in __interceptor_free.part.0 llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3
#1 0x55623fb44467 in xmlHashDefaultDeallocator libxml2/hash.c:375:3
#2 0x55623fb44026 in xmlHashFree libxml2/hash.c:341:11
#3 0x55623fc063ed in xmlFreeParserCtxt libxml2/parserInternals.c:1654:5
#4 0x55623fbed12b in xmlDoRead libxml2/parser.c:14847:5
#5 0x55623fbed227 in xmlReadFile libxml2/parser.c:14895:11
#6 0x55623faf1f8d in parseAndPrintFile libxml2/xmllint.c:2258:19
#7 0x55623faea403 in main libxml2/xmllint.c:3647:13
#8 0x7fe8af1ee7fc in __libc_start_main csu/../csu/libc-start.c:332:16
0x60f000000130 is located 0 bytes inside of 168-byte region [0x60f000000130,0x60f0000001d8)
freed by thread T0 here:
#0 0x55623faa223f in __interceptor_realloc.part.0 llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:85:3
#1 0x55623fbb2107 in xmlAddDefAttrs libxml2/parser.c:1243:28
#2 0x55623fbb1511 in xmlParseAttributeListDecl libxml2/parser.c:5917:9
#3 0x55623fbb7b37 in xmlParseMarkupDecl libxml2/parser.c:6731:9
#4 0x55623fbd5a87 in xmlParseInternalSubset libxml2/parser.c:8184:7
#5 0x55623fbd42ac in xmlParseDocument libxml2/parser.c:10575:7
#6 0x55623fbed067 in xmlDoRead libxml2/parser.c:14836:3
#7 0x55623fbed227 in xmlReadFile libxml2/parser.c:14895:11
#8 0x55623faf1f8d in parseAndPrintFile libxml2/xmllint.c:2258:19
#9 0x55623faea403 in main libxml2/xmllint.c:3647:13
#10 0x7fe8af1ee7fc in __libc_start_main csu/../csu/libc-start.c:332:16
previously allocated by thread T0 here:
#0 0x55623faa32df in __interceptor_malloc llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
#1 0x55623fbb2016 in xmlAddDefAttrs libxml2/parser.c:1228:32
#2 0x55623fbb1511 in xmlParseAttributeListDecl libxml2/parser.c:5917:9
#3 0x55623fbb7b37 in xmlParseMarkupDecl libxml2/parser.c:6731:9
#4 0x55623fbd5a87 in xmlParseInternalSubset libxml2/parser.c:8184:7
#5 0x55623fbd42ac in xmlParseDocument libxml2/parser.c:10575:7
#6 0x55623fbed067 in xmlDoRead libxml2/parser.c:14836:3
#7 0x55623fbed227 in xmlReadFile libxml2/parser.c:14895:11
#8 0x55623faf1f8d in parseAndPrintFile libxml2/xmllint.c:2258:19
#9 0x55623faea403 in main libxml2/xmllint.c:3647:13
#10 0x7fe8af1ee7fc in __libc_start_main csu/../csu/libc-start.c:332:16
SUMMARY: AddressSanitizer: double-free llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3 in __interceptor_free.part.0