(CVE-2022-23308) Heap-use-after-free in xmlValidateRef
The application that validates XML using xmlTextReaderRead() with XML_PARSE_DTDATTR
and XML_PARSE_DTDVALID
enabled becomes vulnerable to this use-after-free bug.
We've already created a proof-of-concept, found root causes and wrote fix patches.
Proof of concept
Here is a proof-of-concept that reproduce crash on xmllint.
My environment is as follows:
$ cat /etc/os-release # Ubuntu
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
$ gcc --version # gcc 9.3
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOS
$ git log -n1 # latest commit
commit 798bdf13f6964a650b9a0b7b4b3a769f6f1d509a (HEAD -> master, origin/master, origin/HEAD)
...
The crafted xml file: poc.xml
To confirm the bug triggers a use-after-free, build xmllint with Address Sanitizer.
CFLAGS="-g -O0 -fsanitize=address" CXXFLAGS="-g -O0 -fsanitize=address" ./configure --without-python --disable-shared
make -j$(nproc)
Run xmllint with the crafted file.
the arguments --valid --dtdattr --stream
are required to cause the use-after-free.
$./xmllint --valid --dtdattr --stream ./poc.xml
./poc.xml:103: validity error : Attribute uaftag of uafattr: invalid default value
<!ATTLIST uaftag uafattr IDREF "&val;">
^
./poc.xml:105: element foo: validity error : root and DTD name do not match 'foo' and 'doc'
<foo>
^
./poc.xml:106: element fillattr: validity error : No declaration for element fillattr
<fillattr/>
^
./poc.xml:107: element uaftag: validity error : No declaration for element uaftag
<uaftag/>
^
./poc.xml:109: element foo: validity error : No declaration for element foo
<ba
^
./poc.xml:109: element bar: validity error : No declaration for attribute fakeattr of element bar
<bar fakeattr="1234567812345678NAMEHERE"/>
^
./poc.xml:109: element bar: validity error : No declaration for element bar
<bar fakeattr="1234567812345678NAMEHERE"/>
^
./poc.xml:110: element foo: validity error : No declaration for element foo
</foo>
^
./poc.xml:111: element bar: validity error : No declaration for element bar
^
=================================================================
==23263==ERROR: AddressSanitizer: heap-use-after-free on address 0x608000003470 at pc 0x5654cd02ecec bp 0x7ffdce162470 sp 0x7ffdce162460
READ of size 4 at 0x608000003470 thread T0
#0 0x5654cd02eceb in xmlValidateRef /home/satoooon/ricsec/cvehunt/latest/valid.c:6553
#1 0x5654cd02f2c3 in xmlWalkValidateList /home/satoooon/ricsec/cvehunt/latest/valid.c:6603
#2 0x5654cd269928 in xmlListWalk /home/satoooon/ricsec/cvehunt/latest/list.c:682
#3 0x5654cd02f400 in xmlValidateCheckRefCallback /home/satoooon/ricsec/cvehunt/latest/valid.c:6625
#4 0x5654ccff35d7 in stubHashScannerFull /home/satoooon/ricsec/cvehunt/latest/hash.c:852
#5 0x5654ccff39bf in xmlHashScanFull /home/satoooon/ricsec/cvehunt/latest/hash.c:899
#6 0x5654ccff36ea in xmlHashScan /home/satoooon/ricsec/cvehunt/latest/hash.c:868
#7 0x5654cd02f5ad in xmlValidateDocumentFinal /home/satoooon/ricsec/cvehunt/latest/valid.c:6673
#8 0x5654cd230ed5 in xmlSAX2EndDocument /home/satoooon/ricsec/cvehunt/latest/SAX2.c:1029
#9 0x5654ccfb5ec8 in xmlParseChunk /home/satoooon/ricsec/cvehunt/latest/parser.c:12402
#10 0x5654cd1d9214 in xmlTextReaderPushData /home/satoooon/ricsec/cvehunt/latest/xmlreader.c:997
#11 0x5654cd1dbaa4 in xmlTextReaderRead /home/satoooon/ricsec/cvehunt/latest/xmlreader.c:1446
#12 0x5654ccf2605c in streamFile /home/satoooon/ricsec/cvehunt/latest/xmllint.c:1923
#13 0x5654ccf305d9 in main /home/satoooon/ricsec/cvehunt/latest/xmllint.c:3748
#14 0x7fedf09170b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
#15 0x5654ccf2020d in _start (/home/satoooon/ricsec/cvehunt/latest/xmllint+0x7a20d)
0x608000003470 is located 80 bytes inside of 96-byte region [0x608000003420,0x608000003480)
freed by thread T0 here:
#0 0x7fedf0d837cf in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.5+0x10d7cf)
#1 0x5654cd1d5f79 in xmlTextReaderFreeProp /home/satoooon/ricsec/cvehunt/latest/xmlreader.c:380
#2 0x5654cd1d5fd8 in xmlTextReaderFreePropList /home/satoooon/ricsec/cvehunt/latest/xmlreader.c:397
#3 0x5654cd1d6d03 in xmlTextReaderFreeNode /home/satoooon/ricsec/cvehunt/latest/xmlreader.c:540
#4 0x5654cd1dc82f in xmlTextReaderRead /home/satoooon/ricsec/cvehunt/latest/xmlreader.c:1504
#5 0x5654ccf2605c in streamFile /home/satoooon/ricsec/cvehunt/latest/xmllint.c:1923
#6 0x5654ccf305d9 in main /home/satoooon/ricsec/cvehunt/latest/xmllint.c:3748
#7 0x7fedf09170b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
previously allocated by thread T0 here:
#0 0x7fedf0d83bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x5654ccfcc084 in xmlNewPropInternal /home/satoooon/ricsec/cvehunt/latest/tree.c:1874
#2 0x5654ccfcc7ed in xmlNewNsPropEatName /home/satoooon/ricsec/cvehunt/latest/tree.c:2013
#3 0x5654cd237d49 in xmlSAX2AttributeNs /home/satoooon/ricsec/cvehunt/latest/SAX2.c:2000
#4 0x5654cd23a83e in xmlSAX2StartElementNs /home/satoooon/ricsec/cvehunt/latest/SAX2.c:2397
#5 0x5654cd1d82da in xmlTextReaderStartElementNs /home/satoooon/ricsec/cvehunt/latest/xmlreader.c:815
#6 0x5654ccf9b410 in xmlParseStartTag2 /home/satoooon/ricsec/cvehunt/latest/parser.c:9658
#7 0x5654ccfad6b0 in xmlParseTryOrFinish /home/satoooon/ricsec/cvehunt/latest/parser.c:11453
#8 0x5654ccfb55ca in xmlParseChunk /home/satoooon/ricsec/cvehunt/latest/parser.c:12346
#9 0x5654cd1d8c0f in xmlTextReaderPushData /home/satoooon/ricsec/cvehunt/latest/xmlreader.c:955
#10 0x5654cd1db364 in xmlTextReaderRead /home/satoooon/ricsec/cvehunt/latest/xmlreader.c:1383
#11 0x5654ccf26020 in streamFile /home/satoooon/ricsec/cvehunt/latest/xmllint.c:1915
#12 0x5654ccf305d9 in main /home/satoooon/ricsec/cvehunt/latest/xmllint.c:3748
#13 0x7fedf09170b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
SUMMARY: AddressSanitizer: heap-use-after-free /home/satoooon/ricsec/cvehunt/latest/valid.c:6553 in xmlValidateRef
Shadow bytes around the buggy address:
0x0c107fff8630: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
0x0c107fff8640: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
0x0c107fff8650: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
0x0c107fff8660: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
0x0c107fff8670: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c107fff8680: fa fa fa fa fd fd fd fd fd fd fd fd fd fd[fd]fd
0x0c107fff8690: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c107fff86a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c107fff86b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c107fff86c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c107fff86d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==23263==ABORTING
Next, to confirm the use-after-free causes a crash, build it without Address Sanitizer.
make distclean
CFLAGS="-g -O0" CXXFLAGS="-g -O0" ./configure --without-python --disable-shared
make -j$(nproc)
Run it again.
$ gdb ./xmllint
...
pwndbg> r --valid --dtdattr --stream ./poc.xml
...
Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
────────────────────────────────────────────────────────[ REGISTERS ]─────────────────────────────────────────────────────────
RAX 0x7fffffffdb38 —▸ 0x555555720547 ◂— 0x6166007261620061 /* 'a' */
RBX 0x7ffff7ca5c93 (__vfprintf_internal+691) ◂— endbr64
RCX 0xe
RDX 0x45524548454d414e ('NAMEHERE')
RDI 0x45524548454d414e ('NAMEHERE')
RSI 0x7fffffffd768 ◂— 0x5e5518d646f70a00
R8 0xffffffff
R9 0x10
R10 0x5555556c0901 ◂— 's references an unknown ID "%s"\n'
R11 0x45524548454d414e ('NAMEHERE')
R12 0x7fffffffd7c0 ◂— 0xfbad8001
R13 0x5555556c08f0 ◂— 'IDREF attribute %s references an unknown ID "%s"\n'
R14 0x7fffffffda00 ◂— 0x3000000030 /* '0' */
R15 0x73
RBP 0x7fffffffd7a0 —▸ 0x555555741450 ◂— 'IDREF attribute '
RSP 0x7fffffffd228 —▸ 0x7ffff7ca7e95 (__vfprintf_internal+9397) ◂— jmp 0x7ffff7ca620a
RIP 0x7ffff7db7675 (__strlen_avx2+21) ◂— vpcmpeqb ymm1, ymm0, ymmword ptr [rdi]
──────────────────────────────────────────────────────────[ DISASM ]──────────────────────────────────────────────────────────
► 0x7ffff7db7675 <__strlen_avx2+21> vpcmpeqb ymm1, ymm0, ymmword ptr [rdi]
0x7ffff7db7679 <__strlen_avx2+25> vpmovmskb eax, ymm1
0x7ffff7db767d <__strlen_avx2+29> test eax, eax
0x7ffff7db767f <__strlen_avx2+31> jne __strlen_avx2+272 <__strlen_avx2+272>
↓
0x7ffff7db7770 <__strlen_avx2+272> tzcnt eax, eax
0x7ffff7db7774 <__strlen_avx2+276> add rax, rdi
0x7ffff7db7777 <__strlen_avx2+279> sub rax, rdx
0x7ffff7db777a <__strlen_avx2+282> vzeroupper
0x7ffff7db777d <__strlen_avx2+285> ret
0x7ffff7db777e <__strlen_avx2+286> nop
0x7ffff7db7780 <__strlen_avx2+288> tzcnt eax, eax
──────────────────────────────────────────────────────────[ STACK ]───────────────────────────────────────────────────────────
00:0000│ rsp 0x7fffffffd228 —▸ 0x7ffff7ca7e95 (__vfprintf_internal+9397) ◂— jmp 0x7ffff7ca620a
01:0008│ 0x7fffffffd230 ◂— 0x7fff0000001b
02:0010│ 0x7fffffffd238 ◂— 0x3000000030 /* '0' */
03:0018│ 0x7fffffffd240 ◂— 0xffffda10
04:0020│ 0x7fffffffd248 ◂— 0x7fff00000000
05:0028│ 0x7fffffffd250 ◂— 0x0
... ↓
07:0038│ 0x7fffffffd260 ◂— 0x555500000000
────────────────────────────────────────────────────────[ BACKTRACE ]─────────────────────────────────────────────────────────
► f 0 7ffff7db7675 __strlen_avx2+21
f 1 7ffff7ca7e95 __vfprintf_internal+9397
f 2 7ffff7cbb11a __vsnprintf_internal+170
f 3 55555557dfe5 __xmlRaiseError+688
f 4 5555555ca73e xmlErrValidNode+252
f 5 5555555d5f6f xmlValidateRef+553
f 6 5555555d6149 xmlWalkValidateList+58
f 7 5555556b7a88 xmlListWalk+78
f 8 5555555d61b5 xmlValidateCheckRefCallback+101
f 9 5555555bd4e6 stubHashScannerFull+72
f 10 5555555bd640 xmlHashScanFull+242
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Root cause
xmlTextReaderFreeProp() doesn't check return values of xmlTextReaderRemoveID() and xmlTextReaderRemoveRef(). (xmlreader.c:364,368)
/* Check for ID removal -> leading to invalid references ! */
if ((cur->parent != NULL) && (cur->parent->doc != NULL)) {
if (xmlIsID(cur->parent->doc, cur->parent, cur))
xmlTextReaderRemoveID(cur->parent->doc, cur);
if (((cur->parent->doc->intSubset != NULL) ||
(cur->parent->doc->extSubset != NULL)) &&
(xmlIsRef(cur->parent->doc, cur->parent, cur)))
xmlTextReaderRemoveRef(cur->parent->doc, cur);
}
xmlTextReaderRemoveID() and xmlTextReaderRemoveRef() do similar things and have the same vulnerability.
So we will focus on xmlTextReaderRemoveRef(). (xmlreader.c:319)
/**
* xmlTextReaderRemoveRef:
* @doc: the document
* @attr: the attribute
*
* Remove the given attribute from the Ref table maintained internally.
*
* Returns -1 if the lookup failed and 0 otherwise
*/
static int
xmlTextReaderRemoveRef(xmlDocPtr doc, xmlAttrPtr attr) {
xmlListPtr ref_list;
xmlRefTablePtr table;
xmlChar *ID;
if (doc == NULL) return(-1);
if (attr == NULL) return(-1);
table = (xmlRefTablePtr) doc->refs;
if (table == NULL)
return(-1);
ID = xmlNodeListGetString(doc, attr->children, 1);
if (ID == NULL)
return(-1);
ref_list = xmlHashLookup(table, ID);
xmlFree(ID);
if(ref_list == NULL)
return (-1);
xmlListWalk(ref_list, xmlTextReaderWalkRemoveRef, attr);
return(0);
}
xmlTextReaderRemoveRef() searches a hash table(doc->refs) for a xmlRef struct referring to a given xmlAttr struct to be freed later, and removes the reference by xmlTextReaderWalkRemoveRef().
But, if xmlNodeListGetString() or xmlHashLookup() fails, the function exits without calling xmlTextReaderWalkRemoveRef().
Then, xmlTextReaderFreeProp() frees the xmlAttr without removing the reference from the xmlRef.
As a result, xmlValidateRef() refers the freed xmlAttr from the xmlRef and causes the use-after-free.
The same is true for xmlTextReaderRemoveID().
We can intentionally make xmlNodeListGetString() or xmlHashLookup() fail by utilizing either of two different bugs in them. Those bugs can be caused by using entity reference in IDREF declaration.
1. Entity reference that will be normalized later
The following XML causes xmlHashLookup() to fail.
<!ENTITY val "a ">
<!ATTLIST uaftag uafattr IDREF "&val;">
When a xmlRef struct is added to the IDREF(S) hash table in xmlAddRef(), its name is normalized to remove extra space characters. (SAX2.c:2090)
However, when looking up a xmlRef by the name obtained by xmlNodeListGetString() in xmlTextReaderRemoveRef(), the name is not normalized.
Therefore looking up fails by the data inconsistency and xmlHashLookup() returns NULL.
2. Empty entity reference
This bug was found during the analysis, and it isn't used in the proof-of-concept.
The following XML causes that xmlNodeListGetString() returns NULL.
<!ENTITY val "">
<!ATTLIST uaftag uafattr IDREF "&val;">
Nodes that express attribute values are generated in xmlStringLenGetNodeList(). (SAX2.c:2027)
When a attribute value is empty, xmlStringLenGetNodeList() doesn’t consider it as an error, and returns an empty XML_TEXT_NODE. (tree.c:1480)
if (last == NULL) {
ret = node;
} else {
xmlAddNextSibling(last, node);
}
} else if (ret == NULL) {
ret = xmlNewDocText(doc, BAD_CAST "");
}
When processing recursively for entities, xmlStringLenGetNodeList() calls xmlStringGetNodeList().
In contrast to xmlStringLenGetNodeList, xmlStringGetNodeList() considers the empty attribute value as an error.
if (last == NULL) {
ret = node;
} else {
xmlAddNextSibling(last, node);
}
}
This difference causes the bug.
Proposal for revision
I didn't find a way to create a pull request as confidential, so I show patches here (I'm sorry if that way exists).
Empty entity reference bug
Just handle the empty attribute value as xmlStringLenGetNodeList().
Patch: 0001-Fix-empty-string-handling.patch
Normalization bug
I proposal to apply a function like xmlAttrNormalizeSpace() to a return value of xmlStringGetNodeList().
Patch: 0002-Fix-attribute-normalization-problems.patch
In the patch, I copied and pasted xmlAttrNormalizeSpace() from parser.c to xmlreader.c, but I think there are another better ways.
I'm not familiar with libxml2 design, so I would like to ask maintainers to fix it later.
Return value handling
Even with the avobe patch, there is still a open normalization issue. (SAX2.c:1375)
if (xmlStrEqual(fullname, BAD_CAST "xml:id")) {
/*
* Add the xml:id value
*
* Open issue: normalization of the value.
*/
if (xmlValidateNCName(value, 1) != 0) {
xmlErrValid(ctxt, XML_DTD_XMLID_VALUE,
"xml:id : attribute value %s is not an NCName\n",
(const char *) value, NULL);
}
xmlAddID(&ctxt->vctxt, ctxt->myDoc, value, ret);
} else ...
We can make xmlTextReaderRemoveRef/ID() fail with this issue and cause the same use-after-free bug.
Normalization problem is troublesome. I think there are still some undiscovered normalization bugs.
Therefore, we can not trust that xmlTextReaderRemoveRef/ID() will always remove references to attributes successfully and we should handle its return value.
However, xmlTextReaderRemoveRef/ID() failure means that there is no way to remove a reference to xmlAttr that xmlRef/ID has.
For now, I wrote a patch: 0003-Keep-allocating-attribute-in-xmlTextReaderFreeProp.patch
The patch makes xmlTextReaderFreeProp() keep allocating xmlAttr to avoid a use-after-free.
But, if xmlTextReaderRemoveRef/ID() fails, memory leak will be occured instead of use-after-free.
This patch is not perfect, but better.
If we want to handle xmlTextReaderRemoveRef/ID() failure completely, we need to design structures that not occurs both use-after-free and memory leak.
To be honest, I'm not still sure what is the best way in that case.