New Security Issue: use-after-free in libxml2
Original reporter: Shinji Sato
Area: Platform component
Message
Hello, my name is Shinji Sato, working at a Japanese security company.
Several days ago, we found a use-after-free bug that causes denial-of-service on libxml2.
The application that validates XML using xmlTextReaderRead() with XML_PARSE_DTDATTR
and XML_PARSE_DTDVALID
enabled becomes vulnerable to this bug.
We've already created a proof-of-concept, found root causes and wrote fix patches.
The details are as follows.
PoC
Here is a proof-of-concept that reproduce crash on xmllint.
My environment is as follows:
$ cat /etc/os-release # Ubuntu
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
$ gcc --version # gcc 9.3
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOS
$ git log -n1 # latest commit
commit dea91c97debeac7c1aaf9c19f79029809e23a353 (HEAD -> master, origin/master, origin/HEAD)
...
The crafted xml file is as follows:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
To confirm the bug triggers a use-after-free, build xmllint with Address Sanitizer.
CFLAGS="-g -O0 -fsanitize=address" CXXFLAGS="-g -O0 -fsanitize=address" ./configure --without-python --disable-shared
make -j$(nproc)
Run xmllint with the crafted file.
the arguments --valid --dtdattr --stream
are required to cause the use-after-free.
$ ./xmllint --valid --dtdattr --stream ./poc.xml
...
=================================================================
==18994==ERROR: AddressSanitizer: heap-use-after-free on address 0x608000003470 at pc 0x55f2481eecec bp 0x7ffdf02f1160 sp 0x7ffdf02f1150
READ of size 4 at 0x608000003470 thread T0
#0 0x55f2481eeceb in xmlValidateRef /home/satoooon/ricsec/cvehunt/libxml2/valid.c:6553
#1 0x55f2481ef2c3 in xmlWalkValidateList /home/satoooon/ricsec/cvehunt/libxml2/valid.c:6603
#2 0x55f248429f66 in xmlListWalk /home/satoooon/ricsec/cvehunt/libxml2/list.c:682
#3 0x55f2481ef400 in xmlValidateCheckRefCallback /home/satoooon/ricsec/cvehunt/libxml2/valid.c:6625
#4 0x55f2481b35d7 in stubHashScannerFull /home/satoooon/ricsec/cvehunt/libxml2/hash.c:852
#5 0x55f2481b39bf in xmlHashScanFull /home/satoooon/ricsec/cvehunt/libxml2/hash.c:899
#6 0x55f2481b36ea in xmlHashScan /home/satoooon/ricsec/cvehunt/libxml2/hash.c:868
#7 0x55f2481ef5ad in xmlValidateDocumentFinal /home/satoooon/ricsec/cvehunt/libxml2/valid.c:6673
#8 0x55f2483f1513 in xmlSAX2EndDocument /home/satoooon/ricsec/cvehunt/libxml2/SAX2.c:1029
#9 0x55f248175ec8 in xmlParseChunk /home/satoooon/ricsec/cvehunt/libxml2/parser.c:12402
#10 0x55f248399852 in xmlTextReaderPushData /home/satoooon/ricsec/cvehunt/libxml2/xmlreader.c:997
#11 0x55f24839c0e2 in xmlTextReaderRead /home/satoooon/ricsec/cvehunt/libxml2/xmlreader.c:1446
#12 0x55f2480e605c in streamFile /home/satoooon/ricsec/cvehunt/libxml2/xmllint.c:1923
#13 0x55f2480f05d9 in main /home/satoooon/ricsec/cvehunt/libxml2/xmllint.c:3748
#14 0x7f24936460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
#15 0x55f2480e020d in _start (/home/satoooon/ricsec/cvehunt/libxml2/xmllint+0x7a20d)
0x608000003470 is located 80 bytes inside of 96-byte region [0x608000003420,0x608000003480)
freed by thread T0 here:
#0 0x7f2493ab27cf in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.5+0x10d7cf)
#1 0x55f2483965b7 in xmlTextReaderFreeProp /home/satoooon/ricsec/cvehunt/libxml2/xmlreader.c:380
#2 0x55f248396616 in xmlTextReaderFreePropList /home/satoooon/ricsec/cvehunt/libxml2/xmlreader.c:397
#3 0x55f248397341 in xmlTextReaderFreeNode /home/satoooon/ricsec/cvehunt/libxml2/xmlreader.c:540
#4 0x55f24839ce6d in xmlTextReaderRead /home/satoooon/ricsec/cvehunt/libxml2/xmlreader.c:1504
#5 0x55f2480e605c in streamFile /home/satoooon/ricsec/cvehunt/libxml2/xmllint.c:1923
#6 0x55f2480f05d9 in main /home/satoooon/ricsec/cvehunt/libxml2/xmllint.c:3748
#7 0x7f24936460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
previously allocated by thread T0 here:
#0 0x7f2493ab2bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x55f24818c084 in xmlNewPropInternal /home/satoooon/ricsec/cvehunt/libxml2/tree.c:1874
#2 0x55f24818c7ed in xmlNewNsPropEatName /home/satoooon/ricsec/cvehunt/libxml2/tree.c:2013
#3 0x55f2483f8387 in xmlSAX2AttributeNs /home/satoooon/ricsec/cvehunt/libxml2/SAX2.c:2000
#4 0x55f2483fae7c in xmlSAX2StartElementNs /home/satoooon/ricsec/cvehunt/libxml2/SAX2.c:2397
#5 0x55f248398918 in xmlTextReaderStartElementNs /home/satoooon/ricsec/cvehunt/libxml2/xmlreader.c:815
#6 0x55f24815b410 in xmlParseStartTag2 /home/satoooon/ricsec/cvehunt/libxml2/parser.c:9658
#7 0x55f24816d6b0 in xmlParseTryOrFinish /home/satoooon/ricsec/cvehunt/libxml2/parser.c:11453
#8 0x55f2481755ca in xmlParseChunk /home/satoooon/ricsec/cvehunt/libxml2/parser.c:12346
#9 0x55f24839924d in xmlTextReaderPushData /home/satoooon/ricsec/cvehunt/libxml2/xmlreader.c:955
#10 0x55f24839b9a2 in xmlTextReaderRead /home/satoooon/ricsec/cvehunt/libxml2/xmlreader.c:1383
#11 0x55f2480e6020 in streamFile /home/satoooon/ricsec/cvehunt/libxml2/xmllint.c:1915
#12 0x55f2480f05d9 in main /home/satoooon/ricsec/cvehunt/libxml2/xmllint.c:3748
#13 0x7f24936460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
SUMMARY: AddressSanitizer: heap-use-after-free /home/satoooon/ricsec/cvehunt/libxml2/valid.c:6553 in xmlValidateRef
Shadow bytes around the buggy address:
0x0c107fff8630: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
0x0c107fff8640: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
0x0c107fff8650: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
0x0c107fff8660: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
0x0c107fff8670: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c107fff8680: fa fa fa fa fd fd fd fd fd fd fd fd fd fd[fd]fd
0x0c107fff8690: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c107fff86a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c107fff86b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c107fff86c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c107fff86d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==18994==ABORTING
Next, to confirm the use-after-free causes a crash, build it without Address Sanitizer.
make distclean
CFLAGS="-g -O0" CXXFLAGS="-g -O0" ./configure --without-python --disable-shared
make -j$(nproc)
Run it again.
$ gdb ./xmllint
...
pwndbg> r --valid --dtdattr --stream ./poc.xml
...
Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
────────────────────────────────────────────────────────[ REGISTERS ]─────────────────────────────────────────────────────────
RAX 0x7fffffffdb38 —▸ 0x555555720547 ◂— 0x6166007261620061 /* 'a' */
RBX 0x7ffff7ca5c93 (__vfprintf_internal+691) ◂— endbr64
RCX 0xe
RDX 0x45524548454d414e ('NAMEHERE')
RDI 0x45524548454d414e ('NAMEHERE')
RSI 0x7fffffffd768 ◂— 0x5e5518d646f70a00
R8 0xffffffff
R9 0x10
R10 0x5555556c0901 ◂— 's references an unknown ID "%s"\n'
R11 0x45524548454d414e ('NAMEHERE')
R12 0x7fffffffd7c0 ◂— 0xfbad8001
R13 0x5555556c08f0 ◂— 'IDREF attribute %s references an unknown ID "%s"\n'
R14 0x7fffffffda00 ◂— 0x3000000030 /* '0' */
R15 0x73
RBP 0x7fffffffd7a0 —▸ 0x555555741450 ◂— 'IDREF attribute '
RSP 0x7fffffffd228 —▸ 0x7ffff7ca7e95 (__vfprintf_internal+9397) ◂— jmp 0x7ffff7ca620a
RIP 0x7ffff7db7675 (__strlen_avx2+21) ◂— vpcmpeqb ymm1, ymm0, ymmword ptr [rdi]
──────────────────────────────────────────────────────────[ DISASM ]──────────────────────────────────────────────────────────
► 0x7ffff7db7675 vpcmpeqb ymm1, ymm0, ymmword ptr [rdi]
0x7ffff7db7679 vpmovmskb eax, ymm1
0x7ffff7db767d test eax, eax
0x7ffff7db767f jne __strlen_avx2+272
↓
0x7ffff7db7770 tzcnt eax, eax
0x7ffff7db7774 add rax, rdi
0x7ffff7db7777 sub rax, rdx
0x7ffff7db777a vzeroupper
0x7ffff7db777d ret
0x7ffff7db777e nop
0x7ffff7db7780 tzcnt eax, eax
──────────────────────────────────────────────────────────[ STACK ]───────────────────────────────────────────────────────────
00:0000│ rsp 0x7fffffffd228 —▸ 0x7ffff7ca7e95 (__vfprintf_internal+9397) ◂— jmp 0x7ffff7ca620a
01:0008│ 0x7fffffffd230 ◂— 0x7fff0000001b
02:0010│ 0x7fffffffd238 ◂— 0x3000000030 /* '0' */
03:0018│ 0x7fffffffd240 ◂— 0xffffda10
04:0020│ 0x7fffffffd248 ◂— 0x7fff00000000
05:0028│ 0x7fffffffd250 ◂— 0x0
... ↓
07:0038│ 0x7fffffffd260 ◂— 0x555500000000
────────────────────────────────────────────────────────[ BACKTRACE ]─────────────────────────────────────────────────────────
► f 0 7ffff7db7675 __strlen_avx2+21
f 1 7ffff7ca7e95 __vfprintf_internal+9397
f 2 7ffff7cbb11a __vsnprintf_internal+170
f 3 55555557dfe5 __xmlRaiseError+688
f 4 5555555ca73e xmlErrValidNode+252
f 5 5555555d5f6f xmlValidateRef+553
f 6 5555555d6149 xmlWalkValidateList+58
f 7 5555556b7a88 xmlListWalk+78
f 8 5555555d61b5 xmlValidateCheckRefCallback+101
f 9 5555555bd4e6 stubHashScannerFull+72
f 10 5555555bd640 xmlHashScanFull+242
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Root cause
xmlTextReaderFreeProp() doesn't check return values of xmlTextReaderRemoveID() and xmlTextReaderRemoveRef(). (xmlreader.c:364,368)
/* Check for ID removal -> leading to invalid references ! */
if ((cur->parent != NULL) && (cur->parent->doc != NULL)) {
if (xmlIsID(cur->parent->doc, cur->parent, cur))
xmlTextReaderRemoveID(cur->parent->doc, cur);
if (((cur->parent->doc->intSubset != NULL) ||
(cur->parent->doc->extSubset != NULL)) &&
(xmlIsRef(cur->parent->doc, cur->parent, cur)))
xmlTextReaderRemoveRef(cur->parent->doc, cur);
}
xmlTextReaderRemoveID() and xmlTextReaderRemoveRef() do similar things and have the same vulnerability. So we will focus on xmlTextReaderRemoveRef(). (xmlreader.c:319)
/**
* xmlTextReaderRemoveRef:
* @doc: the document
* @attr: the attribute
*
* Remove the given attribute from the Ref table maintained internally.
*
* Returns -1 if the lookup failed and 0 otherwise
*/
static int
xmlTextReaderRemoveRef(xmlDocPtr doc, xmlAttrPtr attr) {
xmlListPtr ref_list;
xmlRefTablePtr table;
xmlChar *ID;
if (doc == NULL) return(-1);
if (attr == NULL) return(-1);
table = (xmlRefTablePtr) doc->refs;
if (table == NULL)
return(-1);
ID = xmlNodeListGetString(doc, attr->children, 1);
if (ID == NULL)
return(-1);
ref_list = xmlHashLookup(table, ID);
xmlFree(ID);
if(ref_list == NULL)
return (-1);
xmlListWalk(ref_list, xmlTextReaderWalkRemoveRef, attr);
return(0);
}
xmlTextReaderRemoveRef() searches a hash table(doc->refs) for a xmlRef struct referring to a given xmlAttr struct to be freed later, and removes the reference by xmlTextReaderWalkRemoveRef(). But, if xmlNodeListGetString() or xmlHashLookup() fails, the function exits without calling xmlTextReaderWalkRemoveRef(). Then, xmlTextReaderFreeProp() frees the xmlAttr without removing the reference from the xmlRef. As a result, xmlValidateRef() refers the freed xmlAttr from the xmlRef and causes the use-after-free.
The same is true for xmlTextReaderRemoveID().
We can intentionally make xmlNodeListGetString() or xmlHashLookup() fail by utilizing either of two different bugs in them. Those bugs can be caused by using entity reference in IDREF declaration.
1. Entity reference that will be normalized later
The following XML causes xmlHashLookup() to fail. When a xmlRef struct is added to the IDREF(S) hash table in xmlAddRef(), its name is normalized to remove extra space characters. (SAX2.c:2090) However, when looking up a xmlRef by the name obtained by xmlNodeListGetString() in xmlTextReaderRemoveRef(), the name is not normalized. Therefore looking up fails by the data inconsistency and xmlHashLookup() returns NULL.
2. Empty entity reference
This bug was found during the analysis, and it isn't used in the proof-of-concept. The following XML causes that xmlNodeListGetString() returns NULL.
Nodes that express attribute values are generated in xmlStringLenGetNodeList(). (SAX2.c:2027) When a attribute value is empty, xmlStringLenGetNodeList() doesn’t consider it as an error, and returns an empty XML_TEXT_NODE. (tree.c:1480)
if (last == NULL) {
ret = node;
} else {
xmlAddNextSibling(last, node);
}
} else if (ret == NULL) {
ret = xmlNewDocText(doc, BAD_CAST "");
}
When processing recursively for entities, xmlStringLenGetNodeList() calls xmlStringGetNodeList(). In contrast to xmlStringLenGetNodeList, xmlStringGetNodeList() considers the empty attribute value as an error.
if (last == NULL) {
ret = node;
} else {
xmlAddNextSibling(last, node);
}
}
This difference causes the bug.
Exploitablity
By allocating a crafted struct(e.g. buffer) in the same place as the free xmlAttr struct, attackers are able to forge almost xmlAttr members. Then, by forging the attr->name pointer, attackers are able to leak information on the heap to stderr by xmlErrValidNode(). (valid.c:6556) Therefore, the application that leave stderr visible will leak information.
Another possible way of exploiting this vulnerability is arbitrary memory overwrite. If an application reuses the xmlDoc struct created in some functions like xmlTextReaderCurrentDoc() in xmlXPathNewContext() and evaluates a XPath query that contains id(), attackers may be able to write data to a crafted address. This can happen because xmlXPathGetElementsByIds() uses xmlGetID() and attackers are able to let the function return a forged id->attr.
However, in order to actually cause arbitrary overwrite, the exploit needs to meet some complicated conditions. (e.g. finding a usable binary-safe buffer) So I have no proof-of-concept for arbitrary memory overwrite.
Proposal for revision
Empty entity reference bug
Just add this.
From bb5f4456e5122561caf02c15dc32d7f153f5e2de Mon Sep 17 00:00:00 2001
From: satoooon8888
Date: Tue, 14 Dec 2021 17:22:06 +0900
Subject: [PATCH 1/3] Fix empty string handling in xmlStringGetNodeList
---
tree.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tree.c b/tree.c
index c707f598..891f11ee 100644
--- a/tree.c
+++ b/tree.c
@@ -1673,6 +1673,8 @@ xmlStringGetNodeList(const xmlDoc *doc, const xmlChar *value) {
} else {
xmlAddNextSibling(last, node);
}
+ } else if (ret == NULL) {
+ ret = xmlNewDocText(doc, BAD_CAST "");
}
out:
--
2.25.1
Normalization bug
I proposal to apply a function like xmlAttrNormalizeSpace() to a return value of xmlStringGetNodeList().
From 2f822378afce78aac07942c2e0a2f59b0b03e436 Mon Sep 17 00:00:00 2001
From: satoooon8888
Date: Tue, 14 Dec 2021 19:32:57 +0900
Subject: [PATCH 2/3] Fix attribute normalization problems in
xmlTextReaderRemoveID and xmlTextReaderRemoveRef
---
xmlreader.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/xmlreader.c b/xmlreader.c
index 72e40b03..68f94c05 100644
--- a/xmlreader.c
+++ b/xmlreader.c
@@ -228,6 +228,40 @@ static int xmlTextReaderNextTree(xmlTextReaderPtr reader);
static void xmlTextReaderFreeNode(xmlTextReaderPtr reader, xmlNodePtr cur);
static void xmlTextReaderFreeNodeList(xmlTextReaderPtr reader, xmlNodePtr cur);
+/**
+ * xmlAttrNormalizeSpace:
+ * @src: the source string
+ * @dst: the target string
+ *
+ * Normalize the space in non CDATA attribute values:
+ * If the attribute type is not CDATA, then the XML processor MUST further
+ * process the normalized attribute value by discarding any leading and
+ * trailing space (#x20) characters, and by replacing sequences of space
+ * (#x20) characters by a single space (#x20) character.
+ * Note that the size of dst need to be at least src, and if one doesn't need
+ * to preserve dst (and it doesn't come from a dictionary or read-only) then
+ * passing src as dst is just fine.
+ *
+ * Returns a pointer to the normalized value (dst) or NULL if no conversion
+ * is needed.
+ */
+static xmlChar *xmlAttrNormalizeSpace(const xmlChar *src, xmlChar *dst) {
+ if ((src == NULL) || (dst == NULL)) return (NULL);
+
+ while (*src == 0x20) src++;
+ while (*src != 0) {
+ if (*src == 0x20) {
+ while (*src == 0x20) src++;
+ if (*src != 0) *dst++ = 0x20;
+ } else {
+ *dst++ = *src++;
+ }
+ }
+ *dst = 0;
+ if (dst == src) return (NULL);
+ return (dst);
+}
+
/**
* xmlFreeID:
* @not: A id
@@ -274,6 +308,7 @@ xmlTextReaderRemoveID(xmlDocPtr doc, xmlAttrPtr attr) {
ID = xmlNodeListGetString(doc, attr->children, 1);
if (ID == NULL)
return(-1);
+ xmlAttrNormalizeSpace(ID, ID);
id = xmlHashLookup(table, ID);
xmlFree(ID);
if (id == NULL || id->attr != attr) {
@@ -330,6 +365,7 @@ xmlTextReaderRemoveRef(xmlDocPtr doc, xmlAttrPtr attr) {
ID = xmlNodeListGetString(doc, attr->children, 1);
if (ID == NULL)
return(-1);
+ xmlAttrNormalizeSpace(ID, ID);
ref_list = xmlHashLookup(table, ID);
xmlFree(ID);
if(ref_list == NULL)
--
2.25.1
In the patch, I copied and pasted xmlAttrNormalizeSpace() from parser.c to xmlreader.c, but I think there are another better ways. I'm not familiar with libxml2 design, so I would like to ask maintainers to fix it later.
Return value handling
Even with the avobe patch, there is still a open normalization issue. (SAX2.c:1375)
if (xmlStrEqual(fullname, BAD_CAST "xml:id")) {
/*
* Add the xml:id value
*
* Open issue: normalization of the value.
*/
if (xmlValidateNCName(value, 1) != 0) {
xmlErrValid(ctxt, XML_DTD_XMLID_VALUE,
"xml:id : attribute value %s is not an NCName\n",
(const char *) value, NULL);
}
xmlAddID(&ctxt->vctxt, ctxt->myDoc, value, ret);
} else ...
We can make xmlTextReaderRemoveRef/ID() fail with this issue and cause the same use-after-free bug.
Normalization problem is troublesome. I think there are still some undiscovered normalization bugs. Therefore, we can not trust that xmlTextReaderRemoveRef/ID() will always remove references to attributes successfully and we should handle its return value.
However, xmlTextReaderRemoveRef/ID() failure means that there is no way to remove a reference to xmlAttr that xmlRef/ID has.
For now, I wrote a patch that makes xmlTextReaderFreeProp() keep allocating xmlAttr to avoid a use-after-free. But, if xmlTextReaderRemoveRef/ID() fails, memory leak will be occured instead of use-after-free. This patch is not perfect, but better.
From 1091b85d3b48e99f7817d0fedb6c47a4f6e6d916 Mon Sep 17 00:00:00 2001
From: satoooon8888
Date: Tue, 21 Dec 2021 19:22:17 +0900
Subject: [PATCH 3/3] Keep allocating attribute in xmlTextReaderFreeProp() when
invalid references remain
---
xmlreader.c | 41 ++++++++++++++++++++++++++++++++++-------
1 file changed, 34 insertions(+), 7 deletions(-)
diff --git a/xmlreader.c b/xmlreader.c
index 68f94c05..7e7ae929 100644
--- a/xmlreader.c
+++ b/xmlreader.c
@@ -384,6 +384,9 @@ xmlTextReaderRemoveRef(xmlDocPtr doc, xmlAttrPtr attr) {
static void
xmlTextReaderFreeProp(xmlTextReaderPtr reader, xmlAttrPtr cur) {
xmlDictPtr dict;
+#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+ int are_all_references_removed = 1;
+#endif
if ((reader != NULL) && (reader->ctxt != NULL))
dict = reader->ctxt->dict;
@@ -396,12 +399,25 @@ xmlTextReaderFreeProp(xmlTextReaderPtr reader, xmlAttrPtr cur) {
/* Check for ID removal -> leading to invalid references ! */
if ((cur->parent != NULL) && (cur->parent->doc != NULL)) {
- if (xmlIsID(cur->parent->doc, cur->parent, cur))
- xmlTextReaderRemoveID(cur->parent->doc, cur);
- if (((cur->parent->doc->intSubset != NULL) ||
- (cur->parent->doc->extSubset != NULL)) &&
- (xmlIsRef(cur->parent->doc, cur->parent, cur)))
- xmlTextReaderRemoveRef(cur->parent->doc, cur);
+ int res;
+ if (xmlIsID(cur->parent->doc, cur->parent, cur)) {
+ /* Check references are successfully removed */
+ res = xmlTextReaderRemoveID(cur->parent->doc, cur);
+#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+ if (res != 0)
+ are_all_references_removed = 0;
+#endif
+ }
+ if (((cur->parent->doc->intSubset != NULL) ||
+ (cur->parent->doc->extSubset != NULL)) &&
+ (xmlIsRef(cur->parent->doc, cur->parent, cur))) {
+ /* Check references are successfully removed */
+ res = xmlTextReaderRemoveRef(cur->parent->doc, cur);
+#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+ if (res != 0)
+ are_all_references_removed = 0;
+#endif
+ }
}
if (cur->children != NULL)
xmlTextReaderFreeNodeList(reader, cur->children);
@@ -413,7 +429,18 @@ xmlTextReaderFreeProp(xmlTextReaderPtr reader, xmlAttrPtr cur) {
reader->ctxt->freeAttrs = cur;
reader->ctxt->freeAttrsNr++;
} else {
- xmlFree(cur);
+#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+ /*
+ If there are still invalid references to the attribute,
+ must keep allocating the attribute to avoid use-after-free.
+ Instead of use-after-free, memory leak will be occured.
+ */
+ if (are_all_references_removed)
+ xmlFree(cur);
+#else
+ /* In fuzzing mode, must not avoid use-after-free */
+ xmlFree(cur);
+#endif
}
}
--
2.25.1
If we want to handle xmlTextReaderRemoveRef/ID() failure completely, we need to design structures that not occurs both use-after-free and memory leak. To be honest, I'm not still sure what is the best way in that case. This problem is not urgent and the patches come first, but it's worth discussing it.
Acknowledgement
We found this issue with some fuzzers during our experiment for our academic paper.
Since libxml2 has already provided how to build libxml2 with instrumentation and how to run fuzzers, the preparation went smoothly. We deeply appreciate your effort of preparing build scripts and harnesses.
It would be great if you would assign a CVE number to this bug, so that we can refer to it in our paper.
Sincerely,
Shinji Sato