g_utf8_normalize: don't read past the end of the buffer
_g_utf8_normalize_wc() could read past the end of the provided buffer if it ends with a truncated multibyte character. If max_len is -1, it can continue reading until it encounters either a NUL or unreadable memory. Avoid this with extra bounds checks prior to g_utf8_get_char() to ensure that it does not read past either max_len or a NUL terminator.
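As a rough illustration of the kind of check described above (not the actual patch), the sketch below decides whether a complete multibyte sequence is available before it is decoded. The helper names utf8_expected_len() and utf8_char_is_complete() are hypothetical and do not exist in GLib; the remaining argument is assumed to stand in for the bytes left under max_len, with a negative value meaning NUL-terminated input.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: sequence length implied by a UTF-8 lead byte,
 * or 0 if the byte cannot start a sequence. */
static size_t
utf8_expected_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xe0) == 0xc0)
        return 2;
    if ((c & 0xf0) == 0xe0)
        return 3;
    if ((c & 0xf8) == 0xf0)
        return 4;
    return 0;
}

/* Hypothetical helper: true if a whole sequence starting at p can be read
 * without going past `remaining` bytes (when remaining >= 0) or past a
 * NUL terminator (when remaining < 0). */
static bool
utf8_char_is_complete(const char *p, ptrdiff_t remaining)
{
    size_t need, i;

    if (remaining == 0)
        return false;                 /* nothing left to read */
    need = utf8_expected_len((unsigned char)p[0]);
    if (need == 0)
        return false;                 /* invalid lead byte */
    if (remaining > 0 && (size_t)remaining < need)
        return false;                 /* sequence crosses the length limit */
    for (i = 1; i < need; i++) {
        /* A NUL is not a continuation byte, so the loop stops at the
         * terminator and never reads the byte after it. */
        if (((unsigned char)p[i] & 0xc0) != 0x80)
            return false;
    }
    return true;
}
```

A decoder loop could call utf8_char_is_complete() before each decode step and stop, or report an error, when it returns false. This only checks that the bytes are present and shaped like continuation bytes; it does not reject overlong or otherwise invalid encodings.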
If the result of _g_utf8_normalize_wc() were directly returned to the caller then this could be an exploitable infoleak in some applications, but the result is transformed from UCS-4 back to UTF-8 by g_ucs4_to_utf8(), which bails on invalid encodings rather than continuing as _g_utf8_normalize_wc() does.
So in cases where _g_utf8_normalize_wc() read off the end of the buffer, g_ucs4_to_utf8() bails out and returns NULL. There's a potential to return a few bytes from past the end of a buffer in cases where g_utf8_normalize() is called on a string with no NUL terminator and length set by the max_len argument, and the next bytes past the end of the string are valid UTF-8 continuation bytes. I think that's sufficiently low-probability to treat this as a normal bug report and not a security report.
Discovered by fuzzing the mail indexer mu, which calls g_utf8_normalize() on non-validated strings and will crash on inputs with MIME parts that end with truncated multibyte characters.
Bug reproduced and patch tested on OpenBSD/amd64, macOS/arm64, and Linux/x86_64.
Patch passes glib/tests/unicode-normalize and holds up against a day or so of fuzzing with AFL++.
See also !3342 (merged) for a fuzzing harness.
Example program to reproduce below.
Run it on a 4096-byte test file ending with a truncated multibyte UTF-8 character, for example the output of perl -e 'print(("A" x 4095) . "\x{e2}")'.
#include <sys/mman.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <glib.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    struct stat st;
    const char *path;
    char *in, *res;
    size_t len;
    int fd;

    if (argc != 2) {
        /* argv[0] rather than getprogname(), which is a BSD extension. */
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 2;
    }
    path = argv[1];
    if (0 > (fd = open(path, O_RDONLY)))
        err(1, "%s: %s", path, "open");
    if (0 != fstat(fd, &st))
        err(1, "%s: %s", path, "fstat");
    /* Round the mapping length up to a multiple of 4096 bytes. */
    len = ((st.st_size + 4095) / 4096) * 4096;
    if (MAP_FAILED == (in = mmap(NULL, len, PROT_READ,
        MAP_PRIVATE, fd, 0)))
        err(1, "%s: %s", path, "mmap");
    /*
     * len is -1, so GLib looks for a NUL terminator. A file that exactly
     * fills the mapping has none, so the read can run past the mapping.
     */
    res = g_utf8_normalize(in, -1, G_NORMALIZE_ALL);
    if (!res)
        errx(1, "g_utf8_normalize returned NULL");
    g_free(res);
    return 0;
}