markup parser makes invalid assumptions about input data, crashing networkmanager 1.44.2 build
When building networkmanager 1.44.2 in Chimera Linux (https://github.com/chimera-linux/cports/pull/690), we ran into a segfault in vapigen:
[939/979] Generating vapi/libnm.vapi with a custom command
FAILED: vapi/libnm.vapi
/usr/bin/vapigen --quiet --library=libnm --directory=/builddir/networkmanager-1.44.2/build/vapi --pkg=gio-2.0 --metadatadir=/builddir/networkmanager-1.44.2/vapi /builddir/networkmanager-1.44.2/build/src/libnm-client-impl/NM-1.0.gir
Retrieving a backtrace using the core dump revealed the following:
(lldb) bt
* thread #1, name = 'vapigen', stop reason = signal SIGSEGV
* frame #0: 0x00007ff2616ae140 ld-musl-x86_64.so.1`strlen(s=<unavailable>) at strlen.c:17:29
frame #1: 0x00007ff261171cc9 libvala-0.56.so.0`vala_markup_reader_text(self=0x00007ff260601c80, end_char='<', rm_trailing_whitespace=1) at valamarkupreader.c:996:9
frame #2: 0x00007ff261171698 libvala-0.56.so.0`vala_markup_reader_read_token(self=0x00007ff260601c80, token_begin=0x00007fffde7831e0, token_end=0x00007fffde7831d0) at valamarkupreader.c:899:16
frame #3: 0x00007ff26110fbab libvala-0.56.so.0`vala_gir_parser_parse_symbol_doc [inlined] vala_gir_parser_next(self=0x00007ff25f4bebe0) at valagirparser.c:1486:11
frame #4: 0x00007ff26110fb8e libvala-0.56.so.0`vala_gir_parser_parse_symbol_doc(self=0x00007ff25f4bebe0) at valagirparser.c:5218:4
frame #5: 0x00007ff261113b44 libvala-0.56.so.0`vala_gir_parser_parse_function(self=0x00007ff25f4bebe0, element_name="") at valagirparser.c:8288:12
frame #6: 0x00007ff261107f1c libvala-0.56.so.0`vala_gir_parser_parse_namespace at valagirparser.c:0
frame #7: 0x00007ff261107a49 libvala-0.56.so.0`vala_gir_parser_parse_namespace(self=0x00007ff25f4bebe0) at valagirparser.c:4907:11
frame #8: 0x00007ff2611031a8 libvala-0.56.so.0`vala_gir_parser_parse_file at valagirparser.c:3682:4
frame #9: 0x00007ff26110306b libvala-0.56.so.0`vala_gir_parser_parse_file(self=0x00007ff25f4bebe0, source_file=<unavailable>) at valagirparser.c:1466:2
frame #10: 0x00007ff2610ab8ad libvala-0.56.so.0`vala_code_context_accept [inlined] vala_code_visitor_visit_source_file(self=0x00007ff25f4bebe0, source_file=0x00007ff260402f10) at valacodevisitor.c:226:3
frame #11: 0x00007ff2610ab899 libvala-0.56.so.0`vala_code_context_accept [inlined] vala_source_file_accept(self=0x00007ff260402f10, visitor=0x00007ff25f4bebe0) at valasourcefile.c:837:2
frame #12: 0x00007ff2610ab899 libvala-0.56.so.0`vala_code_context_accept(self=0x00007ff260e06a90, visitor=0x00007ff25f4bebe0) at valacodecontext.c:2012:3
frame #13: 0x00007ff2610f98fe libvala-0.56.so.0`vala_gir_parser_parse(self=0x00007ff25f4bebe0, context=0x00007ff260e06a90) at valagirparser.c:1004:2
frame #14: 0x000055ef5d797ec3 vapigen`___lldb_unnamed_symbol701 + 2083
frame #15: 0x00007ff26165824a ld-musl-x86_64.so.1`libc_start_main_stage2(main=(vapigen`___lldb_unnamed_symbol701), argc=<unavailable>, argv=0x00007fffde783708) at __libc_start_main.c:95:7
I went to take a look at what happens in that code, and found it corresponding to this:
if (g_str_has_prefix ((const gchar*) _tmp12_, "amp;")) {
The g_str_has_prefix
API is a macro:
#define g_str_has_prefix(STR, PREFIX) \
(__builtin_constant_p (PREFIX)? \
G_GNUC_EXTENSION ({ \
const char * const __str = (STR); \
const char * const __prefix = (PREFIX); \
gboolean __result = FALSE; \
\
if G_UNLIKELY (__str == NULL || __prefix == NULL) \
__result = (g_str_has_prefix) (__str, __prefix); \
else \
{ \
const size_t __str_len = strlen (_G_STR_NONNULL (__str)); \
const size_t __prefix_len = strlen (_G_STR_NONNULL (__prefix)); \
if (__str_len >= __prefix_len) \
__result = memcmp (_G_STR_NONNULL (__str), \
_G_STR_NONNULL (__prefix), \
__prefix_len) == 0; \
} \
__result; \
}) \
: \
(g_str_has_prefix) (STR, PREFIX) \
)
From this, one can guess that the failing strlen is for the first argument.
This also implies that g_str_has_prefix
requires a null-terminated input string. This corresponds to the documentation:
The value is a NUL terminated UTF-8 string.
Now, next_pos
is next_pos = _tmp11_ + g_unichar_to_utf8 (u, NULL);
, and _tmp11_
is _tmp11_ = self->priv->current;
. The self->priv->current
is a data pointer somewhere between self->priv->begin
and self->priv->end
, which are retrieved from a memory-mapped file object. The g_mapped_file_get_contents
documentation states:
Note that the contents may not be zero-terminated, even if the GMappedFile is backed by a text file.
That means next_pos
may or may not point to a null terminated pointer. While it might seem safe to do the prefix check as we are sure we have more string data, due to the strlen
that is called early on in g_str_has_prefix
, it is not, and the strlen
may or may not crash (and crashes in this case).
I made a up a quick local workaround in vala to test this:
commit 37b01fd33131eb072bb5d0318f96397654768dfd
Author: q66 <q66@chimera-linux.org>
Date: Thu Oct 5 14:47:49 2023 +0200
work around markup reader possibly blowing up in g_str_has_prefix
diff --git a/vala/valamarkupreader.c b/vala/valamarkupreader.c
index da9b470..7b2c760 100644
--- a/vala/valamarkupreader.c
+++ b/vala/valamarkupreader.c
@@ -987,11 +987,26 @@ vala_markup_reader_text (ValaMarkupReader* self,
vala_report_error (NULL, "invalid UTF-8 character");
} else {
if (u == ((gunichar) '&')) {
+ gchar hackbuf[16] = {0};
gchar* next_pos = NULL;
gchar* _tmp11_;
gchar* _tmp12_;
_tmp11_ = self->priv->current;
next_pos = _tmp11_ + g_unichar_to_utf8 (u, NULL);
+ /* we cannot use next_pos directly with g_str_has_prefix as
+ * we are dealing with input buffer that is possibly not null
+ * terminated, and g_str_has_prefix does a strlen on the whole
+ * input which may blow up once it goes over the boundary; that
+ * renders any guarantee that we have enough string to "safely"
+ * check the prefix irrelevant - until this is fixed in the actual
+ * vala code, make up a guaranteed-null-terminated temporary buffer
+ * and use that to perform the prefix checks, which should work for now
+ */
+ if ((self->priv->end - next_pos) >= sizeof(hackbuf))
+ memcpy(hackbuf, next_pos, sizeof(hackbuf) - 1);
+ else
+ memcpy(hackbuf, next_pos, self->priv->end - next_pos);
+ next_pos = hackbuf;
_tmp12_ = next_pos;
if (g_str_has_prefix ((const gchar*) _tmp12_, "amp;")) {
GString* _tmp13_;
Which indeed allowed networkmanager to compile successfully. A proper fix, however, should be made in the corresponding valamarkupreader.vala
file from which valamarkupreader.c
is generated. I will leave that part to a maintainer, as I'm not familiar enough with the vala language to write an idiomatic fix. I would probably not do a prefix check but rather search for a ;
character after next_pos
(bounded by end
) and then perform a string equality check against amp
, quot
and so on.