Many uses of G_GNUC_MALLOC are incorrect
The G_GNUC_MALLOC macro is used to mark a function as "malloc-like". What exactly this means has never been entirely clear, but the way gcc currently interprets it is at odds with the way GLib and other libraries use it.
Here is how G_GNUC_MALLOC is currently documented in GLib:
Expands to the GNU C malloc function attribute if the compiler is gcc. Declaring a function as malloc enables better optimization of the function. A function can have the malloc attribute if it returns a pointer which is guaranteed to not alias with any other pointer when the function returns (in practice, this means newly allocated memory).
Place the attribute after the declaration, just before the semicolon.
See the GNU C documentation for more details.
The above text dates to November 2004, and at the time it was written, it was accurate. However, it seems that GCC's interpretation of the malloc attribute has changed over time.
Between GCC 3.0.4 and 3.3.6, the malloc attribute was documented as follows (A):
The malloc attribute is used to tell the compiler that a function may be treated as if it were the malloc function. The compiler assumes that calls to malloc result in [pointers] that cannot alias anything. This will often improve optimization.
Between GCC 3.4.6 and 4.6.4, it was documented as follows (B):
The malloc attribute is used to tell the compiler that a function may be treated as if any non-NULL pointer it returns cannot alias any other pointer valid when the function returns. This will often improve optimization. Standard functions with this property include malloc and calloc. realloc-like functions have this property as long as the old pointer is never referred to (including comparing it to the new pointer) after the function returns a non-NULL value.
Between GCC 4.7.4 and 4.9.4, it was documented as follows (C):
The malloc attribute is used to tell the compiler that a function may be treated as if any non-NULL pointer it returns cannot alias any other pointer valid when the function returns and that the memory has undefined content. This [often improves] optimization. Standard functions with this property include malloc and calloc. realloc-like functions do not have this property as the memory pointed to does not have undefined content.
Between GCC 5.5.0 and 8.2.0, it has been documented as follows (D):
This tells the compiler that a function is malloc-like, i.e., that the pointer P returned by the function cannot alias any other pointer valid when the function returns, and moreover no pointers to valid objects occur in any storage addressed by P.
Using this attribute can improve optimization. Functions like malloc and calloc have this property because they return a pointer to uninitialized or zeroed-out storage. However, functions like realloc do not have this property, as they can return a pointer to storage containing pointers.
Note the differences here:
- Definitions A and B are essentially equivalent. Note, however, that B explicitly states that realloc does qualify (subject to certain conditions on the caller).
- Definition C adds the condition that the memory "has undefined content" (although this is contradicted by the assertion that calloc qualifies).
- Definition D changes this to a requirement that "no pointers to valid objects occur in any storage addressed by P".
As a result, for example:
- g_malloc meets all of these definitions.
- g_strdup meets definitions A, B, and D, but not C.
- g_strsplit meets definitions A and B, but not C or D.
Of the various functions that GLib marks with G_GNUC_MALLOC, most return a newly allocated string or other array that contains no pointers. These, I will assume, are fine: even though they don't meet definition C, they appear to meet the intention behind it.
(For what it's worth, note that glibc defines strdup, strndup, wcsdup, and tempnam as having the malloc attribute.)
The following functions, however, clearly violate the current definition of __attribute__((malloc)), and are very likely to cause applications to be miscompiled:
- g_memdup
- g_slice_copy
- g_rc_box_dup
- g_atomic_rc_box_dup
The following functions violate the current definition, but seem less likely to cause problems (because the only valid pointers within the newly-allocated array are themselves newly allocated):
- g_bookmark_file_get_groups
- g_bookmark_file_get_applications
- g_bookmark_file_get_uris
- g_uri_list_extract_uris
- g_key_file_get_groups
- g_key_file_get_keys
- g_key_file_get_string_list
- g_key_file_get_locale_string_list
- g_strsplit
- g_strsplit_set
- g_strdupv
The following functions violate the current definition, but are unlikely to cause problems for applications (although they could conceivably cause GLib itself to be miscompiled):
- g_mapped_file_new
- g_mapped_file_new_from_fd
At a minimum, the attribute needs to be removed from the declarations of g_memdup and g_slice_copy, and from the new functions g_rc_box_dup and g_atomic_rc_box_dup.
It should probably also be removed from the other functions listed above (and from various functions in GTK+, and probably in other libraries as well).
If the G_GNUC_MALLOC macro is kept in its current form, it should be documented much more clearly, and the documentation should be written in a way that discourages novice programmers from using it.
Another possibility would be to deprecate the G_GNUC_MALLOC macro and turn it into a no-op, and perhaps introduce another macro that could be used in cases where it's known to be safe.
Here is a simple example of a program that is miscompiled due to the current GLib declarations:
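```c
#include <stdio.h>
#include <glib.h>

int main(void)
{
    char hello[] = "#hello world";
    char *p = hello, **q;
    int i;

    /* q points to a copy of p, so *q points to hello[]. Because g_memdup
     * is declared G_GNUC_MALLOC, gcc may assume the copied block holds
     * no pointers to valid objects, i.e. that *q cannot point to hello[]. */
    q = g_memdup(&p, sizeof(char *));

    /* The first iteration sets hello[0] to 0, so every later XOR should
     * be a no-op and "hello world" should be printed unchanged. If gcc
     * assumes (*q)[i] never aliases hello[0], it can load hello[0] ('#')
     * once and XOR every character with it. */
    for (i = 0; i < 12; i++)
        (*q)[i] ^= hello[0];

    printf("%s\n", &hello[1]);
    return 0;
}
```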
On Debian 9 (amd64, gcc 6.3.0-18+deb9u1, glib 2.50.3-2):
```
$ gcc -O0 hello.c `pkg-config --cflags --libs glib-2.0`; ./a.out
hello world
$ gcc -O1 hello.c `pkg-config --cflags --libs glib-2.0`; ./a.out
KFOOLTLQOG
```