`g_print` and `g_printerr` will cause encoding errors on Windows when locale is set to `.utf8`
Description
According to the Windows API documentation, when the locale is set to `.utf8` with `setlocale`, the operating system's libc automatically converts output to the correct character set. However, if GLib's `g_print` and `g_printerr` functions perform another conversion beforehand, the output is converted twice, which leads to encoding errors when the string contains non-ASCII characters.
Reproduction
For example:
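#include <glib.h>
#include <locale.h>

int main (void) {
  /* Ask the C runtime to use UTF-8 for this process. */
  setlocale (LC_ALL, ".utf8");
  /* The string literal is UTF-8 encoded and contains non-ASCII characters. */
  g_print ("Unicode test: 你好,世界!\n");
  return 0;
}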
We expected to see `Unicode test: 你好,世界!`, but it actually prints `Unicode test: `. The non-ASCII characters are not shown.
Possible solutions
The root cause of this issue may be the behavior of `g_get_console_charset` on Windows.
I propose modifying `g_get_console_charset` on Windows to use `setlocale (LC_ALL, NULL)` to retrieve the current locale setting. If the returned string contains `.utf8`, it can be assumed that the console can print UTF-8 strings, and no conversion in GLib is necessary.
Alternatively, we could call `setlocale (LC_ALL, NULL)` outside of `g_get_console_charset` (in `print_string` in `gmessage.c`) to avoid the additional conversion.
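Both proposals rely on the same check of the locale string, which could look roughly like the sketch below. It is only an illustration, not GLib code: `contains_ci` and `locale_name_is_utf8` are hypothetical helper names. It also assumes that the spelling may vary in case and may include a hyphen (`.UTF-8`), so it matches more loosely than a plain search for `.utf8`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive substring search.
 * Returns false when haystack is NULL. */
static bool
contains_ci (const char *haystack, const char *needle)
{
  assert (needle != NULL); /* needle is always a constant supplied by us */

  if (haystack == NULL)
    return false;

  size_t h_len = strlen (haystack);
  size_t n_len = strlen (needle);

  if (n_len > h_len)
    return false;

  for (size_t start = 0; start + n_len <= h_len; start++)
    {
      size_t i = 0;

      while (i < n_len &&
             tolower ((unsigned char) haystack[start + i]) ==
             tolower ((unsigned char) needle[i]))
        i++;

      if (i == n_len)
        return true;
    }

  return false;
}

/* Hypothetical helper: true if a locale string, such as the value
 * returned by setlocale (LC_ALL, NULL), names the UTF-8 code page.
 * Assumption: ".utf8" and ".utf-8" in any letter case are equivalent. */
static bool
locale_name_is_utf8 (const char *locale)
{
  return contains_ci (locale, ".utf8") || contains_ci (locale, ".utf-8");
}
```

On Windows, the caller would pass in the current setting, for example `locale_name_is_utf8 (setlocale (LC_ALL, NULL))`, and skip the charset conversion when it returns true.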
See also
There is an issue that may be related to this one, though the two may not be the same: