g_get_charset always returns 8-bit codepage on Windows, crippling UTF-8 output
Submitted by Eduard Braun
Link to original bug (#782578)
Description
From the documentation of g_get_charset():
On Windows the character set returned by this function is the so-called system default ANSI code-page. That is the character set used by the "narrow" versions of C library and Win32 functions that handle file names. It might be different from the character set used by the C library's current locale.
The problem is not so much this definition as its implications. Unfortunately, g_get_charset() is used by most (if not all) glib (and also glibmm and gtk) functions that generate console output to determine the character set in which the output should be printed. Notable examples are glib's plain g_print() [1] and the output stream operator (<<) of glibmm's Glib::ustring [2]. This means that if the console encoding does not match the encoding determined by g_get_charset(), output on the console will be mostly wrong!
On Windows this is often the case, as modern consoles are not at all bound to Windows' archaic code pages and are likely to use an encoding that differs from the one returned by g_get_charset().
For example, MSYS2's console uses UTF-8 by default and cmd.exe on my system uses code page 850 by default, while my system's locale as determined by g_get_charset() is 1252. And it doesn't stop there: the Windows console can easily be set to accept UTF-8 output, but glib will be unable to produce output in the proper encoding!
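To make the mismatch concrete, here is a minimal sketch in plain C (no glib involved) showing the bytes that one character, "é" (U+00E9), turns into under the encodings mentioned above. The function names are hypothetical and exist only for this illustration; the CP1252 helper assumes the code point lies in U+00A0..U+00FF, where CP1252 coincides with Latin-1.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: encode a code point in U+00A0..U+00FF as CP1252.
 * In that range CP1252 is a single byte with the same value as the code point. */
size_t example_encode_cp1252(unsigned cp, unsigned char *out)
{
    out[0] = (unsigned char)cp;
    return 1;
}

/* Hypothetical helper: encode a code point below U+0800 as UTF-8
 * (one byte for ASCII, two bytes otherwise). */
size_t example_encode_utf8(unsigned cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
}

/* Print a labelled byte sequence in hex. */
void example_print_bytes(const char *label, const unsigned char *bytes, size_t n)
{
    printf("%s:", label);
    for (size_t i = 0; i < n; i++)
        printf(" %02X", bytes[i]);
    printf("\n");
}
```

For U+00E9 the CP1252 helper yields the single byte E9, while the UTF-8 helper yields C3 A9. A console in code page 850 displays the byte E9 as "Ú", and a UTF-8 console sees a lone E9 as an invalid sequence, so output converted to 1252 is wrong in both of the consoles described above.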
I'd therefore suggest either
a) rethinking the usage of g_get_charset() when converting for console output and potentially creating a new g_get_console_charset() that suits the purpose better, or
b) adding a way to disable glib's automatic character conversion for console output (or rather: letting the developer set the encoding glib should choose). This could, for example, be implemented by adding a conditional in g_get_charset() that checks whether a pre-set encoding is desired.
Option a) is probably hard to implement (How would one, for example, determine whether the console application is running in an MSYS shell? Even then: how can the encoding of an MSYS shell be determined?) and might break backwards compatibility in unexpected ways, so something along the lines of b) would probably be the better approach.
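A rough sketch of what the conditional in suggestion b) could look like, written as standalone C and not as a patch against glib. All names here (example_set_console_charset, example_get_charset, example_system_default_charset) are hypothetical, and the hard-coded "CP1252" merely stands in for whatever g_get_charset() currently reports on the system.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the current behaviour: always report the
 * system default ANSI code page (hard-coded here for illustration). */
const char *example_system_default_charset(void)
{
    return "CP1252";
}

/* Pre-set encoding chosen by the developer; empty means "not set". */
static char example_preset_charset[64] = "";

/* Hypothetical setter: pass a charset name to force it, or NULL to
 * return to the automatic behaviour. */
void example_set_console_charset(const char *charset)
{
    if (charset == NULL) {
        example_preset_charset[0] = '\0';
        return;
    }
    snprintf(example_preset_charset, sizeof example_preset_charset,
             "%s", charset);
}

/* The conditional itself: prefer the pre-set encoding if one was
 * requested, otherwise fall back to the system default. */
const char *example_get_charset(void)
{
    if (example_preset_charset[0] != '\0')
        return example_preset_charset;
    return example_system_default_charset();
}
```

With something like this, an application that knows its console expects UTF-8 would call example_set_console_charset("UTF-8") once at startup, and applications that never call the setter would keep the existing behaviour.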
[1] https://github.com/GNOME/glib/blob/e8487812b9782b6a01e8de9990593558394f4087/glib/gmessages.c#L3083
[2] https://github.com/GNOME/glibmm/blob/0797bf2954177f58b7ac6ebecce7264310481c55/glib/glibmm/ustring.cc#L1430