GLib issues
https://gitlab.gnome.org/GNOME/glib/-/issues
Updated: 2019-04-11T16:24:14Z

glib should not create/handle long UTF-8 forms
https://gitlab.gnome.org/GNOME/glib/-/issues/72
Updated: 2019-04-11T16:24:14Z
Author: Bugzilla
## Submitted by Roozbeh Pournader
**[Link to original bug (#391261)](https://bugzilla.gnome.org/show_bug.cgi?id=391261)**
## Description
Presently, glib's UTF-8 functions use the ISO/IEC 10646 definition of UTF-8 both when handling and when generating UTF-8 data. This means that they accept and generate UTF-8 for values larger than the largest allowed Unicode character, U+10FFFF. As a result, applications get invalid Unicode characters instead of an error, which makes glib non-conforming to The Unicode Standard.
As an example, the following piece of code accepts the "ill-formed" UTF-8 sequence and returns the invalid Unicode code point U+110000 without an error:

    #include <glib.h>
    #include <stdio.h>

    int
    main (void)
    {
      gunichar *result;
      gchar input[] = "\xF4\x90\x80\x80";

      result = g_utf8_to_ucs4 (input, -1, NULL, NULL, NULL);
      if (result != NULL)
        printf ("result is: U+%x\n", result[0]);
      g_free (result);
      return 0;
    }

The same happens with g_unichar_to_utf8, which takes invalid Unicode code points and generates an ill-formed UTF-8 sequence.
Quoting relevant parts from the Unicode 5.0 book:
Page 73:
"[Conformance clause] C9 When a process generates a code unit sequence which purports to be in a Unicode character encoding form, it shall not emit ill-formed code unit sequences.
[...]
C10 When a process interprets a code unit sequence which purports to be in a Unicode character encoding form, it shall treat ill-formed code unit sequences as an error condition and shall not interpret such sequences as characters.
"
Page 103:
"Any UTF-8 byte sequence that does not match the patterns listed in Table 3-7 is ill-formed." [The patterns in Table 3-7, on page 104, do not match `<F4 90 80 80>`.]
We can of course claim that "we support ISO/IEC 10646's UTF-8" and ignore the problem altogether, but this is considered a security problem. Quoting UTR #36, Unicode Security Considerations:
http://www.unicode.org/reports/tr36/#Non_Visual_Recommendations
"A. Ensure that all implementations of UTF-8 used in a system are conformant to the latest version of Unicode. In particular,
   A. Always use the so-called "shortest form" of UTF-8
   B. Never go outside of 0..10FFFF₁₆
   C. Never use 5 or 6 byte UTF-8."
Going this way also improves the performance of at least those functions that handle UTF-8 data, as the tests become simpler.
Not doing a patch yet as this may be controversial. Please comment.
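
To make the three recommendations concrete, here is a minimal sketch of a decoder that follows the Unicode definition of well-formed UTF-8. It uses only the C standard library. `strict_utf8_decode` is a hypothetical name for illustration, not a GLib function and not a proposed patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s, of which len bytes are available.
 * Returns the number of bytes consumed and stores the code point in *out.
 * Returns 0 for ill-formed input: overlong forms, surrogates, values
 * above U+10FFFF, truncated sequences and 5/6-byte forms are rejected. */
size_t
strict_utf8_decode (const unsigned char *s, size_t len, uint32_t *out)
{
  uint32_t cp, min;
  size_t need, i;

  if (len == 0)
    return 0;

  if (s[0] < 0x80)
    {
      *out = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; need = 2; min = 0x80; }
  else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; need = 3; min = 0x800; }
  else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; need = 4; min = 0x10000; }
  else
    return 0;   /* stray continuation byte, or a 5/6-byte lead byte */

  if (len < need)
    return 0;   /* truncated sequence */

  for (i = 1; i < need; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;   /* not a continuation byte */
      cp = (cp << 6) | (s[i] & 0x3F);
    }

  if (cp < min)
    return 0;   /* overlong ("non-shortest") form */
  if (cp > 0x10FFFF)
    return 0;   /* beyond the Unicode code space */
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;   /* surrogate code point */

  *out = cp;
  return need;
}
```

With this check, the sequence `<F4 90 80 80>` from the example above is rejected, while `<F4 8F BF BF>` (U+10FFFF) is still accepted.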
Version: 2.12.x

Some GRegex compile errors are not clearly exposed
https://gitlab.gnome.org/GNOME/glib/-/issues/2691
Updated: 2022-09-21T11:48:48Z
Author: Marco Trevisan (mail@3v1n0.net)
We have many error values that would need to be added to cover the new errors in PCRE2.
Specifically in `translate_compile_error`, these seem somewhat nice to have to me:
- [ ] `PCRE2_ERROR_UNICODE_NOT_SUPPORTED`
- [ ] `PCRE2_ERROR_INVALID_SUBPATTERN_NAME`
- [ ] `PCRE2_ERROR_CLASS_INVALID_RANGE`
- [ ] `PCRE2_ERROR_PARENTHESES_STACK_CHECK`
- [ ] `PCRE2_ERROR_CALLOUT_NUMBER_TOO_BIG`
- [ ] `PCRE2_ERROR_MISSING_CALLOUT_CLOSING`
- [ ] `PCRE2_ERROR_ESCAPE_INVALID_IN_VERB`
- [ ] `PCRE2_ERROR_NULL_PATTERN`
- [ ] `PCRE2_ERROR_BAD_OPTIONS`
- [ ] `PCRE2_ERROR_PARENTHESES_NEST_TOO_DEEP`
- [ ] `PCRE2_ERROR_BACKSLASH_O_MISSING_BRACE`
- [ ] `PCRE2_ERROR_INVALID_OCTAL`
- [ ] `PCRE2_ERROR_CALLOUT_STRING_TOO_LONG`
- [ ] `PCRE2_ERROR_MISSING_OCTAL_OR_HEX_DIGITS`
- [ ] `PCRE2_ERROR_PATTERN_STRING_TOO_LONG`
- [ ] `PCRE2_ERROR_BAD_LITERAL_OPTIONS`
While I think we can just mark these as internal errors:
- `PCRE2_ERROR_HEAP_FAILED`
- `PCRE2_ERROR_INTERNAL_PARSED_OVERFLOW`
- `PCRE2_ERROR_UNICODE_DISALLOWED_CODE_POINT`
- `PCRE2_ERROR_NO_SURROGATES_IN_UTF16`
- `PCRE2_ERROR_INTERNAL_BAD_CODE_LOOKBEHINDS`
- `PCRE2_ERROR_UNICODE_PROPERTIES_UNAVAILABLE`
- `PCRE2_ERROR_INTERNAL_STUDY_ERROR`
- `PCRE2_ERROR_UTF_IS_DISABLED`
- `PCRE2_ERROR_UCP_IS_DISABLED`
- `PCRE2_ERROR_INTERNAL_BAD_CODE_AUTO_POSSESS`
- `PCRE2_ERROR_BACKSLASH_C_LIBRARY_DISABLED`
- `PCRE2_ERROR_INTERNAL_BAD_CODE`
- `PCRE2_ERROR_INTERNAL_BAD_CODE_IN_SKIP`
- `PCRE2_ERROR_ZERO_RELATIVE_REFERENCE`
- `PCRE2_ERROR_LOOKBEHIND_TOO_COMPLICATED`
- `PCRE2_ERROR_BACKSLASH_U_CODE_POINT_TOO_BIG`
- `PCRE2_ERROR_VERSION_CONDITION_SYNTAX`
- `PCRE2_ERROR_CALLOUT_NO_STRING_DELIMITER`
- `PCRE2_ERROR_CALLOUT_BAD_STRING_DELIMITER`
- `PCRE2_ERROR_BACKSLASH_C_CALLER_DISABLED`
- `PCRE2_ERROR_QUERY_BARJX_NEST_TOO_DEEP`
- `PCRE2_ERROR_LOOKBEHIND_TOO_LONG`
- `PCRE2_ERROR_PATTERN_TOO_COMPLICATED`

Localise display names for well-known user directories
https://gitlab.gnome.org/GNOME/glib/-/issues/2228
Updated: 2020-10-21T15:58:00Z
Author: Ghost User
OS: macOS (10.11) and Windows
OS default folder names in the home directory aren't localized. I have a German setup but the folders show up in English.
Update by Jehan: This is about the common folders such as `Music/` and `Documents/`, which are not localized when displayed in GtkFileChooser* widgets, apparently on both Windows and macOS. I am not sure if they are localized on Linux; if they are not, we should keep it as-is, but if they are, it would be worth being consistent and doing the same on other OSes.

Non-UTF-8 encoded XDG user dirs are displayed wrong.
https://gitlab.gnome.org/GNOME/glib/-/issues/1545
Updated: 2020-06-23T10:10:32Z
Author: António Fernandes (antoniof@gnome.org)
## Steps to reproduce
Reproducing with nautilus for convenience, but it also affects the GtkFileChooser.
1) Have a ~/.config/user-dirs.conf text file with the following content:
> filename_encoding=koi8r
2) Quit nautilus with this command:
> nautilus -q
3) Run this command to update user-dirs.dirs to ru_RU.KOI8-R:
> LANG=ru_RU.koi8r xdg-user-dirs-gtk-update
(In the dialog, choose the second action, which confirms updating the folder names.)
4) Launch nautilus in Russian:
> LANG=ru_RU.koi8r nautilus
(Afterwards, the previous localization can be restored by running `xdg-user-dirs-gtk-update` without the LANG=ru_RU.koi8r part.)
## Current behavior
![Captura_de_ecrã_de_2018-09-08_23-45-30](/uploads/5a9c74c86fc719205357fdc3a3c9b39e/Captura_de_ecrã_de_2018-09-08_23-45-30.png)
The labels in the sidebar show invalid characters. The tooltip shows an escaped URL.
The actual file list (both in nautilus and file chooser) properly displays the localized names for these folders (for instance, "Music" is "Музыка").
## Expected outcome
The labels in the sidebar should display the localized folder names (ex.: "Музыка").
I'm not sure what the expected tooltip should be. But the encoded URL doesn't seem very useful.
## Version information
Gtk+ 3.22.30 on Fedora 28.
## Additional information
Originally reported in https://bugzilla.gnome.org/show_bug.cgi?id=710487

An issue with non-utf locale and g_format_size_full
https://gitlab.gnome.org/GNOME/glib/-/issues/1469
Updated: 2019-05-15T08:22:54Z
Author: Ghost User
Glib based programs create a nice illusion that you can work with everything in UTF-8, except for some corner cases like filenames where you need to take more care. I've found that this illusion fails in some other cases too.
If you run the following test program with LC_ALL=cs_CZ.iso-8859-2, the output of the first and third g_print complains about invalid UTF-8 string. Internally g_format_size_full uses g_string_printf for number formatting that outputs strings in the encoding set by the locale, but glib expects UTF-8 strings internally.
I don't think this is easily solvable without forcing the underlying sprintf calls to output UTF-8 encoded strings or replacing them entirely, because I'd expect to be able to use UTF-8 in g_strdup_printf("číslo %'.2f", 123456.0), and as of now it produces a mixed-encoding result.
```c
#include <glib.h>
#include <locale.h>
int main(void) {
setlocale(LC_ALL, "");
g_print("%s\n", g_strdup_printf("%'.2f", 123456789.0));
g_print("%s\n", "nech to koňovi, má větší hlavu");
g_print("%s\n", g_format_size_full(123456789, G_FORMAT_SIZE_LONG_FORMAT | G_FORMAT_SIZE_IEC_UNITS));
return 0;
}
```
```
[Invalid UTF-8] 123\xa0456\xa0789,00
nech to koňovi, má větší hlavu
[Invalid UTF-8] 117,7\xc2\xa0MiB (123\xa0456\xa0789\xc2\xa0bajt\xc5\xaf)
```
https://cs.wikipedia.org/wiki/ISO_8859-2
\xa0 is NBSP non-breakable space
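
Reading the bytes in the output above: a lone `\xa0` is NBSP in ISO 8859-2, while the pairs `\xc2\xa0` and `\xc5\xaf` match the UTF-8 encodings of U+00A0 (NBSP) and U+016F (ů). That would be consistent with UTF-8 text, for example a translated unit string, being mixed with locale-encoded number formatting. This is an inference from the bytes, not something confirmed in the report. A minimal sketch of the two-byte UTF-8 encoding follows. `utf8_encode_small` is a hypothetical helper, not a GLib function.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a code point below U+0800 as UTF-8 into buf (at least 2 bytes).
 * Returns the number of bytes written, or 0 if cp is outside the range
 * this sketch handles. */
size_t
utf8_encode_small (uint32_t cp, unsigned char *buf)
{
  if (cp < 0x80)
    {
      buf[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (cp >> 6));    /* lead byte: 110xxxxx */
      buf[1] = (unsigned char) (0x80 | (cp & 0x3F));  /* continuation: 10xxxxxx */
      return 2;
    }
  return 0;
}
```

Encoding U+00A0 with this helper yields the bytes C2 A0, and U+016F yields C5 AF.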
I frankly don't know where \xc2 is coming from.

g_win32_locale_filename_from_utf8() failed to convert a local path
https://gitlab.gnome.org/GNOME/glib/-/issues/1359
Updated: 2019-01-29T16:57:00Z
Author: Bugzilla
## Submitted by Jehan `@Jehan`
**[Link to original bug (#795006)](https://bugzilla.gnome.org/show_bug.cgi?id=795006)**
## Description
In GIMP, [bug 794949](https://bugzilla.gnome.org/show_bug.cgi?id=794949), we had a case on Windows with an image with a path such as "F:\都.png". We needed to load metadata with GExiv2, which unfortunately doesn't have support for GFile or GInputStream/GOutputStream yet (cf. [bug 732748](https://bugzilla.gnome.org/show_bug.cgi?id=732748)).
So we passed the result of g_file_get_path() to g_win32_locale_filename_from_utf8(). Unfortunately it failed and returned NULL (for the record, g_file_get_path() properly returned a valid UTF-8 value as far as we could see).
So I read the function docs, which say it may fail when the string contains Unicode characters not representable in the system codepage. Yet since this is the path of an actual file currently existing in the filesystem, I assume it should be convertible to the system codepage, so that would be a bug. Or am I misunderstanding something?
Version: 2.55.x

g_utf8_collate_key_for_filename() corner cases with digits
https://gitlab.gnome.org/GNOME/glib/-/issues/1344
Updated: 2024-01-17T01:14:40Z
Author: Bugzilla
## Submitted by Paul `@20YearsOfGnome`
**[Link to original bug (#793747)](https://bugzilla.gnome.org/show_bug.cgi?id=793747)**
## Description
Created attachment 368820
Screenshot of Nautilus sorting the test files by name
Moved here from the relevant Nautilus bug: https://gitlab.gnome.org/GNOME/nautilus/issues/264
Create some test files as follows:
`$ touch 000001000010-0.jpg 000001000010-A.jpg 000001A00010-0.jpg 000003BBF000-0.jpg 00003bA1A000-0.jpg 00003BD22000-0.jpg 0000A4AC3000-0.jpg 000100001 000100001.jpg 000200001`
View them at the command line and in Nautilus:
```
$ ls -1
000001000010-0.jpg
000001000010-A.jpg
000001A00010-0.jpg
000003BBF000-0.jpg
00003bA1A000-0.jpg
00003BD22000-0.jpg
0000A4AC3000-0.jpg
000100001
000100001.jpg
000200001
$ nautilus .
[see attached screenshot]
```
ls sorts files as one might expect. It is not case sensitive (unless you use a case sensitive locale, e.g. LANG=C), but sorts alphabetically.
Nautilus sorts the files in a bizarre order, regardless of which locale is used. Weird behaviours include:
* Longer but otherwise equal filenames sort before shorter ones
* Sometimes ignores runs of zeros, but not punctuation
* Seems to detect runs of digits and push them to the end
The actual behaviour is very complex and difficult to predict, though it must follow some internal logic. The end result is that files don't sort in any reasonable order. This impacts several Gnome applications, such as Eye of Gnome and Nautilus. Other applications, like Transmission, respect locale.
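
To illustrate how treating digit runs as numbers can diverge from what `ls` shows, here is a minimal sketch of a "natural order" comparison for ASCII names. This is an assumption-laden illustration of the general technique, not the algorithm used by `g_utf8_collate_key_for_filename()`. `natural_compare` is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/* Compare two ASCII file names, treating each run of digits as a number.
 * Leading zeros are skipped, a longer remaining digit run is the larger
 * number, and equal-length runs are compared digit by digit.
 * Other characters are compared by byte value.
 * Returns a negative value, 0 or a positive value. */
int
natural_compare (const char *a, const char *b)
{
  while (*a && *b)
    {
      if (isdigit ((unsigned char) *a) && isdigit ((unsigned char) *b))
        {
          size_t la = 0, lb = 0, i;

          while (*a == '0') a++;
          while (*b == '0') b++;
          while (isdigit ((unsigned char) a[la])) la++;
          while (isdigit ((unsigned char) b[lb])) lb++;

          if (la != lb)
            return la < lb ? -1 : 1;
          for (i = 0; i < la; i++)
            if (a[i] != b[i])
              return a[i] < b[i] ? -1 : 1;

          a += la;
          b += lb;
        }
      else
        {
          if (*a != *b)
            return (unsigned char) *a < (unsigned char) *b ? -1 : 1;
          a++;
          b++;
        }
    }

  if (*a) return 1;    /* a has a longer tail */
  if (*b) return -1;   /* b has a longer tail */
  return 0;
}
```

Under this sketch, `000001A00010-0.jpg` sorts before `000001000010-0.jpg`, because the first digit run is read as 1 in one name and as 1000010 in the other. That is the reverse of the `ls` listing above, and it shows how a numeric reading of hexadecimal-looking names can produce surprising orders.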
**Attachment 368820**, "Screenshot of Nautilus sorting the test files by name":
![there_was_an_attempt](/uploads/0a6444999fc1ca5275a0cd9dbf22390e/there_was_an_attempt.png)
Version: 2.54.x
### Blocking
* [Bug 355152](https://bugzilla.gnome.org/show_bug.cgi?id=355152)

Optionally use libunistring or libicu to provide Unicode data
https://gitlab.gnome.org/GNOME/glib/-/issues/1333
Updated: 2020-07-09T12:49:02Z
Author: Bugzilla
## Submitted by Philip Withnall `@pwithnall`
**[Link to original bug (#793252)](https://bugzilla.gnome.org/show_bug.cgi?id=793252)**
## Description
GLib currently has around 1MB of Unicode tables loaded in memory for providing character information. That memory is shared between all processes, so the hit is not large, but for smaller devices it’s still a bit of a problem.
Given that libicu and libunistring are widely available on Linux, and loaded by gnome-shell, WebKit, flatpak helpers, gnome-builder, amongst others (see `sudo grep libunistring /proc/*/maps`), we could consider using one of them (if available) to provide character information.
On really small embedded devices, people seem to want to drop the Unicode data from GLib entirely. I don’t know if a platform-specific replacement is available (does uclibc have the right data?). I suspect dropping it entirely from GLib will break too many functions at runtime. We could look at making sure it’s optimised out by -fdata-sections -ffunction-sections -Wl,--gc-sections if unused.
Version: 2.55.x

Remove Perl dependency (gen-unicode-tables.pl, tests/gen-*-txt.pl)
https://gitlab.gnome.org/GNOME/glib/-/issues/1332
Updated: 2018-07-09T09:12:43Z
Author: Bugzilla
## Submitted by Philip Withnall `@pwithnall`
**[Link to original bug (#793250)](https://bugzilla.gnome.org/show_bug.cgi?id=793250)**
## Description
With most of the rest of the GLib tooling ported to Python, and Python being required for Meson and gtk-doc, it makes sense to drop our remaining Perl tooling:
• gen-unicode-tables.pl
• tests/gen-casefold-txt.pl
• tests/gen-casemap-txt.pl
and port them to Python.
It might make sense to get the script to use the XML versions of the Unicode data, rather than the TXT versions, to save some parsing pain:
https://www.unicode.org/Public/UCD/latest/ucdxml/
This is not a high priority, since these scripts are only used occasionally, manually, when there is a Unicode release.
Version: 2.55.x

convert: test failure on NetBSD
https://gitlab.gnome.org/GNOME/glib/-/issues/1303
Updated: 2024-01-15T21:33:27Z
Author: Bugzilla
## Submitted by Thomas Klausner `@_wiz_`
**[Link to original bug (#790698)](https://bugzilla.gnome.org/show_bug.cgi?id=790698)**
## Description
On NetBSD, the convert self test fails:
PASS: convert 2 /conversion/iconv-state
ERROR: convert - too few tests run (expected 7, got 2)
ERROR: convert - exited with status 134 (terminated by signal 6?)
Running glib/tests/convert directly, I see:
/conversion/no-conv: OK
/conversion/iconv-state: OK
/conversion/illegal-sequence: **
ERROR:convert.c:82:test_one_half: assertion failed (error == (g_convert_error, 1)): error is NULL
Abort
When I comment out the assertions in line 82, it fails again later in the same function:
# ./convert
/conversion/no-conv: OK
/conversion/iconv-state: OK
/conversion/illegal-sequence: **
ERROR:convert.c:98:test_one_half: assertion failed (out == "a"): ("?" == "a")
Abort
When I comment out these assertions as well, the test program succeeds.
AIUI, the test wants to convert the ISO-8859-1 or -15 sequence \xc2\xbd to UTF-8 and expects it to fail. I'm not sure why that is so, but on NetBSD it fails differently than expected: the output buffer contains a single question mark "?".
NetBSD does not use libiconv but has its own implementation. Perhaps the return value handling is not identical and this leads to that result.
The NetBSD man page for iconv() is at
http://netbsd.gw.com/cgi-bin/man-cgi?iconv+3+NetBSD-current
Version: 2.54.x

g_unichar_iszerowidth does not handle Prepended_Concatenation_Mark correctly
https://gitlab.gnome.org/GNOME/glib/-/issues/1286
Updated: 2019-05-14T10:22:26Z
Author: Bugzilla
## Submitted by Mike Frysinger
**[Link to original bug (#787229)](https://bugzilla.gnome.org/show_bug.cgi?id=787229)**
## Description
glib currently marks all Cf (Format Character) as zero width, but this ignores Prepended_Concatenation_Mark codepoints. i guess gen-unicode-tables.pl should be consulting PropList.txt from the Unicode releases.
specifically these should all return false w/g_unichar_iszerowidth:
0600..0605 ; Prepended_Concatenation_Mark # Cf ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
06DD ; Prepended_Concatenation_Mark # Cf ARABIC END OF AYAH
070F ; Prepended_Concatenation_Mark # Cf SYRIAC ABBREVIATION MARK
08E2 ; Prepended_Concatenation_Mark # Cf ARABIC DISPUTED END OF AYAH
110BD ; Prepended_Concatenation_Mark # Cf KAITHI NUMBER SIGN
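
The code points listed above can be expressed as a small predicate. This is a minimal sketch covering only the entries quoted in this report. `is_prepended_concatenation_mark` is a hypothetical name, and a real fix would presumably generate the data from PropList.txt, as suggested above, since later Unicode versions may extend the list.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for the Prepended_Concatenation_Mark code points quoted in the
 * report; hard-coded here purely for illustration. */
bool
is_prepended_concatenation_mark (uint32_t cp)
{
  return (cp >= 0x0600 && cp <= 0x0605)   /* ARABIC NUMBER SIGN..NUMBER MARK ABOVE */
      || cp == 0x06DD                     /* ARABIC END OF AYAH */
      || cp == 0x070F                     /* SYRIAC ABBREVIATION MARK */
      || cp == 0x08E2                     /* ARABIC DISPUTED END OF AYAH */
      || cp == 0x110BD;                   /* KAITHI NUMBER SIGN */
}
```

A zero-width check along the lines requested here would treat a Cf character as zero width only when this predicate returns false.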
Unicode 10.0.0 chapter 9 section 2 page 377-378 [1] states:
Signs Spanning Numbers. Several other special signs are written in association with numbers in the Arabic script. All of these signs can span multiple-digit numbers, rather than just a single digit. They are not formally considered combining marks in the sense used by the Unicode Standard, although they clearly interact graphically with their associated sequence of digits. In the text representation they precede the sequence of digits that they span, rather than follow a base character, as would be the case for a combining mark. Their General_Category value is Cf (format character). Unlike most other format characters, however, they should be rendered with a visible glyph, even in circumstances where no suitable digit or sequence of digits follows them in logical order. The characters have the Bidi_Class value of Arabic_Number to make them appear in the same run as the numbers following them.
A few similar signs spanning numbers or letters are associated with scripts other than Arabic. See the discussion of U+070F syriac abbreviation mark in Section 9.3, Syriac, and the discussion of U+110BD kaithi number sign in Section 15.2, Kaithi. All of these prefixed format controls, including the non-Arabic ones, are given the property value Prepended_Concatenation_Mark=True, to identify them as a class. They also have special behavior in text segmentation. (See Unicode Standard Annex #29, “Unicode Text Segmentation.”)
[1] http://unicode.org/versions/Unicode10.0.0/ch09.pdf

Corrupted characters in Greek filenames when saving a pdf report.
https://gitlab.gnome.org/GNOME/glib/-/issues/1209
Updated: 2018-05-24T19:08:24Z
Author: Bugzilla
## Submitted by Nikos Charonitakis
**[Link to original bug (#772411)](https://bugzilla.gnome.org/show_bug.cgi?id=772411)**
## Description
Created attachment 336905
screenshot of saved report showing the corrupted characters
I saved a PDF report with a Greek filename, and the Greek characters in the filename were converted into something unreadable. I think I have seen the same problem at least since 2.6.11.
**Attachment 336905**, "screenshot of saved report showing the corrupted characters":
![2016-10-04__2_](/uploads/62aad2d94373b3c170100d43d2fcad43/2016-10-04__2_.png)
Version: 2.42.x

Support collation of non-ASCII digits with g_utf8_collate_key_for_filename()
https://gitlab.gnome.org/GNOME/glib/-/issues/1150
Updated: 2022-01-19T12:20:00Z
Author: Bugzilla
## Submitted by Mahdi Rajabi
**[Link to original bug (#764225)](https://bugzilla.gnome.org/show_bug.cgi?id=764225)**
## Description
I have many files whose names are Persian numbers (۱.mp4, ۲.mp4).
Nautilus doesn't sort by Persian numbers.
On Ubuntu 15.10
Arrange Item : By Name
Version: 2.48.x
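
For numeric collation to work on such names, the digits first have to be mapped to their values. A minimal sketch of that mapping follows, assuming only ASCII digits and the Extended Arabic-Indic digits U+06F0..U+06F9 used for Persian. `digit_value` is a hypothetical helper, and GLib's existing `g_unichar_digit_value()` offers this kind of lookup for decimal digits in general.

```c
#include <stdint.h>

/* Returns 0..9 for ASCII digits and for Extended Arabic-Indic digits
 * (U+06F0..U+06F9), or -1 for any other code point. */
int
digit_value (uint32_t cp)
{
  if (cp >= '0' && cp <= '9')
    return (int) (cp - '0');
  if (cp >= 0x06F0 && cp <= 0x06F9)
    return (int) (cp - 0x06F0);
  return -1;
}
```

With this mapping, ۱ (U+06F1) has the value 1 and ۲ (U+06F2) has the value 2, so a digit-aware comparison could order ۱.mp4 before ۲.mp4 in the same way as 1.mp4 and 2.mp4.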
---
As per #2576, Bangla numbers are also not currently supported, and should be.

More robust check for UTF-8 charset
https://gitlab.gnome.org/GNOME/glib/-/issues/937
Updated: 2018-05-24T17:06:57Z
Author: Bugzilla
## Submitted by Mikhail Zabaluev `@mzabaluev`
**[Link to original bug (#738044)](https://bugzilla.gnome.org/show_bug.cgi?id=738044)**
## Description
g_get_charset() uses a case-sensitive substring match for "UTF-8" in order to decide if the character set specified by the environment is UTF-8. The matching should either be made strict, or else allow case-insensitivity and actually use the aliases rather than shambolic heuristics.
This dates back from commit b5fa5b9867eec91047a16d45f79888395cf89931 made in 2001 while working on bug #58195.
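
As an illustration of the stricter option, here is a minimal sketch of a whole-string, case-insensitive match. It accepts only the spellings "UTF-8" and "UTF8" and does not consult any alias table, so it covers only part of what is asked for above. `charset_is_utf8` is a hypothetical name, not the function GLib uses.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true only if name is exactly "UTF-8" or "UTF8", in any
 * letter case. Substrings such as "X-UTF-8-foo" do not match. */
bool
charset_is_utf8 (const char *name)
{
  const char *expect = "utf";

  if (name == NULL)
    return false;

  for (; *expect != '\0'; expect++, name++)
    if (tolower ((unsigned char) *name) != *expect)
      return false;

  if (*name == '-')
    name++;   /* the hyphen is optional */

  return name[0] == '8' && name[1] == '\0';
}
```

Unlike a substring search, this rejects names that merely contain "UTF-8" somewhere inside them.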
Version: 2.42.x

g_locale_from_utf8 on Windows fails to handle code points above U+0100
https://gitlab.gnome.org/GNOME/glib/-/issues/916
Updated: 2021-09-23T17:12:51Z
Author: Bugzilla
## Submitted by Devin Acker
**[Link to original bug (#734886)](https://bugzilla.gnome.org/show_bug.cgi?id=734886)**
## Description
Test program: https://gist.github.com/devinacker/cd09eb2ab4608b3d90f1
(compiled as UTF-8 using MinGW)
In the above test program (as well as Windows builds of HexChat, which is how I originally discovered the issue), attempting to call g_locale_from_utf8 with a string containing only Unicode code points below U+0100 returns an appropriately translated string, while higher code points (such as the Japanese text in the test program) result in the error "Invalid byte sequence in conversion input", even when the actual UTF-8 byte sequence appears to be completely valid.
I have tested this repeatedly with various combinations of non-ASCII characters both above and below U+0100 and gotten the same results, on both Windows 7 and 8, with the system locale set to English (United States).
Version: 2.40.x

RTL strings in g_option are broken
https://gitlab.gnome.org/GNOME/glib/-/issues/907
Updated: 2019-05-14T13:47:21Z
Author: Bugzilla
## Submitted by David Gómez
**[Link to original bug (#733874)](https://bugzilla.gnome.org/show_bug.cgi?id=733874)**
## Description
RTL strings are unusable.
Currently, what is shown is this:
-d, --working-directory הדובע רודמ תעיבק (wrong)
instead of
-d, --working-directory קביעת מדור עבודה (correct)

Strings returned from g_get_language_names() can be empty
https://gitlab.gnome.org/GNOME/glib/-/issues/792
Updated: 2021-05-26T15:08:07Z
Author: Bugzilla
## Submitted by Philip Chimento `@ptomato`
**[Link to original bug (#712395)](https://bugzilla.gnome.org/show_bug.cgi?id=712395)**
## Description
Apparently "en_US:" (with a colon) is a value for the LANGUAGE environment variable that can occur in the wild:
http://serverfault.com/questions/455922/in-ubuntu-what-is-the-difference-between-en-usutf8-and-en-us-when-setting-lan
g_get_language_names() doesn't handle this well, the returned array is:
[ "en_US", "en", "", "C" ]
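
The empty entry is what a plain split on `:` produces for a value with a trailing colon. A minimal sketch of that effect, and of skipping empty entries, follows. `count_entries` is a hypothetical helper for illustration and does not reflect how `g_get_language_names()` is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Count the ':'-separated entries in a LANGUAGE-style value.
 * With skip_empty == 0 this behaves like a plain split, so a trailing
 * colon yields one extra, empty entry. With skip_empty != 0, empty
 * entries are ignored. */
size_t
count_entries (const char *value, int skip_empty)
{
  size_t count = 0;

  for (;;)
    {
      size_t n = strcspn (value, ":");   /* length of the current entry */

      if (n > 0 || !skip_empty)
        count++;
      if (value[n] == '\0')
        break;
      value += n + 1;                    /* step past the ':' */
    }

  return count;
}
```

For the value `en_US:`, a plain split sees two entries (`en_US` and an empty string), while the skipping variant sees one.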
There's nothing in the documentation of g_get_language_names() that says it can't return an empty string, but it's certainly unexpected.

no alphabetic order of the files on Mac OS X
https://gitlab.gnome.org/GNOME/glib/-/issues/526
Updated: 2020-01-29T13:51:33Z
Author: Bugzilla
## Submitted by goe..@..web.de
**[Link to original bug (#672336)](https://bugzilla.gnome.org/show_bug.cgi?id=672336)**
## Description
Have a look at the attached screenshots. I click twice on "name", and either way the files are not in alphabetical order.

localised number support for g_format_size()
https://gitlab.gnome.org/GNOME/glib/-/issues/445
Updated: 2019-05-15T11:33:03Z
Author: Bugzilla
## Submitted by Allison (desrt)
**[Link to original bug (#658153)](https://bugzilla.gnome.org/show_bug.cgi?id=658153)**
## Description
In the solution to [bug 658107](https://bugzilla.gnome.org/show_bug.cgi?id=658107) I added a function capable of formatting integers using the locale-specific digits (like Arabic numerals for Persian, etc.).
It would be pretty sweet if we could use the same function for g_format_size() so that the output of this function (which is intended for showing to the user) could be printed using the digits of the user's locale.

NormalizationTest.txt
https://gitlab.gnome.org/GNOME/glib/-/issues/426
Updated: 2018-05-24T13:14:26Z
Author: Bugzilla
## Submitted by Behdad Esfahbod
**[Link to original bug (#655017)](https://bugzilla.gnome.org/show_bug.cgi?id=655017)**
## Description
There's an extensive normalization test data file at:
http://www.unicode.org/Public/6.0.0/ucd/NormalizationTest.txt
We should write a test to consume that. The file is huge, so we cannot include it in its entirety, but we can copy the more interesting parts of it in-tree.