Erroneous and non-deterministic results using g_convert() on a UTF-16LE file without BOM
GLib 2.72.2 on Arch Linux
File used: utf16lebom.ini
Downstream issue from which the file is taken: https://gitlab.xfce.org/apps/mousepad/-/issues/172
Code to reproduce the problem:
#include <glib.h>
#include <gio/gio.h>

gint
main (gint argc, gchar **argv)
{
  gchar *in = NULL, *out;
  gsize size, read, written;
  GFile *file;

  /* Load the whole file into memory */
  file = g_file_new_for_path ("./utf16lebom.ini");
  g_file_load_contents (file, NULL, &in, &size, NULL, NULL);
  g_object_unref (file);
  g_return_val_if_fail (in != NULL && size > 2, 1);

  /* With -r / --remove-bom, drop the first two bytes (the BOM) */
  if (g_strcmp0 (argv[1], "-r") == 0 || g_strcmp0 (argv[1], "--remove-bom") == 0)
    {
      out = g_strdup (in + 2);
      g_free (in);
      in = out;
      size -= 2;
    }

  /* Convert to UTF-8 and report the byte counts (gsize is unsigned) */
  out = g_convert (in, size, "UTF-8", "UTF-16LE", &read, &written, NULL);
  g_printerr ("size: %" G_GSIZE_FORMAT "\n", size);
  g_printerr ("read: %" G_GSIZE_FORMAT "\n", read);
  g_printerr ("written: %" G_GSIZE_FORMAT "\n", written);
  g_printerr ("out:\n%s\n", out);

  g_free (in);
  g_free (out);

  return 0;
}
Invoked without arguments, this code consistently displays this output:
$ ./p
size: 212
read: 212
written: 108
out:
?; UNICODE FILE - edit with care ;-)
[Copy-Move]
AskCopy=1
AskMove=1
CopyDir=
MoveDir=
UseNewDlg=0
$
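As a cross-check of the numbers above: 212 bytes of UTF-16LE are 106 code units. If one of them is the BOM (U+FEFF, three bytes in UTF-8, which is probably what shows up as the leading `?`) and the other 105 are ASCII, as the printed text suggests, the expected UTF-8 length is 3 + 105 = 108, matching `written`. The sketch below does that arithmetic in plain C. The helper names `has_utf16le_bom` and `utf8_length_of_utf16le` are hypothetical and are not part of GLib.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: non-zero if buf starts with the UTF-16LE BOM (FF FE). */
int
has_utf16le_bom (const unsigned char *buf, size_t len)
{
  return len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE;
}

/* Hypothetical helper: number of bytes needed to encode UTF-16LE input as
 * UTF-8, not counting a terminating NUL. Simplification: an unpaired
 * surrogate is counted as 3 bytes instead of being rejected. */
size_t
utf8_length_of_utf16le (const unsigned char *buf, size_t len)
{
  size_t i, total = 0;

  assert (buf != NULL || len == 0);

  for (i = 0; i + 1 < len; i += 2)
    {
      /* Little endian: low byte first, high byte second */
      unsigned int unit = buf[i] | ((unsigned int) buf[i + 1] << 8);

      if (unit < 0x80)
        total += 1;
      else if (unit < 0x800)
        total += 2;
      else if (unit >= 0xD800 && unit <= 0xDBFF
               && i + 3 < len && (buf[i + 3] & 0xFC) == 0xDC)
        {
          /* High surrogate followed by a low surrogate: 4 bytes for the pair */
          total += 4;
          i += 2;
        }
      else
        total += 3;
    }

  return total;
}
```

By the same arithmetic, the same content without the BOM (210 bytes, 105 ASCII code units) would be expected to produce 105 bytes of UTF-8.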
Invoked with the `-r` argument (i.e. removing the BOM), the program displays varying values of `read` and `written`, and sometimes `out` is `NULL`:
$ ./p -r
size: 210
read: 210
written: 166
out:
;??
$ ./p -r
size: 210
read: 210
written: 172
out:
;??
$ ./p -r
size: 210
read: 210
written: 170
out:
;??
$ ./p -r
size: 210
read: 210
written: 177
out:
;??
$ ./p -r
size: 210
read: 32
written: 20
out:
(null)
$
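One possible contributor to the varying results, offered as a hypothesis and not a confirmed diagnosis: `g_strdup()` copies a NUL-terminated C string, and ASCII text encoded as UTF-16LE has a 0x00 byte as every second byte. In that case the copy made in the `-r` branch would hold only the first byte after the BOM (the `;`), while `g_convert()` is still asked to read `size` bytes from it. The `written` values of 166 to 177, larger than the 105 bytes the visible text would need, seem consistent with that. The plain C sketch below illustrates the difference between a C-string copy and a byte copy. The helper names `c_string_copy_length` and `copy_bytes` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: number of bytes a C-string copy (strdup-like, which
 * is how g_strdup() behaves) would take from buf: everything before the
 * first 0x00 byte. buf must contain at least one 0x00 byte. */
size_t
c_string_copy_length (const char *buf)
{
  assert (buf != NULL);
  return strlen (buf);
}

/* Hypothetical helper: copy exactly len bytes, embedded 0x00 bytes included.
 * Returns NULL on allocation failure; the caller releases it with free(). */
char *
copy_bytes (const char *buf, size_t len)
{
  char *copy;

  assert (buf != NULL);

  copy = malloc (len > 0 ? len : 1);
  if (copy != NULL)
    memcpy (copy, buf, len);

  return copy;
}
```

If this is indeed the cause, the reproducer could avoid the copy entirely by passing `in + 2` and `size - 2` to `g_convert()` (keeping the original pointer for `g_free()`), or copy by length with `g_memdup2()`, available since GLib 2.68. Passing a `GError **` instead of `NULL` as the last argument of `g_convert()` should also show why `out` is sometimes `NULL`.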