extractor: Drop private function autoar_common_get_utf8_pathname

This commit drops the private function autoar_common_get_utf8_pathname. I explain the background below.

CP437 always wins the valid-encoding race because it is a byte-oriented character encoding in which every value from 00h to FFh is valid. ISO-8859-1 and Windows-1252 were widely used in Western Europe, while other regions used other character encodings, such as Big5, Shift-JIS, and KS X 1001. If a file name is encoded in one of those East Asian encodings, decoding it as CP437 (or ISO-8859-1) produces a name that makes no sense.
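To illustrate why a trial-decoding approach cannot reject CP437, here is a minimal Python sketch. It is not the code of autoar_common_get_utf8_pathname; the helper first_valid_encoding and the candidate list are hypothetical and only mimic the "first encoding that decodes without error wins" idea described above.

```python
from typing import Optional, Sequence


def first_valid_encoding(raw: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate encoding that decodes `raw` without error.

    Hypothetical helper for illustration only.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding rejected the bytes, try the next one
        return name
    return None


# Every possible byte value decodes under CP437, so it can never be rejected.
all_bytes = bytes(range(256))
cp437_accepts_everything = len(all_bytes.decode("cp437")) == 256

# (82h, A0h) is a Shift-JIS file name, but CP437 is tried first and "succeeds".
shift_jis_name = bytes([0x82, 0xA0])
winner = first_valid_encoding(shift_jis_name, ["utf-8", "cp437", "shift_jis"])
print(winner)  # cp437
```

Because CP437 accepts any byte sequence, any candidate listed after it is never reached, and success of the decode says nothing about whether the guess was right.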

This character encoding issue causes trouble especially when we extract files from a zip file created on Windows. The Windows zip utility encodes file names in the system locale set by the user. This results in incorrect file name decoding even when the zip file is created and extracted under the same version of Windows. For example, suppose one machine uses a Shift-JIS system locale and another uses a CP437 system locale: when the Shift-JIS machine packs a file named あ, which is encoded as (82h, A0h), the other machine decodes the name as éá.
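The mismatch in this example can be reproduced with Python's built-in codecs. This is only a sketch of the byte-level effect; it does not involve a real zip file or gnome-autoar, and the variable names are illustrative.

```python
# File name as typed on the machine with the Shift-JIS system locale.
name = "あ"

# Bytes that end up stored in the archive.
raw = name.encode("shift_jis")
raw_hex = raw.hex(" ")  # "82 a0"

# What a machine assuming CP437 shows for the same bytes.
seen_as_cp437 = raw.decode("cp437")  # "éá"

# Decoding with the encoding that was actually used recovers the name.
seen_as_shift_jis = raw.decode("shift_jis")

print(raw_hex, seen_as_cp437, seen_as_shift_jis)
```

The bytes are identical in both cases; only the assumed encoding differs, which is why the extractor needs to be told the encoding rather than guess it.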

We should expect file names to always be encoded in the default encoding of the system (or UTF-8); otherwise, applications that use gnome-autoar should tell the library to use a specific encoding.
