Copying a file with a trailing space to FAT mangles other files' UTF-8 filenames
Steps to reproduce
- Create a bunch of files:
touch 'A тест' 'B тест ' 'C тест'
(A, B, and C are Latin letters that fix the sort order; the second filename ends with a space; every filename contains Russian letters. Here and below, the single quotes are not part of the names. They only mark the boundaries, because some names contain spaces or end with a space.)
- Select these files and copy them in nautilus.
- Paste them to a FAT filesystem.
Expected behavior
- The names of 'A тест' and 'C тест' are not changed.
- The trailing space of 'B тест ' is stripped or substituted, because FAT does not allow trailing spaces.
Actual behavior
- 'A тест' is copied properly.
- 'B тест ' is renamed to 'B ________'. The trailing space is stripped, but the Russian letters are lost as well: each one is replaced by two underscores.
- 'C тест' is renamed to 'C ________'. The Russian letters are lost, although the filename was absolutely valid.
- If you copy only 'C тест', it gets copied properly. The substitution of Russian letters happens only for files that follow the file with a trailing space.
- If you also add a file like 'C тттт' to the list, nautilus will ask whether to overwrite 'C ________', because two files are mapped to the same name.
It looks as if copying 'B тест ' fails (because of the trailing space), nautilus retries with a sanitized filename (converted to ASCII and trimmed), and then sets some flag. After that, the remaining filenames are "sanitized" right away, without an attempt to copy them as is.
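To make the guess above concrete, here is a minimal Python sketch of a sanitizer that would produce exactly the observed names. It is a hypothetical model, not code taken from nautilus: the function name ascii_fallback and the rule "one underscore per non-ASCII UTF-8 byte" are assumptions. The rule is consistent with the symptoms, since each Russian letter takes two bytes in UTF-8.

```python
def ascii_fallback(name: str) -> str:
    """Hypothetical model of the observed renaming (not nautilus code)."""
    # Assumption: trailing spaces are trimmed first.
    trimmed = name.rstrip(" ")
    # Assumption: every non-ASCII byte of the UTF-8 encoding becomes "_".
    return "".join(
        chr(byte) if byte < 0x80 else "_"
        for byte in trimmed.encode("utf-8")
    )


if __name__ == "__main__":
    for name in ["B тест ", "C тест", "C тттт"]:
        print(repr(name), "->", repr(ascii_fallback(name)))
```

Under these assumptions 'C тест' and 'C тттт' both map to 'C ________', which would explain the overwrite prompt.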
Nautilus already has some special handling for FAT that replaces forbidden characters. I suggest extending it to deal with trailing spaces properly (without removing valid characters as well), and making sure that a failure with one file doesn't make nautilus assume that the rest of the files will also fail. If this direction sounds right, I can work on a patch.
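A rough Python sketch of the suggested direction follows. It is only an illustration under stated assumptions, not a proposal for the actual patch: fat_safe_name, plan_copy, and the try_copy callback are hypothetical names, and the forbidden set is the commonly documented list for FAT long filenames. The point is that Unicode letters survive and the fallback is decided per file, with no state carried over to the next file.

```python
# Characters commonly documented as forbidden in FAT long filenames.
FAT_FORBIDDEN = '"*/:<>?\\|'


def fat_safe_name(name: str) -> str:
    """Replace forbidden characters and trim trailing spaces; keep Unicode."""
    cleaned = "".join(
        "_" if ch in FAT_FORBIDDEN or ord(ch) < 0x20 else ch
        for ch in name
    )
    # Fall back to "_" if nothing is left (e.g. a name made only of spaces).
    return cleaned.rstrip(" ") or "_"


def plan_copy(names, try_copy):
    """Map each source name to the name it was stored under (None on failure).

    try_copy(source_name, dest_name) -> bool is a hypothetical callback
    standing in for the real copy operation.
    """
    result = {}
    for name in names:
        # Always try the original name first, for every file.
        if try_copy(name, name):
            result[name] = name
            continue
        fallback = fat_safe_name(name)
        if fallback != name and try_copy(name, fallback):
            result[name] = fallback
        else:
            result[name] = None
    return result


if __name__ == "__main__":
    # Fake filesystem that only rejects names ending with a space.
    def rejects_trailing_space(source_name, dest_name):
        return not dest_name.endswith(" ")

    print(plan_copy(["A тест", "B тест ", "C тест"], rejects_trailing_space))
```

With the fake callback above, 'B тест ' is stored as 'B тест' and the other two names are left untouched.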
More thoughts, more general than this bug report
Actually, I'm not a fan of this silent renaming (replacement of FAT-forbidden characters and conversion to ASCII), because information is lost and the user is not notified about it. For example, if someone copies a tree of files to a USB flash drive and then to another computer, they won't even notice that some filenames have lost special characters and/or Unicode characters. Coreutils handle this more gracefully: they fail with an error message and let the user choose a new name themselves. Moreover, FAT is not the only filesystem with such limitations. I believe NTFS shares most of them, and most Samba shares impose similar limitations on filenames, yet nautilus handles only FAT. That's another reason why I think this behavior is harmful and should be changed.