Some file URIs don't get quoted and hashed correctly, and thus don't show their thumbnail
Submitted by Jeff F.T. Assigned to Jeff F.T. @jeff
Description
http://specifications.freedesktop.org/thumbnail-spec/thumbnail-spec-latest.html
http://community.roxen.com/developers/idocs/rfc/rfc2396.html
About 95% of my files with esoteric filenames get quoted and hashed correctly so I can retrieve their thumbnail. However, in some corner cases it doesn't work. For instance, take this "raw" (unquoted) URI that we get from gst discoverer:
file:///home/jeff/Vidéos/AZO FAQs - My real job?.mp4
After it goes through pitivi's quote_uri:
file:///home/jeff/Vid%C3%A9os/AZO%20FAQs%20-%20My%20real%20job?.mp4
Apparently, this URI is not encoded/quoted correctly: the "?" in the filename is left unescaped instead of being turned into %3F. Therefore, the thumbnail hash we compute from it (39a51fe1c4e8e416910c2e372e5c000d) is incorrect.
The code that handles the quoting (as best as it can) to match RFC 2396 is utils/misc.py's quote_uri:
parts = list(urlsplit(uri, allow_fragments=False))
# Make absolutely sure the string is unquoted before quoting again!
raw = unquote(parts[2])
# For computing thumbnail md5 hashes in the source list, we must adhere to
# RFC 2396. However, urllib's quote method only uses alphanumeric and "/"
# as their safe chars. We need to add both the reserved and unreserved chars
RFC_2396_RESERVED = ";/?:@&=+$,"
RFC_2396_UNRESERVED = "-_.!~*'()"
URIC_SAFE_CHARS = "/" + "%" + RFC_2396_RESERVED + RFC_2396_UNRESERVED
parts[2] = quote(raw, URIC_SAFE_CHARS)
uri = urlunsplit(parts)
return uri
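To make the failure easier to reproduce outside Pitivi, here is a minimal stand-alone sketch of the excerpt above. It assumes Python 3's urllib.parse (the original module may have used different imports), and the wrapper function and variable names are only illustrative. It shows two things: urlsplit treats the "?" as the start of a query string, and "?" is also in the safe-character list, so it is never escaped either way.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Safe-character set copied from the excerpt above.
RFC_2396_RESERVED = ";/?:@&=+$,"
RFC_2396_UNRESERVED = "-_.!~*'()"
URIC_SAFE_CHARS = "/" + "%" + RFC_2396_RESERVED + RFC_2396_UNRESERVED


def quote_uri(uri):
    """Stand-alone copy of the quoting logic from the excerpt (assumed wrapper)."""
    parts = list(urlsplit(uri, allow_fragments=False))
    # Unquote first so an already-quoted URI is not quoted twice.
    raw = unquote(parts[2])
    parts[2] = quote(raw, URIC_SAFE_CHARS)
    return urlunsplit(parts)


raw_uri = "file:///home/jeff/Vidéos/AZO FAQs - My real job?.mp4"
split = urlsplit(raw_uri, allow_fragments=False)
print(split.path)   # the path component stops right before the "?"
print(split.query)  # the remainder, ".mp4", is parsed as a query string
print(quote_uri(raw_uri))
```

The last line prints the same URI as the quote_uri result quoted earlier in this report, with the "?" still unescaped.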
Now, if we use Gst.filename_to_uri inside this method (instead of our own manual handling of the RFC with Python's urllib "quote"), we get this properly encoded URI:
file:///home/jeff/Vid%C3%A9os/AZO%20FAQs%20-%20My%20real%20job%3F.mp4
...which makes the hashing work.
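For comparison without a GStreamer install, the sketch below approximates the effect on this particular filename using only the standard library. It is not Gst.filename_to_uri and may differ from it for other characters; filename_to_file_uri and thumbnail_hash are hypothetical helper names. The hash follows the thumbnail spec linked above, which names thumbnails after the MD5 hex digest of the URI.

```python
import hashlib
from urllib.parse import quote


def filename_to_file_uri(path):
    # Hypothetical stand-in for Gst.filename_to_uri (absolute POSIX paths only).
    # quote() escapes everything except letters, digits, "_.-~" and "/",
    # so a "?" inside the filename becomes %3F.
    return "file://" + quote(path)


def thumbnail_hash(uri):
    # MD5 hex digest of the URI, as used for thumbnail file names.
    return hashlib.md5(uri.encode("utf-8")).hexdigest()


path = "/home/jeff/Vidéos/AZO FAQs - My real job?.mp4"
good_uri = filename_to_file_uri(path)
bad_uri = "file:///home/jeff/Vid%C3%A9os/AZO%20FAQs%20-%20My%20real%20job?.mp4"

print(good_uri)
# The two URIs differ by a single escape, so their hashes differ as well.
print(thumbnail_hash(good_uri) != thumbnail_hash(bad_uri))
```

Because the hash is computed over the exact URI string, even one unescaped character is enough to make the thumbnail lookup miss.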
Imported from https://bugzilla.gnome.org/show_bug.cgi?id=692331