Tracker assumes mimetype based on filename, not contents
I have some PNG images that, for some reason, have the extension .jpg. This makes Tracker unhappy:
Aug 03 14:37:53 the tracker-extract-3[474984]: Not a JPEG file: starts with 0x89 0x50
Aug 03 14:37:53 the .tracker-extrac[474984]: Task for 'file:///home/ash/foo.jpg' finished with error: Could not get any metadata for uri:'file:///home/ash/foo.jpg' and mime:'image/jpeg'
I guess this is probably related to glib#2704 (closed), since running tracker3 extract with some debug environment variables set shows:
(tracker-extract-3:476194): Tracker-DEBUG: 14:39:43.415: MIME type guessed as 'image/jpeg' (from GIO)
gio info also gets it wrong, while file gets it right:
$ gio info foo.jpg | grep content-type
standard::content-type: image/jpeg
standard::fast-content-type: image/jpeg
$ file -bi foo.jpg
image/png; charset=binary
Also, https://gitlab.gnome.org/GNOME/tracker-miners/-/blob/5aec567a7030f04ea3d379715430ea18bc019d5e/src/tracker-extract/tracker-extract.c#L366 indicates that g_file_info_get_content_type is being used.
I'm not sure what a good solution to this would be. Maybe if the extraction fails with the mimetype from g_file_info_get_content_type, it could be retried with the mimetype from g_content_type_guess (passing data only, not filename)? It seems like you really have to call both functions in order to see what the most probable mimetype is: glib#2704 (comment 1515703)
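
To make the retry idea concrete, here is a rough sketch in plain Python rather than the actual Tracker or GLib code. Everything in it is an assumption for illustration: guess_from_name stands in for the filename-based result that g_file_info_get_content_type gave here, guess_from_data stands in for g_content_type_guess called with data only, and the two-entry signature table and the extractors mapping are hypothetical.

```python
import mimetypes

# Hypothetical signature table for this sketch only; a real content sniffer
# knows far more formats than these two.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def guess_from_name(filename):
    """Guess the MIME type from the file name alone."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def guess_from_data(data):
    """Guess the MIME type from the leading bytes alone."""
    for signature, mime in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime
    return None


def candidate_mimetypes(filename, data):
    """Return MIME types to try, name-based guess first, content-based second."""
    candidates = []
    for mime in (guess_from_name(filename), guess_from_data(data)):
        if mime is not None and mime not in candidates:
            candidates.append(mime)
    return candidates


def extract_with_fallback(filename, data, extractors):
    """Try each candidate MIME type until one extractor succeeds.

    `extractors` maps a MIME type to a callable that returns metadata or
    raises ValueError on failure (a hypothetical interface for this sketch).
    """
    for mime in candidate_mimetypes(filename, data):
        extractor = extractors.get(mime)
        if extractor is None:
            continue
        try:
            return mime, extractor(data)
        except ValueError:
            continue  # this guess was wrong, retry with the next candidate
    return None, None
```

With a file like foo.jpg above, the JPEG extractor would fail first and the PNG extractor would then be tried. Keeping the name-based guess first means correctly named files never pay for a second attempt.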