Title in Word .doc file incorrectly interpreted as UTF-8
The attached .doc file has a Title of "LONDOCS\247491". As the output below shows, part of the title is interpreted as UTF-8, which produces an invalid character and many syslog entries about being unable to insert the metadata.
$ TRACKER_VERBOSITY=3 tracker extract /home/ag/Documents/TrackerBugExample.doc
** Message: 23:37:16.944: Starting tracker-extract 2.1.5
** Message: 23:37:16.944: General options:
** Message: 23:37:16.944:   Verbosity  ............................  0
** Message: 23:37:16.944:   Sched Idle  ...........................  1
** Message: 23:37:16.944:   Max bytes (per file)  .................  1048576
Setting scheduler policy to SCHED_IDLE
Setting priority nice level to 19
Loading extractor rules... (/usr/share/tracker-miners/extract-rules)
(tracker-extract:15643): dconf-DEBUG: 23:37:16.944: watch_established: "/org/freedesktop/tracker/extract/" (establishing: 1)
  Loaded rule '10-abw.rule'
  Loaded rule '10-bmp.rule'
  Loaded rule '10-comics.rule'
  Loaded rule '10-dvi.rule'
  Loaded rule '10-ebooks.rule'
  Loaded rule '10-epub.rule'
  Loaded rule '10-flac.rule'
  Loaded rule '10-gif.rule'
  Loaded rule '10-html.rule'
  Loaded rule '10-ico.rule'
  Loaded rule '10-jpeg.rule'
  Loaded rule '10-mp3.rule'
  Loaded rule '10-msoffice.rule'
  Loaded rule '10-oasis.rule'
  Loaded rule '10-pdf.rule'
  Loaded rule '10-png.rule'
  Loaded rule '10-ps.rule'
  Loaded rule '10-raw.rule'
  Loaded rule '10-svg.rule'
  Loaded rule '10-tiff.rule'
  Loaded rule '10-vorbis.rule'
  Loaded rule '10-xmp.rule'
  Loaded rule '10-xps.rule'
  Loaded rule '11-iso.rule'
  Loaded rule '11-msoffice-xml.rule'
  Loaded rule '15-gstreamer-guess.rule'
  Loaded rule '15-playlist.rule'
  Loaded rule '15-source-code.rule'
  Loaded rule '90-gstreamer-audio-generic.rule'
  Loaded rule '90-gstreamer-image-generic.rule'
  Loaded rule '90-gstreamer-video-generic.rule'
  Loaded rule '90-text-generic.rule'
Extractor rules loaded
MIME type guessed as 'application/msword' (from GIO)
Using /usr/lib/x86_64-linux-gnu/tracker-miners-2.0/extract-modules/libextract-msoffice.so...
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#> .
@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .

<file:///home/ag/Documents/TrackerBugExample.doc> nie:contentCreated "2018-12-30T23:35:19Z" ;
  nie:title "LONDOCS\\�491" ;
  nie:subject "This is not a subject" ;
  nco:creator _:3 ;
  a nfo:PaginatedTextDocument ;
  nie:plainTextContent "Test doc for tracker bug\r " .
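
I have not traced this through the extractor source, so the following is only a guess at the mechanism. The Python sketch below reproduces the same mangled title under two assumptions: that the title string is first escaped (each backslash doubled), and that a later pass decodes every `\nnn` sequence of three octal digits into a single byte. Octal 247 is the byte 0xA7, which is not valid UTF-8 on its own. The function names `escape_backslashes` and `decode_octal_escapes` are hypothetical and are not taken from tracker.

```python
import re


def escape_backslashes(raw: bytes) -> bytes:
    # Hypothetical stand-in for a C-style string escaper: "\" becomes "\\".
    return raw.replace(b"\\", b"\\\\")


def decode_octal_escapes(escaped: bytes) -> bytes:
    # Hypothetical naive pass: every backslash followed by three octal
    # digits is replaced by the single byte with that octal value.
    return re.sub(
        rb"\\([0-7]{3})",
        lambda m: bytes([int(m.group(1), 8)]),
        escaped,
    )


# The document title exactly as reported: a literal backslash, then digits.
title = b"LONDOCS\\247491"

escaped = escape_backslashes(title)      # backslash is now doubled
mangled = decode_octal_escapes(escaped)  # second backslash + "247" -> 0xA7

# 0xA7 is a lone continuation byte, so strict UTF-8 decoding would fail;
# decoding with replacement yields U+FFFD where "247" used to be.
shown = mangled.decode("utf-8", errors="replace")
print(shown)
```

If that guess is right, the title never contained a non-ASCII character at all, and the digits after the backslash should be left alone rather than treated as an escape sequence.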