Title in Word .doc file incorrectly interpreted as UTF-8
The attached .doc file has a Title of "LONDOCS\247491". As the output below shows, the title is interpreted as UTF-8, which produces an invalid character and leads to many syslog entries about being unable to insert the metadata.
$ TRACKER_VERBOSITY=3 tracker extract /home/ag/Documents/TrackerBugExample.doc
** Message: 23:37:16.944: Starting tracker-extract 2.1.5
** Message: 23:37:16.944: General options:
** Message: 23:37:16.944: Verbosity ............................ 0
** Message: 23:37:16.944: Sched Idle ........................... 1
** Message: 23:37:16.944: Max bytes (per file) ................. 1048576
Setting scheduler policy to SCHED_IDLE
Setting priority nice level to 19
Loading extractor rules... (/usr/share/tracker-miners/extract-rules)
(tracker-extract:15643): dconf-DEBUG: 23:37:16.944: watch_established: "/org/freedesktop/tracker/extract/" (establishing: 1)
Loaded rule '10-abw.rule'
Loaded rule '10-bmp.rule'
Loaded rule '10-comics.rule'
Loaded rule '10-dvi.rule'
Loaded rule '10-ebooks.rule'
Loaded rule '10-epub.rule'
Loaded rule '10-flac.rule'
Loaded rule '10-gif.rule'
Loaded rule '10-html.rule'
Loaded rule '10-ico.rule'
Loaded rule '10-jpeg.rule'
Loaded rule '10-mp3.rule'
Loaded rule '10-msoffice.rule'
Loaded rule '10-oasis.rule'
Loaded rule '10-pdf.rule'
Loaded rule '10-png.rule'
Loaded rule '10-ps.rule'
Loaded rule '10-raw.rule'
Loaded rule '10-svg.rule'
Loaded rule '10-tiff.rule'
Loaded rule '10-vorbis.rule'
Loaded rule '10-xmp.rule'
Loaded rule '10-xps.rule'
Loaded rule '11-iso.rule'
Loaded rule '11-msoffice-xml.rule'
Loaded rule '15-gstreamer-guess.rule'
Loaded rule '15-playlist.rule'
Loaded rule '15-source-code.rule'
Loaded rule '90-gstreamer-audio-generic.rule'
Loaded rule '90-gstreamer-image-generic.rule'
Loaded rule '90-gstreamer-video-generic.rule'
Loaded rule '90-text-generic.rule'
Extractor rules loaded
MIME type guessed as 'application/msword' (from GIO)
Using /usr/lib/x86_64-linux-gnu/tracker-miners-2.0/extract-modules/libextract-msoffice.so...
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#> .
@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .
<file:///home/ag/Documents/TrackerBugExample.doc> nie:contentCreated "2018-12-30T23:35:19Z" ;
nie:title "LONDOCS\\�491" ;
nie:subject "This is not a subject" ;
nco:creator _:3 ;
a nfo:PaginatedTextDocument ;
nie:plainTextContent "Test doc for tracker bug\r " .
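For anyone triaging this, here is a minimal Python sketch of one way the observed title could arise. It is only a guess and is not taken from the tracker or libgsf code: it assumes that the "\247" part of the title ends up as the single byte 0xA7 (octal 247) somewhere along the way. A lone 0xA7 is a UTF-8 continuation byte, so the string fails UTF-8 validation and shows up as U+FFFD when printed. The helper name `is_valid_utf8` is hypothetical.

```python
# Hypothetical reproduction; this is NOT tracker's actual code path.
# Title as stored in the document properties (backslash is a literal character).
expected_title = "LONDOCS\\247491"

# Assumption: "247" is treated as an octal escape and becomes byte 0xA7,
# while the backslash and the trailing "491" are kept as-is.
raw = b"LONDOCS\\" + bytes([0o247]) + b"491"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Decoding with replacement substitutes U+FFFD for the invalid byte,
# which is what a terminal would typically display.
decoded = raw.decode("utf-8", errors="replace")

print("valid UTF-8:", is_valid_utf8(raw))
print("decoded title:", decoded)
print("expected title:", expected_title)
```

If this guess is right, the fix would belong wherever the escape or codepage conversion happens rather than in the UTF-8 validation step, but that would need to be confirmed against the extractor source.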
Edited by Andre Klapper