A lot of `Entity name “nbsp” is not known` when indexing epub files
I get a lot of these errors in the journal/logs:
Error extracting EPUB contents (OEBPS/Text/part0205.html): Error on line 15: Entity name “nbsp” is not known
(of course with various line numbers and file names).
This most likely happens because the HTML in EPUB files is parsed as XML, and HTML is not XML. I think that EPUB contents should be parsed as HTML.
The last error (`Entity name “nbsp” is not known`) comes from glib's `unescape_gstring_inplace` (which does not recognize `&nbsp;`, only the five XML entities), called by glib's `g_markup_parse_context_parse`, which is called by `tracker_gsf_parse_xml_in_zip` (`src/tracker-extract/tracker-gsf.c`), which in turn gets called from `src/tracker-extract/tracker-extract-epub.c` (probably from `extract_opf_contents`, but I am not sure here).
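
To illustrate the difference between the two parsing modes, here is a minimal sketch in Python. It is only an analogy and does not use glib or tracker code: `xml.etree.ElementTree` stands in for a strict XML parser that knows only the five predefined entities, and `html.parser` stands in for an HTML parser that knows the HTML named entities. The names `SNIPPET`, `parse_as_xml`, `parse_as_html` and `TextCollector` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment, similar to what an EPUB chapter might contain.
SNIPPET = "<p>one&nbsp;two</p>"


def parse_as_xml(markup):
    """Parse strictly as XML; return the text, or the error message on failure."""
    try:
        return "".join(ET.fromstring(markup).itertext())
    except ET.ParseError as err:
        # Without a DTD, only amp, lt, gt, quot and apos are defined in XML,
        # so a reference to nbsp is rejected as an undefined entity.
        return f"XML error: {err}"


class TextCollector(HTMLParser):
    """Collect text nodes; convert_charrefs=True resolves HTML entities."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_html(markup):
    """Parse leniently as HTML and return the collected text."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


xml_result = parse_as_xml(SNIPPET)
html_result = parse_as_html(SNIPPET)
print(xml_result)         # an "XML error: ..." message about the entity
print(repr(html_result))  # text containing a real no-break space (U+00A0)
```

The XML path fails on the same kind of input that shows up in the log above, while the HTML path resolves `&nbsp;` to a no-break space. Whether tracker should switch parsers or declare the extra entities some other way is a separate question; the sketch only shows why the strict parser complains.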