ocrfeeder-cli does not produce ODT/HTML/TXT and exits with an error
Submitted by Mallik Kumar
Link to original bug (#795582)
Description
Created attachment 371436 Sample image causing the failure
I am using Tesseract 4.0.0-beta.1 on Linux Mint 18.3 x64. I have downloaded and installed ocrfeeder_0.8.1-4_all.deb from https://packages.debian.org/sid/all/ocrfeeder/download
When I run: ocrfeeder-cli -e Tesseract -f HTML -i outpage-24.png -o page24
I receive the following error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 712: ordinal not in range(128)
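This message is the usual symptom of Unicode text being encoded with the ASCII codec, which is what Python 2 falls back to when no encoding is given. The sketch below is not OCRFeeder's actual code: the sample string and the helper names `ascii_encode_fails`, `write_text` and `round_trip` are hypothetical. It only assumes, as the traceback indicates, that the recognized text contains U+2019 (a right single quotation mark), and it shows that writing with an explicit UTF-8 encoding avoids the failure.

```python
import io
import os
import tempfile

# Hypothetical OCR output containing U+2019, the character named in the error.
recognized = u"the author\u2019s note"


def ascii_encode_fails(text):
    """Return True if encoding `text` as ASCII raises UnicodeEncodeError."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False


def write_text(path, text):
    """Write `text` with an explicit UTF-8 encoding instead of the default codec."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def round_trip(text):
    """Write `text` to a temporary file as UTF-8 and read it back."""
    descriptor, path = tempfile.mkstemp(suffix=".txt")
    os.close(descriptor)
    try:
        write_text(path, text)
        with io.open(path, "r", encoding="utf-8") as source:
            return source.read()
    finally:
        os.remove(path)
```

If the HTML exporter writes its output through a handle or an `encode()` call that defaults to ASCII, passing the encoding explicitly as in `write_text` would be one way to avoid the crash. Whether that is where the failure occurs is an assumption.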
When I request TXT output instead, I receive: TypeError: decoding Unicode is not supported
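In Python 2, this TypeError is raised when `unicode()` is called with an encoding argument on a value that is already a unicode object. Python 3 reports the same mistake as "decoding str is not supported". The helper below is a minimal sketch of the usual guard. The name `to_text` is hypothetical and not part of OCRFeeder. It decodes only when the value is a byte string and otherwise returns it unchanged.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding only when it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text: decoding again is what triggers the TypeError.
    return value
```

A call such as `to_text(engine_output)` can then be applied to the OCR result whether the engine wrapper returns bytes or text. `engine_output` is a placeholder name here.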
Before either error, I also receive the following warning:
/usr/lib/python2.7/dist-packages/ocrfeeder/util/lib.py:26: PyGIWarning: Gtk was imported without specifying a version first. Use gi.require_version('Gtk', '3.0') before import to ensure that the right version gets loaded.
  from gi.repository import Gtk
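The warning names its own remedy: call `gi.require_version('Gtk', '3.0')` before importing Gtk. A minimal sketch of that import order follows. The wrapper function `load_gtk` is hypothetical, and it returns None when PyGObject or the requested Gtk version is not installed so that the snippet can also run on a machine without Gtk.

```python
def load_gtk(version="3.0"):
    """Return the Gtk module pinned to `version`, or None if it is unavailable."""
    try:
        import gi
        # Pin the version before the import, as the PyGIWarning recommends.
        gi.require_version("Gtk", version)
        from gi.repository import Gtk
    except (ImportError, ValueError):
        # ImportError: PyGObject is missing; ValueError: this Gtk version is not available.
        return None
    return Gtk
```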
I am attaching the PNG. These errors occur for many of my images, whichever output format I request (ODT, HTML or TXT).
Attachment 371436, "Sample image causing the failure":
Version: 0.8.x