ocrfeeder-cli does not produce ODT/HTML/TXT and exits with an error

Submitted by Mallik Kumar

Description

Created attachment 371436 Sample image causing the failure

I am using Tesseract 4.0.0-beta.1 on Linux Mint 18.3 (x64). I downloaded and installed ocrfeeder_0.8.1-4_all.deb from https://packages.debian.org/sid/all/ocrfeeder/download

When I run: ocrfeeder-cli -e Tesseract -f HTML -i outpage-24.png -o page24

I receive the following error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 712: ordinal not in range(128)

When I try TXT output instead, I receive: TypeError: decoding Unicode is not supported

I also receive the following warning before the above error: /usr/lib/python2.7/dist-packages/ocrfeeder/util/lib.py:26: PyGIWarning: Gtk was imported without specifying a version first. Use gi.require_version('Gtk', '3.0') before import to ensure that the right version gets loaded. from gi.repository import Gtk

I have attached the PNG. These errors occur for many of my images, with ODT, HTML, and TXT output.
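Both exceptions are standard Python errors. The sketch below is a hypothetical Python 3 illustration, not OCRFeeder's code. It assumes, without confirmation from this report, that the export path ends up encoding the recognized text with the default ASCII codec. It reproduces the same UnicodeEncodeError class with the U+2019 character named in the traceback and shows that an explicit UTF-8 encode succeeds. The TypeError is what Python 2 raises when unicode() is given an encoding for text that is already Unicode; the Python 3 equivalent, str(text, "utf-8"), reports "decoding str is not supported".

```python
# Hypothetical illustration only: this is not OCRFeeder code.
# Assumption: the export path encodes OCR text with the ASCII codec.

SAMPLE = "It\u2019s a sample"  # U+2019 RIGHT SINGLE QUOTATION MARK, as in the report


def try_ascii(text):
    """Encode with ASCII and return the exception instead of raising it."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc


def try_decode_again(text):
    """Ask str() to decode text that is already Unicode (the Python 3
    analogue of Python 2's unicode(u'...', 'utf-8'))."""
    try:
        return str(text, "utf-8")
    except TypeError as exc:
        return exc


ascii_result = try_ascii(SAMPLE)
print(type(ascii_result).__name__, ascii_result.reason)
# UnicodeEncodeError ordinal not in range(128)

# Encoding explicitly as UTF-8 handles the same character.
utf8_bytes = SAMPLE.encode("utf-8")
print(utf8_bytes)  # b'It\xe2\x80\x99s a sample'

decode_result = try_decode_again(SAMPLE)
print(type(decode_result).__name__, decode_result)
# TypeError decoding str is not supported
```

If the ASCII assumption holds, a common remedy in Python 2.7 is to write output through io.open(path, "w", encoding="utf-8"); whether that matches the actual cause in OCRFeeder is not established by this report.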


Version: 0.8.x