libtesseract segfaults with tesseract 4.0.0
Due to this commit, the libtesseract interface segfaults (at least when running tests).
It seems that tesseract >= 4.0.0 requires programs using its bindings to run with the locale set to LC_ALL=C.
I came up with a quick-and-dirty fix (see attached patch 0001-Set-locale-for-tesseract-4.0.0.patch) that sets the locale before initializing the tesseract API and resets it afterwards, in src/pyocr/libtesseract/tesseract_raw.py:343:
    def init(lang=None):
        assert(g_libtesseract)
        locale.setlocale(locale.LC_ALL, "C")
        handle = g_libtesseract.TessBaseAPICreate()
        locale.setlocale(locale.LC_ALL, "")
I think that, since the library expects to run in the C locale, resetting it afterwards is probably a bad idea, and the locale should instead be set once in src/pyocr/libtesseract/__init__.py. In that case, though, there could be unexpected side effects (like language detection in paperwork?). I don't know whether the effect of locale.setlocale is limited to the namespace.
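On the last point: according to the Python documentation, locale.setlocale changes the locale of the whole process (and is not thread-safe), so it is not confined to a module or namespace. If the C locale turns out to be needed only around specific tesseract calls, a rough sketch of a save-and-restore wrapper could look like the following. The name c_locale is hypothetical and not part of pyocr, and whether tesseract tolerates the locale being restored afterwards is an assumption I have not verified.

```python
import locale
from contextlib import contextmanager


@contextmanager
def c_locale():
    """Temporarily switch the whole process to the "C" locale.

    Hypothetical helper, not part of pyocr. setlocale() is
    process-wide, so other threads are affected while this is active.
    """
    # Calling setlocale() without a locale argument only queries
    # the current setting; nothing is changed here.
    saved = locale.setlocale(locale.LC_ALL)
    locale.setlocale(locale.LC_ALL, "C")
    try:
        yield
    finally:
        # Restore exactly what was active before, rather than "",
        # which would re-read the user's environment instead.
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    before = locale.setlocale(locale.LC_ALL)
    with c_locale():
        # This is where a call such as TessBaseAPICreate() would go.
        print("inside:", locale.setlocale(locale.LC_ALL))
    print("restored:", locale.setlocale(locale.LC_ALL) == before)
```

Unlike the patch, this restores the previously active locale instead of calling setlocale(LC_ALL, ""), so a caller that never enabled the user's locale is left as it was.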