libtesseract segfaults with tesseract 4.0.0
Due to this commit, the libtesseract interface segfaults (at least when running tests).
It seems that tesseract >= 4.0.0 requires programs that bind to it to run with the C locale.
I came up with a quick-and-dirty fix (see attached patch 0001-Set-locale-for-tesseract-4.0.0.patch) that sets the locale before initializing the tesseract API and resets it afterwards:

    def init(lang=None):
        assert(g_libtesseract)
        locale.setlocale(locale.LC_ALL, "C")
        handle = g_libtesseract.TessBaseAPICreate()
        locale.setlocale(locale.LC_ALL, "")

Since the library expects to run in the C locale, resetting the locale afterwards is probably a bad idea. It should probably be set once in src/pyocr/libtesseract/__init__.py instead, but that could have unexpected side effects (for example on language detection in Paperwork?). I don't know whether the effect of the locale.setlocale function is limited to the calling module's namespace.
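
For discussion, here is a minimal sketch of a variant that restores whatever locale was active before the call, instead of resetting to the user default (""). The helper name c_locale is hypothetical and not part of the attached patch. Whether tesseract only needs the C locale during TessBaseAPICreate() or for the whole lifetime of the handle is not confirmed here.

```python
import contextlib
import locale


@contextlib.contextmanager
def c_locale(category=locale.LC_ALL):
    """Temporarily switch the process to the "C" locale (hypothetical helper)."""
    # Calling setlocale() without a second argument only queries the
    # current setting; it does not change anything.
    saved = locale.setlocale(category)
    locale.setlocale(category, "C")
    try:
        yield
    finally:
        # Restore the previous setting rather than the user default ("").
        locale.setlocale(category, saved)
```

In init() this could wrap the creation call, e.g. `with c_locale(): handle = g_libtesseract.TessBaseAPICreate()`. Note that, according to the Python documentation, locale.setlocale changes the locale for the entire program, not just for the calling module, and it is not thread-safe on most systems, so other code running at the same time would also see the C locale.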