OpenPaperwork / pyocr / Issues / #104
Issue created Nov 07, 2018 by Thomas Perret (@moht, Developer)

libtesseract segfaults with tesseract 4.0.0

Due to this commit, the libtesseract interface segfaults (at least when running tests).

It seems that tesseract >= 4.0.0 requires programs that bind to it to run with the locale set to LC_ALL=C.

I came up with a quick-and-dirty fix (see attached patch 0001-Set-locale-for-tesseract-4.0.0.patch) that sets the locale to C before initializing the tesseract API and resets it to the default afterwards:

in src/pyocr/libtesseract/tesseract_raw.py:343

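import locale

def init(lang=None):
    assert(g_libtesseract)
    # tesseract >= 4.0.0 expects the C locale when the API is created
    locale.setlocale(locale.LC_ALL, "C")
    handle = g_libtesseract.TessBaseAPICreate()
    # switch back to the user's default locale
    locale.setlocale(locale.LC_ALL, "")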

Since the library expects to run in the C locale, resetting the locale afterwards is probably a bad idea. I think it should instead be set once in src/pyocr/libtesseract/__init__.py, but in that case there could be unexpected side effects (like language detection in paperwork?). I don't know whether the effect of locale.setlocale is limited to the calling module's namespace.
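On the namespace question: according to the Python documentation, locale.setlocale changes the locale of the whole process and is not thread-safe on most systems, so it is not scoped to a module. If a temporary switch is still wanted, a minimal sketch could save the previous setting and restore it, rather than resetting to "". The helper name c_locale is hypothetical and not part of pyocr:

```python
import contextlib
import locale


@contextlib.contextmanager
def c_locale():
    """Temporarily switch the process to the C locale (hypothetical helper)."""
    # Calling setlocale() with only a category queries the current setting.
    previous = locale.setlocale(locale.LC_ALL)
    locale.setlocale(locale.LC_ALL, "C")
    try:
        yield
    finally:
        # Restore what was active before, not the environment default ("").
        locale.setlocale(locale.LC_ALL, previous)
```

In init(), the call to g_libtesseract.TessBaseAPICreate() would then sit inside a `with c_locale():` block. This only helps if tesseract needs the C locale during initialization alone; if it needs it for every call, the switch is still process-wide and would affect other threads in the meantime.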
