
Allow tesseract 4.0.0alpha to be used with pyocr

Created by: ddddavidmartin

Tesseract 4.0 is still in alpha and reports its version string as tesseract 4.00.00alpha. This breaks the existing get_version function, which expects integer values only.

To work around this, the pull request takes only the leading digits of the version and returns those.
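
The idea can be sketched roughly as follows. This is not the actual pyocr change, only a minimal illustration: the helper names leading_int and parse_version are hypothetical, and splitting the version on dots and returning a tuple of integers is an assumption about how the parts are handled.

```python
import re


def leading_int(component):
    """Return the leading run of digits in `component` as an int.

    Hypothetical helper: '00alpha' -> 0, '4' -> 4.
    """
    match = re.match(r"\d+", component)
    if match is None:
        raise ValueError("no leading digits in %r" % component)
    return int(match.group(0))


def parse_version(version_string):
    """Parse a dotted version string, ignoring any non-digit suffix.

    Assumption: each dot-separated part starts with at least one digit.
    """
    return tuple(leading_int(part) for part in version_string.split("."))


print(parse_version("4.00.00alpha"))
```

With this approach a suffix such as "alpha" is dropped instead of causing the parse to fail, while a plain numeric version string is parsed the same way as before.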

Note: I have not tested in any depth how pyocr fares with tesseract 4, but I am using it with paperless and it has been working fine for me so far.

How to test this:

  • build and install the current tesseract 4.0.0alpha
  • start consumption with paperless for example
  • without this change, pyocr fails with pyocr.error.TesseractError: (0, 'Unable to parse Tesseract version (not a number): [4.00.00alpha]')
