tesseract.detect_orientation() dies with empty pages
Created by: darkermatter
Hi!
I'm hitting this error with some of my PDFs, apparently on pages that contain little or no text:
consumer_1 | **** Warning: considering '0000000000 XXXXX n' as a free entry.
consumer_1 | **** Warning: considering '0000000000 XXXXX n' as a free entry.
consumer_1 | **** Warning: considering '0000000000 XXXXX n' as a free entry.
consumer_1 | **** Warning: considering '0000000000 XXXXX n' as a free entry.
consumer_1 | **** Warning: considering '0000000000 XXXXX n' as a free entry.
consumer_1 | **** Warning: considering '0000000000 XXXXX n' as a free entry.
consumer_1 | **** Warning: considering '0000000000 XXXXX n' as a free entry.
consumer_1 |
consumer_1 | **** This file had errors that were repaired or ignored.
consumer_1 | **** The file was produced by:
consumer_1 | **** >>>> Mac OS X 10.8.2 Quartz PDFContext <<<<
consumer_1 | **** Please notify the author of the software that produced this
consumer_1 | **** file that it does not conform to Adobe's published PDF
consumer_1 | **** specification.
consumer_1 |
consumer_1 | multiprocessing.pool.RemoteTraceback:
consumer_1 | """
consumer_1 | Traceback (most recent call last):
consumer_1 | File "/usr/local/lib/python3.5/site-packages/pyocr/tesseract.py", line 171, in detect_orientation
consumer_1 | angle = int(output['Orientation in degrees'])
consumer_1 | KeyError: 'Orientation in degrees'
consumer_1 |
consumer_1 | During handling of the above exception, another exception occurred:
consumer_1 |
consumer_1 | Traceback (most recent call last):
consumer_1 | File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
consumer_1 | result = (True, func(*args, **kwds))
consumer_1 | File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
consumer_1 | return list(map(*args))
consumer_1 | File "/usr/src/paperless/src/documents/consumer.py", line 32, in image_to_string
consumer_1 | orientation = self.OCR.detect_orientation(f, lang=lang)
consumer_1 | File "/usr/local/lib/python3.5/site-packages/pyocr/tesseract.py", line 180, in detect_orientation
consumer_1 | % original_output)
consumer_1 | pyocr.tesseract.TesseractError: (-1, 'No script found in image (Too few characters. Skipping this page)')
consumer_1 | """
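
A possible workaround, sketched under assumptions: wrap the `detect_orientation()` call so that a page with too little text counts as "orientation unknown" instead of killing the worker. The helper name `detect_orientation_or_none` and its `errors` parameter are hypothetical and not part of pyocr or paperless. The only thing assumed about the OCR object is the `detect_orientation(image, lang=...)` call visible in the traceback.

```python
def detect_orientation_or_none(ocr, image, lang=None, errors=(Exception,)):
    """Return the result of ocr.detect_orientation(), or None if it raises.

    `ocr` is any object exposing detect_orientation(image, lang=...), such as
    the pyocr.tesseract module seen in the traceback. `errors` is the
    exception type (or tuple of types) to treat as "orientation unknown";
    anything else still propagates.
    """
    try:
        return ocr.detect_orientation(image, lang=lang)
    except errors:
        # For the failure reported here this would be the TesseractError
        # raised for pages with too few characters. Returning None lets the
        # caller leave the page unrotated and carry on with OCR.
        return None
```

In `image_to_string` this could be called as `detect_orientation_or_none(self.OCR, f, lang=lang, errors=pyocr.tesseract.TesseractError)`, skipping any rotation step when the result is `None`. Passing the specific exception type rather than relying on the broad default keeps unrelated failures visible.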