Filter out boxes that start at (0, 0)

jowlo requested to merge jowlo/paperwork:hotfix/box-covered-pages into develop

Tesseract returns boxes that are far too large and cover the whole page, most of them containing only a single special character. In my tests, all of these boxes start at coordinate (0, 0).

This merge request filters out all boxes that start at coordinate (0, 0). I don't know whether any sensible box would ever start right in the corner.
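To illustrate the idea, here is a minimal sketch of such a filter. It is not the actual diff: the `Box` class, its `position` layout of `((x1, y1), (x2, y2))`, and the helper name `drop_origin_boxes` are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]


@dataclass
class Box:
    # Hypothetical stand-in for an OCR word box (not Paperwork's real class).
    content: str
    # Assumed layout: (top-left corner, bottom-right corner), in pixels.
    position: Tuple[Point, Point]


def drop_origin_boxes(boxes: List[Box]) -> List[Box]:
    """Keep only the boxes whose top-left corner is not (0, 0)."""
    return [box for box in boxes if box.position[0] != (0, 0)]


boxes = [
    Box("Invoice", ((120, 80), (310, 115))),
    Box("|", ((0, 0), (2480, 3508))),  # page-sized box with a stray character
    Box("Total", ((120, 900), (220, 935))),
]

kept = drop_origin_boxes(boxes)
print([box.content for box in kept])
```

The trade-off is the one described above: a legitimate box that really starts in the corner would be dropped as well.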

I just started using Paperwork, and the covered pages almost drove me away. I saw #792 (closed) but could not find a quick way to filter out the boxes that extend over the whole page. While not the cleanest, this solution does seem to work consistently, and I am willing to miss some boxes rather than have the whole page covered on almost half of my documents.