Use PNG to store scans instead of JPEG
Somewhat related to #124, which is about using the DjVu (scan/OCR-optimized) image format... while that may or may not be a good choice, what's undeniable is that JPG is a terrible one.
I imported an image (included at the end of this report) into Paperwork. It's a page of copy from a 100-year-old French art critique. Then I edited it to trim the margins down some, and selected "Open Folder" for the document, to view Paperwork's files.
The result of the import and conversion had been stored as a JPG image that... well, here's a side-by-side comparison of a particular region at 300% zoom. The original PNG is on the left, the Paperwork-converted JPG is on the right.
The conversion to JPG is clearly sabotaging Paperwork's ability to OCR the text. The difference is visible even in the scaled-down view here, and at full size it's unmistakable. The original PNG image was far sharper and had significantly less image noise before Paperwork converted it.
JPEG's lossy compression is well understood to be good for photographs, but terrible for graphics and line art. That makes it not just a poor choice, but possibly the WORST choice for OCR input. (I'm assuming that BMP and GIF aren't even on the table, because those would be worse still: BMP is typically stored uncompressed, and GIF is limited to a 256-color palette.)
Paperwork needs to store pages in a format that maximizes, rather than reduces, the readability of the text to be OCR'd. If not DjVu, then PNG seems a very obvious next choice. Or, if size isn't an issue and maximum quality is the overriding goal, there's a reason TIFF remains the de facto standard for high-end scanning workflows. (But PNG is probably more than sufficient for Paperwork's needs.)
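To illustrate why PNG keeps text edges intact, here's a minimal sketch using only Python's standard library. PNG compresses pixel data with DEFLATE, which is lossless, so the bytes that come back are exactly the bytes that went in. This is not Paperwork's code and it doesn't write an actual PNG file (real PNG also applies per-row filters before DEFLATE, which this skips). The `synthetic_page` and `deflate_round_trip` helpers and the page dimensions are made up for the example.

```python
import zlib

# Hypothetical page size for the illustration, not a real scan resolution.
WIDTH, HEIGHT = 200, 100


def synthetic_page(width, height):
    """Build raw 8-bit grayscale pixels: a white page with black, broken strokes."""
    rows = []
    for y in range(height):
        row = bytearray([255] * width)      # white background
        if y % 10 < 2:                      # a 2-pixel-thick "text line" every 10 rows
            for x in range(10, width - 10):
                if x % 7 < 4:               # gaps between strokes, loosely like glyphs
                    row[x] = 0              # black ink
        rows.append(bytes(row))
    return b"".join(rows)


def deflate_round_trip(pixels):
    """Compress with DEFLATE (the algorithm PNG uses) and decompress again."""
    compressed = zlib.compress(pixels, 9)
    restored = zlib.decompress(compressed)
    return compressed, restored


pixels = synthetic_page(WIDTH, HEIGHT)
compressed, restored = deflate_round_trip(pixels)

print("raw bytes:       ", len(pixels))
print("compressed bytes:", len(compressed))
print("pixel-identical: ", restored == pixels)
```

High-contrast pages like this compress well under DEFLATE because they are mostly long runs of identical pixels, and every stroke edge survives untouched. A JPEG round trip makes no such guarantee, which is the degradation visible in the comparison above.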
The original source image: