Track which documents have been OCR'd, perhaps with "invisible" tags?
PROBLEM: Sometimes, for unknown reasons, Tesseract dies partway through OCR when you import many large documents. However, if you go back and manually select "Redo OCR" on each affected document, it works. This is a problem because after importing, say, 50 documents, it takes a lot of time to open each one and check whether it has been OCR'd. On a large database, redoing OCR on everything can take tens of hours.
POSSIBLE SOLUTION: Have Paperwork keep track of which documents have been OCR'd. Then, in addition to the two existing choices, "Redo OCR on all documents" and "Redo OCR on this document", offer a third one: "Redo OCR on all documents that have not been OCR'd". This could be as simple as adding a tag to each document once its OCR completes, and then offering to OCR every document without that tag. To keep the UI simple and uncluttered, maybe this tag could be invisible?
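To make the proposal concrete, here is a minimal Python sketch of the tag-based approach. It assumes a simplified document model: `Document`, `OCR_DONE_TAG`, `mark_ocr_done`, `docs_needing_ocr`, `visible_tags` and `redo_missing_ocr` are all hypothetical names for illustration, not Paperwork's actual API. The key point is that the marker is added only after OCR succeeds, so a document whose OCR crashed halfway simply stays in the "not OCR'd" list.

```python
from dataclasses import dataclass, field

# Hypothetical internal marker; not an existing Paperwork label.
OCR_DONE_TAG = "__ocr_done__"


@dataclass
class Document:
    """Hypothetical, simplified stand-in for a Paperwork document."""
    name: str
    tags: set = field(default_factory=set)


def mark_ocr_done(doc: Document) -> None:
    """Record that OCR finished; call this only after OCR succeeded."""
    doc.tags.add(OCR_DONE_TAG)


def docs_needing_ocr(docs):
    """Documents without the marker, i.e. the candidates for
    "Redo OCR on all documents that have not been OCR'd"."""
    return [d for d in docs if OCR_DONE_TAG not in d.tags]


def visible_tags(doc: Document):
    """Tags to show in the UI; the internal marker stays hidden."""
    return sorted(t for t in doc.tags if t != OCR_DONE_TAG)


def redo_missing_ocr(docs, run_ocr):
    """Run OCR only on unmarked documents and mark each one that succeeds.

    `run_ocr` is a hypothetical callable that raises on failure
    (e.g. if Tesseract dies). Returns the names of documents that failed.
    """
    failed = []
    # The candidate list is computed once up front, so marking documents
    # inside the loop does not affect iteration.
    for doc in docs_needing_ocr(docs):
        try:
            run_ocr(doc)
        except Exception:
            failed.append(doc.name)
            continue
        mark_ocr_done(doc)
    return failed
```

Running `redo_missing_ocr` repeatedly is safe under this design: documents that were already OCR'd are skipped, and any that fail again remain unmarked for the next attempt, so nobody has to open documents one by one to check.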