Glyphs in PDFs produced by Tesseract OCR render as white boxes
Tesseract OCR uses a glyphless font (a font with a single glyph that draws nothing and only occupies space) in the PDFs it produces.
When PDFs produced by Tesseract are rendered in Evince and text is selected, Evince draws white boxes on top of the background image that contains the text. The Tesseract team has worked pretty hard on PDF viewer support and compatibility: to my knowledge, the Tesseract glyphless font works correctly in Acrobat, Pdfium, PDF.js, macOS Preview, Dropbox PDF Viewer, MuPDF and Ghostscript, and it has been tested on multiple platforms, including mobile. These other PDF viewers do not attempt to render the glyphless font on top of the background.
Here is the test file (after OCR): linn.pdf
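To check whether a PDF such as linn.pdf actually carries Tesseract's glyphless font before testing it in a viewer, a rough byte-level scan can be enough. This is a minimal sketch, assuming Tesseract's renderer declares the font as `/BaseFont /GlyphLessFont` in an uncompressed font dictionary; the helper names below are hypothetical, and files rewritten by a PDF optimizer that packs objects into compressed object streams will not be detected.

```python
import re
from pathlib import Path

# Assumption: Tesseract names its font "GlyphLessFont" and writes the font
# dictionary uncompressed. Fonts hidden inside compressed object streams
# are not found by this raw byte search.
_GLYPHLESS = re.compile(rb"/BaseFont\s*/GlyphLessFont\b")


def uses_glyphless_font(data: bytes) -> bool:
    """Return True if the raw PDF bytes reference the glyphless font."""
    return _GLYPHLESS.search(data) is not None


def pdf_uses_glyphless_font(path: str) -> bool:
    """Hypothetical convenience wrapper: read a PDF file and scan its bytes."""
    return uses_glyphless_font(Path(path).read_bytes())
```

For example, `pdf_uses_glyphless_font("linn.pdf")` should return `True` for a file produced directly by Tesseract's PDF output. A proper PDF parser would be more reliable, but a plain regex keeps the check dependency-free.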
Here is how Evince (macOS Homebrew version) displays such a file. Linux users have reported the same issue to me as well.
Here is how Acrobat presents the file:
Related issues: