Opened Aug 01, 2018 by jbarlow83

Glyphs in PDFs produced by Tesseract OCR render as white boxes

Tesseract OCR uses a glyphless font (a font with a single glyph that occupies empty space) in the PDFs it produces.

When PDFs produced by Tesseract are rendered in Evince and text is selected, Evince draws white boxes over the background image that contains the text. The Tesseract team has worked hard on PDF viewer support and compatibility. To my knowledge, the Tesseract glyphless font works correctly in Acrobat, Pdfium, PDF.js, macOS Preview, Dropbox PDF Viewer, MuPDF and Ghostscript, and it has been tested on multiple platforms, including mobile. None of these viewers attempt to render the glyphless font on top of the background image.

Here is the test file (after OCR): linn.pdf

Here is how Evince (macOS Homebrew version) displays such a file. Linux users have reported the same issue to me as well.

[Screenshot: Evince]

Here is how Acrobat presents the file:

[Screenshot: Acrobat]

Related issues:

  • https://github.com/jbarlow83/OCRmyPDF/issues/249
  • https://github.com/jbarlow83/OCRmyPDF/issues/178

The design notes of the glyphless font may be relevant.

Reference: GNOME/evince#953