IMPORTANT NOTE FOR WINDOWS USERS: 'paperwork_x.y.z_win64.zip' contains ONLY Paperwork itself, NOT Tesseract. Tesseract and its data files are required to use Paperwork, and which data files you need depends on the languages you intend to use. So please do not use this .zip; use the installer (.exe) instead.
I'm pleased to announce the release of Paperwork 1.1. This new release is mostly focused on optimisations.
Main changes are:
- Paperwork-gui 1.1:
  - Windows: Activation mechanism has been disabled for now
  - Workarounds for Gtk-3.20.x / GLib 2.50 (Ubuntu):
    - Work around weird behavior of GLib.idle_add (multiple calls)
    - Work around the document list not being refreshed
  - Import: Display how many image files, PDFs, documents and pages have been imported.
  - Automatic Color Equalization: Reduce the 'circle side-effect' by increasing the number of samples used.
  - paperwork-shell scan: Quit after scanning
  - Settings window: "Source" becomes "Default source" (cosmetic)
  - Export: Don't lock the UI, and display the progress of the export
  - Improve keyword highlighting: Highlight words identical to the search keywords (as before) and also words close enough to them (example: 'flesh' when 'flesch' is being searched for)
  - Optim: Document list: Only display the first 100 elements of the list, and extend it only when required. Reduces GTK latency and CPU usage (GtkListBox doesn't scale very well above 100 elements).
  - Optim: Improve PDF rendering speed: Let libpoppler take care of the rendering size (see backend:page.get_image())
  - Optim: Reduce the number of useless calls to Canvas.redraw()
- Paperwork-backend 1.1:
  - paperwork-shell: Add commands 'search', 'dump', 'switch_workdir', 'rescan', 'show', 'import', 'delete_doc', 'guess_labels', 'add_label', 'remove_label', 'rename'
  - Add methods doc.has_ocr() and page.has_ocr() indicating whether OCR has already been run on a given doc/page. Used in the GUI for the option "Redo OCR on all documents", as it must act only on documents where OCR has already been done in the past (i.e. not PDFs that already include text)
  - Optim: Provide a method page.get_image() returning an already resized Pillow image (PDF rendering optimisation)
  - Export: Report progress
  - Optim: PDF thumbnail rendering: Keep a cached version of the first page only. The other pages can be rendered on the fly
  - Fix: Label directory names use base64 encoding, and this encoding can result in strings containing '/'. Those characters must be replaced (by '_')
  - Fix: util/find_language(): If the system locale is not set properly, pycountry may raise UnicodeDecodeError.
  - Import: When importing a single PDF, don't import it if it has already been imported
  - Import: Provide detailed information and statistics regarding what has been imported (return value of Importer.import_doc() has changed)
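The label directory name fix listed for paperwork-backend can be illustrated with a short sketch. Standard base64 uses '/' as one of its 64 characters, and '/' cannot appear inside a single directory name. The helper name `label_to_dirname` is hypothetical and this is not Paperwork's actual code; it only shows the kind of replacement the fix describes.

```python
import base64


def label_to_dirname(label_name):
    """Hypothetical helper (not Paperwork's API): build a directory name from a label."""
    encoded = base64.b64encode(label_name.encode("utf-8")).decode("ascii")
    # '/' is a path separator, so it must not end up in a directory name.
    return encoded.replace("/", "_")


# base64 of "???" is "Pz8/", so the '/' gets replaced.
print(label_to_dirname("???"))  # prints "Pz8_"
```

Note that the standard library also offers `base64.urlsafe_b64encode()`, which avoids '/' altogether, but it also changes '+' to '-', so it would not produce the same names as the replacement described above.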
As usual, information regarding Paperwork installation and update can be found at https://github.com/jflesch/paperwork#readme . The detailed ChangeLog for paperwork-gui is available here: https://github.com/jflesch/paperwork/blob/stable/ChangeLog . The detailed ChangeLog for paperwork-backend is available here: https://github.com/jflesch/paperwork-backend/blob/stable/ChangeLog
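For readers curious about the improved keyword highlighting mentioned in the Paperwork-gui changes, here is a minimal sketch of matching words that are "close enough" to a search keyword. The release notes do not say how closeness is measured; this sketch assumes a Levenshtein (edit) distance with a threshold of 1, and the function names are hypothetical, not Paperwork's own.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def words_to_highlight(words, keywords, max_distance=1):
    """Return the words equal to, or within max_distance edits of, any keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        word for word in words
        if any(levenshtein(word.lower(), keyword) <= max_distance
               for keyword in lowered)
    ]


# 'flesh' is one edit away from 'flesch'; 'fresh' is two edits away.
print(words_to_highlight(["flesh", "flesch", "fresh", "paper"], ["flesch"]))
# prints ['flesh', 'flesch']
```

The threshold is a trade-off: a larger `max_distance` highlights more OCR misreadings but also more unrelated words.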