Commit cb6b2178 authored by Jerome Flesch

Start preparing the version 0.4.0

Signed-off-by: Jerome Flesch <jflesch@gmail.com>
parent 27fe2514
-0.3.0 --> 0.3.1:
+0.4.0:
+* New module: 'libtesseract'. Use the C API of Tesseract for OCR.
+  This module is more efficient and cleaner than the old 'tesseract' module
+  (no more fork + exec + sh, less image manipulation, etc).
+  Note that with this module the images are just loaded and uncompressed
+  by Pillow. With 'tesseract', they were loaded, uncompressed, re-compressed
+  and saved by Pillow, then be reloaded by Leptonica. So the results may
+  vary slightly.
+* Tesseract: Add support for Win32
+0.3.1:
 * tesseract.detect_orientation(): Use a temporary file instead of stdin
   to transmit the image to Tesseract. Tesseract 3.04 doesn't support
   stdin + "-psm 0" (regression ?)
@@ -8,39 +19,47 @@
 * TextBuilder + Cuneiform: add extra settings for Cuneiform
   (cuneiform_dotmatrix, cuneiform_fax=False, cuneiform_singlecolumn)
-0.2.4 --> 0.3.0:
+0.3.0:
 * New API: pyocr.<tool>.can_detect_orientation() and
   pyocr.<tool>.detect_orientation()
-0.2.3 --> 0.2.4:
+0.2.4:
 * Tesseract : add digit-only support
 * Tesseract : add support for Tesseract subsets of layout analysis (-psm)
-0.2.2 --> 0.2.3:
+0.2.3:
 * Strip the alpha channel from images before running the OCR. It's basically
   useless and can prevent the tool from working correctly.
 * Make hOCR parsing more resistant (handle extra data around box positions)
 * Fix: Take into account that new versions of Tesseract uses the file
   extension .hocr instead of .html
-0.2.1 --> 0.2.2:
+0.2.2:
 * Fix Python 3 support
 * Add support for Tesseract on Heroku
-0.2.0 --> 0.2.1:
+0.2.1:
 * Make it possible to use 'import pyocr' instead of 'from pyocr import pyocr'.
   'from pyocr import pyocr' still works but is obsolete.
 * Fix dependency list: depends on Pillow (it's untested with PIL)
 * Fix pyocr.VERSION
-0.1.2 --> 0.2.0:
+0.2.0:
 * Python 3.x support
-0.1.1 --> 0.1.2:
+0.1.2:
 * Tesseract: Fix version parsing
 * Tesseract: Fix Tesseract 3.02.01's hOCR format support
-0.1 --> 0.1.1:
+0.1.1:
 * hOCR: Parse lines as well as words
 * tesseract.get_available_languages() : Fix fedora support
 * Fix UTF-8 support
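
The 0.3.0 entry describes a per-tool pair, can_detect_orientation() and detect_orientation(). As a rough sketch of how a caller might use the first of these to choose a tool, the helper below walks a list in preference order and returns the first entry that reports orientation support. The function name first_tool_with_orientation is hypothetical and not part of PyOCR; the only assumption about each tool is that it may expose a callable named can_detect_orientation.

```python
def first_tool_with_orientation(tools):
    """Return the first tool that reports orientation support, else None.

    `tools` is any iterable of tool-like objects (modules or instances),
    assumed to be in preference order. This helper is illustrative only
    and is not part of PyOCR.
    """
    for tool in tools:
        # Older tools may not define the function at all, so look it up safely.
        can_detect = getattr(tool, "can_detect_orientation", None)
        if callable(can_detect) and can_detect():
            return tool
    return None
```

With PyOCR installed, the list passed in would presumably be the result of pyocr.get_available_tools(), and the returned tool's detect_orientation() would then be called on an image.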
-#!/usr/bin/env python
+#!/usr/bin/env python3
 import sys
 sys.path = ["src"] + sys.path
......
@@ -6,12 +6,12 @@ setup(
     name="pyocr",
     # Don't forget to update src/pyocr/pyocr.py:VERSION as well
     # and download_url
-    version="0.3.1-git",
+    version="0.4.0-git",
     description=("A Python wrapper for OCR engines (Tesseract, Cuneiform,"
                  " etc)"),
     keywords="tesseract cuneiform ocr",
     url="https://github.com/jflesch/pyocr",
-    download_url="https://github.com/jflesch/pyocr/archive/v0.3.1.zip",
+    download_url="https://github.com/jflesch/pyocr/archive/v0.4.0.zip",
     classifiers=[
         "Development Status :: 5 - Production/Stable",
         "Intended Audience :: Developers",
......
@@ -62,7 +62,7 @@ TOOLS = [ # in preference order
     cuneiform,
 ]
-VERSION = (0, 3, 1)
+VERSION = (0, 4, 0)
 def get_available_tools():
......
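
This commit bumps the version in three places: the version string and download_url in the setup() call, and the VERSION tuple. The comment in the setup() call is a reminder to keep them in sync. A minimal sketch of a consistency check follows. The helper names version_string and versions_match are hypothetical, and the check assumes the conventions visible in this diff: a suffix such as "-git" appended to the version string, and an archive URL ending in v<version>.zip.

```python
def version_string(version):
    """Render a VERSION tuple such as (0, 4, 0) as the string '0.4.0'."""
    return ".".join(str(part) for part in version)


def versions_match(version, setup_version, download_url):
    """Check that the setup() version and download URL agree with VERSION.

    Illustrative helper, not part of PyOCR. `setup_version` may carry a
    suffix such as '-git', so only the numeric prefix is compared.
    """
    base = version_string(version)
    setup_ok = setup_version == base or setup_version.startswith(base + "-")
    url_ok = download_url.endswith("/v" + base + ".zip")
    return setup_ok and url_ok
```

Called with the values introduced by this commit, (0, 4, 0), "0.4.0-git" and the v0.4.0.zip archive URL, the check passes. It fails if any one of the three is left at 0.3.1.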