26/06/2022 - 0.8.3:
- Workaround https://github.com/pypa/setuptools_scm/issues/727

04/15/2022 - 0.8.2:
- Add support for Tesseract 5 + Linux
- Fix file descriptor leak (thanks to oda)

05/12/2021 - 0.8.1:
- Make the dependency on setuptools_scm optional

01/01/2020 - 0.8.0:
- Replaced libtesseract.image_to_pdf() by an object-oriented API that allows
  creating PDF with more than 1 page (thanks to Matthias Kraus).
- Tesseract 4 + sys.frozen=True: Fix TESSDATA_PREFIX: starting with
  Tesseract 4, the path must include tessdata/
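
Illustration of the TESSDATA_PREFIX rule above, as a minimal sketch. The
helper name tessdata_prefix() and its arguments are hypothetical and not part
of PyOCR. The Tesseract 4 branch follows the entry above; the behaviour for
older versions (pointing at the parent directory) is an assumption.

```python
import os


def tessdata_prefix(base_dir, tesseract_version):
    """Return a value for TESSDATA_PREFIX (hypothetical helper).

    base_dir: directory that contains the 'tessdata' folder.
    tesseract_version: tuple such as (4, 0, 0).
    """
    if tesseract_version[0] >= 4:
        # Per the entry above: the path must include tessdata/
        return os.path.join(base_dir, "tessdata")
    # Assumption: older versions take the parent directory
    return base_dir
```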

22/06/2019 - 0.7.2:
- Fix setup.py on Windows

22/06/2019 - 0.7.1:
- tesseract.can_detect_orientation(): only returns True if 'osd' data files are
  installed
- setup.py: Fix installation in MSYS2

12/05/2019 - 0.7:
- Drop support for Python <= 2.7
- Fix: Make sure the builder objects can be used to parse box files
  even if Tesseract is not installed.
- PyOCR version is now automatically set in the module by setuptools_scm
  instead of PyOCR's Makefile (except on Windows)
- Tesseract: optim: keep the result of get_version() in memory instead of
  calling Tesseract every time (get_version() is called by psm_parameter(),
  which is called each time a box file is parsed)
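
The caching idea in the last entry can be sketched as follows. This is only
an illustration and not PyOCR's actual code: _query_tesseract_version() is a
hypothetical stand-in for the costly call to the Tesseract binary, and the
returned version is made up.

```python
import functools

# Counts how often the (simulated) Tesseract call happens.
calls = {"count": 0}


def _query_tesseract_version():
    """Hypothetical stand-in for the expensive call to Tesseract."""
    calls["count"] += 1
    return (4, 0, 0)


@functools.lru_cache(maxsize=None)
def get_version():
    """Return the version, querying Tesseract only on the first call."""
    return _query_tesseract_version()
```

Calling get_version() repeatedly then triggers a single underlying query.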

18/02/2019 - 0.6:
- Complete rewrite of unit tests (thanks to Thomas Perret)
- Libtesseract 4.0: Fix segfault when running orientation detection
  (thanks to Marián Skrip)
- Libtesseract 4.0: Add a workaround: Tesseract needs the locale to be set to
  'C' (thanks to Thomas Perret)
- Libtesseract: Specify DPI of the image to Libtesseract (thanks to Thomas
  Perret)
- Tesseract 4.0: Improve Tesseract version parsing
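
One way to apply the locale workaround listed above, as a minimal sketch. The
context manager c_locale() is hypothetical and is not claimed to be how PyOCR
does it; it only uses the standard locale module to switch to 'C' and restore
the previous setting afterwards.

```python
import contextlib
import locale


@contextlib.contextmanager
def c_locale():
    """Temporarily switch LC_ALL to 'C', then restore the previous value."""
    previous = locale.setlocale(locale.LC_ALL)  # query the current setting
    locale.setlocale(locale.LC_ALL, "C")
    try:
        yield
    finally:
        locale.setlocale(locale.LC_ALL, previous)
```

Code that calls into Libtesseract would then run inside "with c_locale():".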

09/04/2018 - 0.5.3:
- Really fix tesseract 4.0 support (thanks to David Martin)
- Tests: switch from nose to pytest (thanks to Elliott Sales de Andrade)

25/07/2018 - 0.5.2:
- Fix tesseract 4.0 support: Use option '--psm' instead of '-psm'
- tesseract.detect_orientation(): Fix exception generation
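
The option change in the first entry above can be sketched like this. The
helper psm_flag() is hypothetical, and the exact version cut-off (major
version 4) is an assumption based only on that entry.

```python
def psm_flag(tesseract_version):
    """Pick the page-segmentation-mode option name (hypothetical helper)."""
    # Assumption: the double-dash form applies from major version 4 on.
    if tesseract_version[0] >= 4:
        return "--psm"
    return "-psm"
```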

01/03/2017 - 0.5.1:
- libtesseract/Windows: Add possible DLL names for libtesseract
- libtesseract: Keep track of library-loading errors in
  pyocr.libtesseract.lib_load_errors (useful for debugging)
- Build method has been changed: Now use "make install" instead of
  "python3 ./setup.py install"
- cosmetic: builders/WordHTMLParser: Message "OCR confidence not found"
  floods the logs when working with old documents --> switch to debug
  instead of info.

14/12/2017 - 0.5:
- Tesseract/Libtesseract + LineBoxBuilder: Add confidence scores to
  every word box and to hOCR files (thanks to Adriano Pagano)
- Tesseract 4 (shell): Add '--oem 0' to specify legacy model when doing
  orientation detection as orientation detection does not work yet with
  Tesseract 4 (thanks to Adriano Pagano)
- Libtesseract: Fix multi-language support
- Tesseract (shell) + Windows: Never let the cmd window appear
- Libtesseract: Implements image_to_pdf() (thanks to Marian Skrip)
- Libtesseract: Hide debug messages (thanks to Ashish Kulkarni)

13/05/2017 - 0.4.7:
- Tesseract 4.00.00alpha:
  - Version parsing: Ignore suffix (so '4.00.00alpha' == (4, 0, 0))
  - Libtesseract: Load libtesseract.so.4 instead of libtesseract.so.3 if
    available
- Support for Tesseract 3.05.00:
  - Builders: Split field 'tess_conf' into 'tess_flags' and 'tess_conf'
  - Libtesseract: If available, use TessBaseAPIDetectOrientationScript()
    instead of TessBaseAPIDetectOS
- Libtesseract: Workaround: Prevents possible segfault in image_to_string()
  when the target language is not available
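
The suffix-ignoring version parsing described above can be illustrated with
the sketch below. parse_version() is a hypothetical helper and not PyOCR's
actual parser; treating a missing third component as 0 is an assumption.

```python
import re


def parse_version(text):
    """Parse '4.00.00alpha' into (4, 0, 0), ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    # Assumption: a missing third component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())
```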

26/01/2017 - 0.4.6:
- hOCR outputs: Generate valid XHTML files

10/01/2017 - 0.4.5:
- Clean up exceptions raised when OCR fails:
  - Now, all tools raise only exceptions inheriting from
    pyocr.PyocrException
  - There is now one and only one TesseractError (shared between
    pyocr.libtesseract and pyocr.tesseract)

08/12/2016 - 0.4.4:
- Fix Python 2.7 support (broken import)

06/12/2016 - 0.4.3:
- (temporary) Use tesseract-sh by default instead of libtesseract. Some
  people have reported crashes with Paperwork+libtesseract. It needs more
  stress-testing
- DigitBuilder is now available in 'pyocr.builders' (can be used
  with libtesseract and cuneiform)
- New builder: DigitLineBoxBuilder
- Windows: Fix pyinstaller packaging support: env variable TESSDATA_PREFIX
  wasn't set correctly
- Windows: Tesseract-sh: Prevent CMD windows from appearing

05/10/2016 - 0.4.2:
* Tesseract: orientation detection: Ignore errors printed by Leptonica
  on stderr (thanks to TeisD)
* Tesseract: Fix support of dev builds (thanks to Fjup)
* Libtesseract: Fix support of dev builds (thanks to Jakub Semerák)
* Tesseract: Use '--list-langs' to get the available languages instead of
  looking for the data directory (thanks to Bernhard Liebl)

06/04/2016 - 0.4.1:
* Libtesseract: Disable it with Tesseract <= 3.03. It tends to segfault.
  Note: the segfault may not actually be related to Libtesseract. It may be
  due to other things in Debian stable (jessie). Anyway, Paperwork cannot
  work on Debian stable because of that --> disabled just to be safe


13/03/2013 - 0.4.0:
* New module: 'libtesseract'. Use the C API of Tesseract for OCR.
  This module is more efficient and cleaner than the old 'tesseract' module
  (no more fork + exec + sh, less image manipulation, etc).
  Note that with this module the images are just loaded and uncompressed
  by Pillow. With 'tesseract', they were loaded, uncompressed, re-compressed
  and saved by Pillow, then reloaded by Leptonica. So the results may
  vary slightly.
* Tesseract: Add support for Win32
* Tesseract: Fix orientation detection for version >= 3.04.01


0.3.1:
* tesseract.detect_orientation(): Use a temporary file instead of stdin
  to transmit the image to Tesseract. Tesseract 3.04 doesn't support
  stdin + "-psm 0" (regression ?)
* tesseract.detect_orientation(): Improve output parsing reliability
* optim: Avoid unnecessary convert to RGB and allow using image formats
  different from PNG
* TextBuilder + Cuneiform: add extra settings for Cuneiform
  (cuneiform_dotmatrix, cuneiform_fax=False, cuneiform_singlecolumn)


0.3.0:
* New API: pyocr.<tool>.can_detect_orientation() and
  pyocr.<tool>.detect_orientation()


0.2.4:
* Tesseract: Add digit-only support
* Tesseract: Add support for Tesseract subsets of layout analysis (-psm)


0.2.3:
* Strip the alpha channel from images before running the OCR. It's basically
  useless and can prevent the tool from working correctly.
* Make hOCR parsing more resistant (handle extra data around box positions)
* Fix: Take into account that new versions of Tesseract use the file
  extension .hocr instead of .html


0.2.2:
* Fix Python 3 support
* Add support for Tesseract on Heroku


0.2.1:
* Make it possible to use 'import pyocr' instead of 'from pyocr import pyocr'.
  'from pyocr import pyocr' still works but is obsolete.
* Fix dependency list: depends on Pillow (it's untested with PIL)
* Fix pyocr.VERSION


0.2.0:
* Python 3.x support


0.1.2:
* Tesseract: Fix version parsing
* Tesseract: Fix Tesseract 3.02.01's hOCR format support


0.1.1:
* hOCR: Parse lines as well as words
* tesseract.get_available_languages(): Fix Fedora support
* Fix UTF-8 support