Builders: bad use of class attributes
Created by: bennguvaye
OK, I'm pretty much done with the digit builders, but I stumbled on what I think is a bug.
The builders have lists as class attributes (file_extensions, tesseract_configs, cuneiform_args), and these lists are appended to on initialization, so that:
a = TextBuilder()
b = TextBuilder()
c = TextBuilder()
print(TextBuilder.tesseract_configs)
prints ['-psm', '3', '-psm', '3', '-psm', '3']
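A minimal, self-contained sketch reproduces the effect. The class below is a hypothetical, simplified stand-in for the real builder (its body is an assumption, not the actual code); it only mirrors the pattern described above: a list defined at class level and extended in __init__.

```python
class TextBuilder:
    # Class attribute: one list object shared by the class and all its instances.
    tesseract_configs = []

    def __init__(self):
        # self.tesseract_configs resolves to the class-level list, so this
        # mutates the shared list in place instead of creating a per-instance copy.
        self.tesseract_configs.extend(["-psm", "3"])


a = TextBuilder()
b = TextBuilder()
c = TextBuilder()
print(TextBuilder.tesseract_configs)
# prints ['-psm', '3', '-psm', '3', '-psm', '3']
```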
It gets worse. Since DigitBuilder inherits from TextBuilder and appends "digits" to tesseract_configs, any TextBuilder used afterwards interprets the input as digits. This was caught by the tests, so they're useful :)
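The same lookup rule explains the DigitBuilder leak: a subclass that does not redefine the attribute finds the parent's list, so appending to it also changes TextBuilder. Again, this is a hypothetical, simplified sketch of the pattern, not the actual builder code:

```python
class TextBuilder:
    tesseract_configs = []

    def __init__(self):
        self.tesseract_configs.extend(["-psm", "3"])


class DigitBuilder(TextBuilder):
    # No class attribute is redefined here, so DigitBuilder.tesseract_configs
    # is the very same list object as TextBuilder.tesseract_configs.
    def __init__(self):
        super().__init__()
        self.tesseract_configs.append("digits")


d = DigitBuilder()
t = TextBuilder()
# The "digits" option added by DigitBuilder is now part of TextBuilder's config.
print(t.tesseract_configs)
# prints ['-psm', '3', 'digits', '-psm', '3']
```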
Proposed fixes:
- simple: make these lists instance attributes instead of class attributes.
- preferred: do not use lists at all; just pass a dict of options, then turn it into a list later (this could also be used with **kw to gather tool-specific options without polluting the builders).
- minimal: redefine the class attribute in each child class. This still means only one config per class, so it is impossible to compare TextBuilder results with different psm values.
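A rough sketch combining the simple and preferred fixes, under stated assumptions: the names options, tool_options, tesseract_args, the psm keyword and the "tesseract" key in **kw are hypothetical, chosen only for illustration. Options live in per-instance dicts created in __init__, and the argument list is built fresh on demand, so a subclass extends a copy rather than shared state.

```python
class TextBuilder:
    """Hypothetical builder that keeps its options per instance.

    Attributes:
        options: tool-independent options, e.g. {"psm": 3}.
        tool_options: extra tool-specific options gathered through **kw,
            e.g. {"tesseract": [...]} (the key name is an assumption).
    """

    def __init__(self, psm=3, **kw):
        # Simple fix: created in __init__, so every instance has its own dicts.
        self.options = {"psm": psm}
        self.tool_options = dict(kw)

    def tesseract_args(self):
        # Preferred fix: turn the options into a list only when needed.
        # A new list is built on every call, so no caller can mutate shared state.
        args = ["-psm", str(self.options["psm"])]
        # Extra tesseract arguments, if any, are appended verbatim.
        args.extend(self.tool_options.get("tesseract", []))
        return args


class DigitBuilder(TextBuilder):
    def tesseract_args(self):
        # Extends the freshly built parent list; TextBuilder itself is untouched.
        return super().tesseract_args() + ["digits"]


default = TextBuilder()
other_psm = TextBuilder(psm=6)
digits = DigitBuilder()
print(default.tesseract_args())        # ['-psm', '3']
print(other_psm.tesseract_args())      # ['-psm', '6']
print(digits.tesseract_args())         # ['-psm', '3', 'digits']
print(TextBuilder().tesseract_args())  # ['-psm', '3']
```

With this layout, two TextBuilder instances with different psm values can coexist and their results can be compared, which the minimal fix does not allow.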
Ideally, these attributes should also be documented.