Commit 0355d5ae authored by Jerome Flesch

README: remove redundancy

Signed-off-by: Jerome Flesch <jflesch@openpaper.work>
parent 67c3ca26
# PyOCR

PyOCR is an optical character recognition (OCR) tool wrapper for Python.
That is, it helps you use various OCR tools from a Python program.

It has been tested only on GNU/Linux systems. It should also work on similar
systems (*BSD, etc.). It may or may not work on Windows, MacOSX, etc.

PyOCR can be used as a wrapper for Google's
[Tesseract-OCR](http://code.google.com/p/tesseract-ocr/) or Cuneiform.
It can read all image types supported by
[Pillow](https://github.com/python-imaging/Pillow), including jpeg, png, gif,
bmp, tiff, and others. It also supports bounding box data.
## Supported OCR tools

* Libtesseract (Python bindings for the C API)
* Tesseract (wrapper: fork + exec)
* Cuneiform (wrapper: fork + exec)
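
To illustrate what a "fork + exec" wrapper means in practice, here is a minimal sketch that runs the `tesseract` command-line program as a child process and reads back the text file it writes. This is not PyOCR's actual implementation: the function names `build_tesseract_command` and `image_file_to_text` are hypothetical, and the sketch assumes a `tesseract` executable is available on `PATH`.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argv list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def image_file_to_text(image_path, lang="eng"):
    """Run tesseract as a child process and return the recognized text.

    Hypothetical helper for illustration only; it is not part of PyOCR.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")

    with tempfile.TemporaryDirectory() as tmp_dir:
        # tesseract appends ".txt" to the output base name it is given.
        output_base = Path(tmp_dir) / "ocr_output"
        subprocess.run(
            build_tesseract_command(image_path, output_base, lang),
            check=True,            # raise CalledProcessError on failure
            capture_output=True,   # keep tesseract's console output quiet
        )
        return output_base.with_suffix(".txt").read_text(encoding="utf-8")
```

A wrapper of this kind needs only the external binary, at the cost of starting a new process and going through temporary files for each image. Bindings to the C API avoid that overhead but require the shared library to be installed.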
## Features

* Supports all the image formats supported by [Pillow](https://github.com/python-imaging/Pillow),
  including jpeg, png, gif, bmp, tiff and others
* Various output types: text only, bounding boxes (position + string for each word and line), etc.
* Orientation detection (Tesseract and libtesseract only)
* Can focus on digits only (Tesseract and libtesseract only)
* Can save and reload boxes in hOCR format
* PDF generation (libtesseract only)
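
To give a feel for what box data in hOCR format looks like, the following sketch extracts word-level bounding boxes from a small hOCR fragment using only the standard library. It is an illustration of the file format, not PyOCR's own reader: `parse_bbox`, `WordBoxParser`, `read_word_boxes` and the `SAMPLE` fragment are all hypothetical names and data made up for this example.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Extract (x0, y0, x1, y1) from an hOCR title such as 'bbox 10 20 50 40; x_wconf 95'."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class WordBoxParser(HTMLParser):
    """Collects (text, bbox) pairs from <span class='ocrx_word'> elements."""

    def __init__(self):
        super().__init__()
        self.boxes = []
        self._bbox = None   # bbox of the word currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.boxes.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def read_word_boxes(hocr_text):
    """Return a list of (word, (x0, y0, x1, y1)) tuples found in hocr_text."""
    parser = WordBoxParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.boxes


# Made-up hOCR fragment: one line containing two words.
SAMPLE = """
<span class='ocr_line' title='bbox 10 20 120 40'>
  <span class='ocrx_word' title='bbox 10 20 50 40; x_wconf 95'>Hello</span>
  <span class='ocrx_word' title='bbox 60 20 120 40; x_wconf 91'>world</span>
</span>
"""

if __name__ == "__main__":
    for word, bbox in read_word_boxes(SAMPLE):
        print(word, bbox)
```

Each word carries its text plus the pixel coordinates of its enclosing rectangle, which is the "position + string" pairing that box output refers to.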
## Limitations

* hOCR: Only a subset of the specification is supported. For instance, pages and
  paragraph positions are not stored.
## Installation
......