@@ -34,7 +34,7 @@ In every folder you have:
Here is an example of a work directory organization:
```sh
$ find ~/papers
/home/jflesch/papers
/home/jflesch/papers/20130505_1518_00
@@ -69,11 +69,15 @@ $ find ~/papers
With Tesseract, the hOCR file can be obtained with the following command:
```sh
tesseract paper.<X>.jpg paper.<X> -l <lang> hocr && mv paper.<X>.html paper.<X>.words
```
For example:
```sh
tesseract paper.1.jpg paper.1 -l fra hocr && mv paper.1.html paper.1.words
```
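
To process every page of a paper in one pass, the command above can be wrapped in a loop. The following is a minimal sketch, not part of the project: `ocr_pages` is a hypothetical helper name, and it assumes that page images are named `paper.<X>.jpg` and that any file ending in `.thumb.jpg` should be skipped. It keeps the `.html` output name used above; newer Tesseract releases may write `paper.<X>.hocr` instead, in which case the `mv` source has to be adjusted.

```sh
# ocr_pages DIR LANG (hypothetical helper, not part of the project)
# Runs Tesseract on every paper.<X>.jpg found in DIR and stores the
# hOCR output as paper.<X>.words, as described above.
ocr_pages() {
    dir=$1
    lang=$2
    for img in "$dir"/paper.*.jpg; do
        # The glob stays unexpanded when nothing matches; skip that case.
        [ -e "$img" ] || continue
        # Assumption: files ending in .thumb.jpg are not pages to OCR.
        case $img in
            *.thumb.jpg) continue ;;
        esac
        base=${img%.jpg}
        # Same command as above, applied to one page at a time.
        tesseract "$img" "$base" -l "$lang" hocr && mv "$base.html" "$base.words"
    done
}
```

For instance, `ocr_pages ~/papers/20130505_1518_00 fra` would handle all the pages of the paper folder shown earlier. A page whose OCR fails is simply left without a `.words` file, since the `mv` only runs when `tesseract` succeeds.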
### Label files