... | ... | @@ -11,12 +11,14 @@ a number added in case of name collision. |
|
|
In every folder you have:
|
|
|
|
|
|
* For image documents:
|
|
|
* paper.<X>.jpg : A page in JPG format (X starts at 1)
|
|
|
* paper.<X>.jpg : A page in JPG format (X starts at 1). It's the original page (as scanned or imported).
|
|
|
* paper.<X>.edited.jpg : The page after post-processing and editing.
|
|
|
* paper.<X>.words (optional) : A [hOCR](https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview) file, containing all the words found on the page by the OCR (optional, but required for indexing; can be regenerated with the "Redo OCR (...)" options).
|
|
|
* paper.<X>.thumb.jpg (optional, generated automatically) : A thumbnail version of the page (faster to load)
|
|
|
* labels (optional) : a text file containing the labels applied on this document
|
|
|
* paper.<X>.thumb.jpg (optional, generated automatically) : A thumbnail version of the page (faster to load).
|
|
|
Starting with Paperwork 2.0, only paper.1.thumb.jpg is used.
|
|
|
* labels (optional) : a text file containing the labels applied on this document (text + label color)
|
|
|
* extra.txt (optional) : extra keywords added by the user
|
|
|
* For PDF documents:
|
|
|
* doc.pdf : the document
|
... | ... | @@ -27,6 +29,8 @@ In every folder you have: |
|
|
file, containing all the words found on the page using the OCR. Some PDFs contain garbage instead
|
|
|
of the real text, so running the OCR on them can sometimes be useful.
|
|
|
|
|
|
Starting with Paperwork 2.0, the content of both types of documents can be mixed. The files `paper.<X>.jpg` and `paper.<X>.words` are always read before any PDF file.
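The file naming scheme above can be illustrated with a short Python sketch that groups the files of one document folder by page number. This is not Paperwork's own code: the function name `group_page_files` and the returned structure are hypothetical, and the sketch only relies on the file names listed above.

```python
import re
from collections import defaultdict

# Matches paper.<X>.jpg, paper.<X>.edited.jpg, paper.<X>.thumb.jpg
# and paper.<X>.words, where <X> is the page number (starting at 1).
PAGE_FILE = re.compile(r"^paper\.(\d+)\.(jpg|edited\.jpg|thumb\.jpg|words)$")


def group_page_files(filenames):
    """Group per-page files by page number (hypothetical helper).

    Returns (pages, others): pages maps a page number to a dict
    {kind: filename}; others holds every file that is not a page
    file (labels, extra.txt, doc.pdf, ...).
    """
    pages = defaultdict(dict)
    others = []
    for name in filenames:
        match = PAGE_FILE.match(name)
        if match:
            pages[int(match.group(1))][match.group(2)] = name
        else:
            others.append(name)
    return dict(sorted(pages.items())), sorted(others)
```

For a single folder, it could be called as `group_page_files(os.listdir(folder))` after `import os`.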
|
|
|
|
|
|
Here is an example of a work directory organisation:
|
|
|
|
|
|
$ find ~/papers
|
... | ... | @@ -47,6 +51,7 @@ Here is an example a work directory organisation: |
|
|
/home/jflesch/papers/20110726_0000_01/paper.1.thumb.jpg
|
|
|
/home/jflesch/papers/20110726_0000_01/paper.1.words
|
|
|
/home/jflesch/papers/20110726_0000_01/paper.2.jpg
|
|
|
/home/jflesch/papers/20110726_0000_01/paper.2.edited.jpg
|
|
|
/home/jflesch/papers/20110726_0000_01/paper.2.thumb.jpg
|
|
|
/home/jflesch/papers/20110726_0000_01/paper.2.words
|
|
|
/home/jflesch/papers/20110726_0000_01/extra.txt
|
... | ... | @@ -77,4 +82,5 @@ logement,#f6b6ffff0000 |
|
|
```
|
|
|
|
|
|
Each line is always [label],[color]. A given label should always have the same color.
|
|
|
Commas cannot be used in label names.
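As an illustration of this format, here is a minimal Python sketch of a parser for the content of a `labels` file. It is not Paperwork's own parser: the function name `parse_labels` is hypothetical, and the sketch assumes one `[label],[color]` pair per line, with the color kept as an opaque string.

```python
def parse_labels(text):
    """Parse the content of a 'labels' file (hypothetical helper).

    Returns a dict {label: color}. Raises ValueError if a line has
    no comma, or if one label appears with two different colors.
    """
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Label names cannot contain commas, so the first comma
        # always separates the label from its color.
        name, sep, color = line.partition(",")
        if not sep:
            raise ValueError(f"missing comma in line: {line!r}")
        if labels.get(name, color) != color:
            raise ValueError(f"label {name!r} has two different colors")
        labels[name] = color
    return labels
```

Rejecting a label that appears with two colors is a design choice of this sketch; a more tolerant reader could keep the first color instead.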
|
|
|
|