Jerome Flesch · 94ebc9ae
--- a/Work-directory-organization.md
+++ b/Work-directory-organization.md
 workdir|rootdir = ~/papers

-### Global organisation
+# Global organisation

 In the work directory, you have folders, one per document.

@@ -8,30 +8,6 @@ The folder names are (usually) the scan/import date of the documents:
 `YYYYMMDD\_hhmm\_ss[\_<idx>]`. The suffix 'idx' is optional and is just
 a number added in case of name collision. `idx` can be a string too.

-In every folder you have:
-
-* For image documents:
-  * `paper.<X>.jpg`: A page in JPG format (X starts at 1). It's the original page (as scanned or imported).
-  * `paper.<X>.edited.jpg` (optional): The page after post-processing and editing. (Paperwork >= 2.0 only)
-  * `paper.<X>.words` (optional): A
-    [hOCR](https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview)
-    file, containing all the words found on the page using the OCR (optional, but required for indexing ; can be regenerated with the options "Redo OCR (...)").
-  * `paper.<X>.thumb.jpg` (optional, generated and updated automatically): A thumbnail version of the page (faster to load).
-    Starting with Paperwork 2.0, only paper.1.thumb.jpg is used.
-  * `labels` (optional): a text file containing the labels applied on this document (text + label color)
-  * `extra.txt` (optional): extra keywords added by the user
-* For PDF documents:
-  * `doc.pdf`: the document
-  * `labels` (optional): a text file containing the labels applied on this document
-  * `extra.txt` (optional): extra keywords added by the user
-  * `paper.<X>.words` (optional): A
-    [hOCR](https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview)
-    file, containing all the words found on the page using the OCR. Some PDF contains crap instead
-    of the real text, so running the OCR on them can sometimes be useful.
-  * `paper.<X>.edited.jpg` (optional): The page after editing. (Paperwork >= 2.0 only)
-  * `page_map.csv` (optional): Created if the user move the page inside the PDF file. doc.pdf is not actually modified,
-    only this mapping file. Pages are reordered on-the-fly when the document is displayed or exported. (Paperwork >= 2.0 only)
-
 Here is an example a work directory organization:

 ```sh
@@ -65,7 +41,36 @@ $ find ~/papers
 /home/jflesch/papers/20130106_1309_44/extra.txt
 ```

-### hOCR files
+# Document files
+
+## Image documents
+
+  * `paper.<X>.jpg`: A page in JPG format (X starts at 1). It's the original page (as scanned or imported).
+  * `paper.<X>.edited.jpg` (optional): The page after post-processing and editing. (Paperwork >= 2.0 only)
+  * `paper.<X>.words` (optional): A
+    [hOCR](https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview)
+    file containing all the words the OCR found on the page (optional, but required for indexing; can be regenerated with the "Redo OCR (...)" options).
+  * `paper.<X>.thumb.jpg` (optional, generated and updated automatically): A thumbnail version of the page (faster to load).
+    Starting with Paperwork 2.0, only `paper.1.thumb.jpg` is used.
+  * `labels` (optional): a text file containing the labels applied on this document (text + label color)
+  * `extra.txt` (optional): extra keywords added by the user
+
+## PDF documents
+
+  * `doc.pdf`: the document
+  * `labels` (optional): a text file containing the labels applied on this document
+  * `extra.txt` (optional): extra keywords added by the user
+  * `paper.<X>.words` (optional): A
+    [hOCR](https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview)
+    file containing all the words the OCR found on the page. Some PDFs contain garbage instead
+    of the real text, so running the OCR on them can sometimes be useful.
+  * `paper.<X>.edited.jpg` (optional): The page after editing. (Paperwork >= 2.0 only)
+  * `page_map.csv` (optional): Created if the user moves pages within the PDF file. `doc.pdf` itself is not modified,
+    only this mapping file is. Pages are reordered on the fly when the document is displayed or exported. (Paperwork >= 2.0 only)
+
+# File details
+
+## hOCR files

 With Tesseract, the hOCR file can be obtained with following command:

@@ -79,7 +84,7 @@ For example:
 tesseract paper.1.jpg paper.1 -l fra hocr && mv paper.1.html paper.1.words
 ```

-### Label files
+## Label files

 Here is an example of content of a label file: