Jerome Flesch · ed88881e
--- a/Work-directory-organization.md
+++ b/Work-directory-organization.md
-workdir|rootdir = ~/papers
-
-In the work directory, you have folders, one per document.
-
-The folder names are (usually) the scan/import date of the document:
-YYYYMMDD\_hhmm\_ss[\_&lt;idx&gt;]. The suffix 'idx' is optional and is just
-a number added in case of name collision.
-
-In every folder you have:
-
-* For image documents:
-  * paper.&lt;X&gt;.jpg : A page in JPG format (X starts at 1)
-  * paper.&lt;X&gt;.words : A
-    [hOCR](https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview)
-	file, containing all the words found on the page using the OCR.
-  * paper.&lt;X&gt;.thumb.jpg (optional) : A thumbnail version of the page (faster to load)
-  * labels (optional) : a text file containing the labels applied on this document
-  * extra.txt (optional) : extra keywords added by the user
-* For PDF documents:
-  * doc.pdf : the document
-  * labels (optional) : a text file containing the labels applied on this document
-  * extra.txt (optional) : extra keywords added by the user
-  * paper.&lt;X&gt;.words (optional) : A
-    [hOCR](https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview)
-	file, containing all the words found on the page using the OCR. Some PDF contains crap instead
-	of the real text, so running the OCR on them can sometimes be useful.
-
-With Tesseract, the hOCR file can be obtained with following command:
-
-	tesseract paper.<X>.jpg paper.<X> -l <lang> hocr && mv paper.<X>.html paper.<X>.words
-
-For example:
-
-	tesseract paper.1.jpg paper.1 -l fra hocr && mv paper.1.html paper.1.words
-
-Here is an example a work directory organisation:
-
-	$ find ~/papers
-	/home/jflesch/papers
-	/home/jflesch/papers/20130505_1518_00
-	/home/jflesch/papers/20130505_1518_00/paper.1.jpg
-	/home/jflesch/papers/20130505_1518_00/paper.1.thumb.jpg
-	/home/jflesch/papers/20130505_1518_00/paper.1.words
-	/home/jflesch/papers/20130505_1518_00/paper.2.jpg
-	/home/jflesch/papers/20130505_1518_00/paper.2.thumb.jpg
-	/home/jflesch/papers/20130505_1518_00/paper.2.words
-	/home/jflesch/papers/20130505_1518_00/paper.3.jpg
-	/home/jflesch/papers/20130505_1518_00/paper.3.thumb.jpg
-	/home/jflesch/papers/20130505_1518_00/paper.3.words
-	/home/jflesch/papers/20130505_1518_00/labels
-	/home/jflesch/papers/20110726_0000_01
-	/home/jflesch/papers/20110726_0000_01/paper.1.jpg
-	/home/jflesch/papers/20110726_0000_01/paper.1.thumb.jpg
-	/home/jflesch/papers/20110726_0000_01/paper.1.words
-	/home/jflesch/papers/20110726_0000_01/paper.2.jpg
-	/home/jflesch/papers/20110726_0000_01/paper.2.thumb.jpg
-	/home/jflesch/papers/20110726_0000_01/paper.2.words
-	/home/jflesch/papers/20110726_0000_01/extra.txt
-	/home/jflesch/papers/20130106_1309_44
-	/home/jflesch/papers/20130106_1309_44/doc.pdf
-	/home/jflesch/papers/20130106_1309_44/paper.1.words
-	/home/jflesch/papers/20130106_1309_44/paper.2.words
-	/home/jflesch/papers/20130106_1309_44/labels
-	/home/jflesch/papers/20130106_1309_44/extra.txt
+workdir|rootdir = ~/papers
+
+In the work directory, you have folders, one per document.
+
+The folder names are (usually) the scan/import date of the document:
+YYYYMMDD\_hhmm\_ss[\_&lt;idx&gt;]. The suffix 'idx' is optional and is just
+a number added in case of name collision.
+
+In every folder you have:
+
+* For image documents:
+  * paper.&lt;X&gt;.jpg : A page in JPG format (X starts at 1)
+  * paper.&lt;X&gt;.words : A
+    [hOCR](https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview)
+	file, containing all the words found on the page using the OCR.
+  * paper.&lt;X&gt;.thumb.jpg (optional) : A thumbnail version of the page (faster to load)
+  * labels (optional) : a text file containing the labels applied on this document
+  * extra.txt (optional) : extra keywords added by the user
+* For PDF documents:
+  * doc.pdf : the document
+  * labels (optional) : a text file containing the labels applied on this document
+  * extra.txt (optional) : extra keywords added by the user
+  * paper.&lt;X&gt;.words (optional) : A
+    [hOCR](https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview)
+	file, containing all the words found on the page by the OCR. Some PDFs contain garbage instead
+	of real text, so running the OCR on them can sometimes be useful.
+
+With Tesseract, the hOCR file can be obtained with the following command:
+
+	tesseract paper.<X>.jpg paper.<X> -l <lang> hocr && mv paper.<X>.html paper.<X>.words
+
+For example:
+
+	tesseract paper.1.jpg paper.1 -l fra hocr && mv paper.1.html paper.1.words
+
+Here is an example of a work directory organisation:
+
+	$ find ~/papers
+	/home/jflesch/papers
+	/home/jflesch/papers/20130505_1518_00
+	/home/jflesch/papers/20130505_1518_00/paper.1.jpg
+	/home/jflesch/papers/20130505_1518_00/paper.1.thumb.jpg
+	/home/jflesch/papers/20130505_1518_00/paper.1.words
+	/home/jflesch/papers/20130505_1518_00/paper.2.jpg
+	/home/jflesch/papers/20130505_1518_00/paper.2.thumb.jpg
+	/home/jflesch/papers/20130505_1518_00/paper.2.words
+	/home/jflesch/papers/20130505_1518_00/paper.3.jpg
+	/home/jflesch/papers/20130505_1518_00/paper.3.thumb.jpg
+	/home/jflesch/papers/20130505_1518_00/paper.3.words
+	/home/jflesch/papers/20130505_1518_00/labels
+	/home/jflesch/papers/20110726_0000_01
+	/home/jflesch/papers/20110726_0000_01/paper.1.jpg
+	/home/jflesch/papers/20110726_0000_01/paper.1.thumb.jpg
+	/home/jflesch/papers/20110726_0000_01/paper.1.words
+	/home/jflesch/papers/20110726_0000_01/paper.2.jpg
+	/home/jflesch/papers/20110726_0000_01/paper.2.thumb.jpg
+	/home/jflesch/papers/20110726_0000_01/paper.2.words
+	/home/jflesch/papers/20110726_0000_01/extra.txt
+	/home/jflesch/papers/20130106_1309_44
+	/home/jflesch/papers/20130106_1309_44/doc.pdf
+	/home/jflesch/papers/20130106_1309_44/paper.1.words
+	/home/jflesch/papers/20130106_1309_44/paper.2.words
+	/home/jflesch/papers/20130106_1309_44/labels
+	/home/jflesch/papers/20130106_1309_44/extra.txt
+
+Here is an example of the content of a labels file:
+
+```
+facture,#0000b1588c61
+logement,#f6b6ffff0000
+```
+
+Each line has the form [label],[color]. A given label should always use the same color.
\ No newline at end of file
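
The layout described in this page can be read by a small script. The following is a minimal sketch in Python, not code taken from the project: the names `DOC_DIR_RE`, `parse_doc_dirname` and `parse_labels` are hypothetical, and it assumes folder names follow `YYYYMMDD_hhmm_ss[_<idx>]` and that a color never contains a comma.

```python
import re
from datetime import datetime

# Assumed pattern for document folder names: YYYYMMDD_hhmm_ss[_<idx>]
DOC_DIR_RE = re.compile(r"^(\d{8})_(\d{4})_(\d{2})(?:_(\d+))?$")


def parse_doc_dirname(name):
    """Return (datetime, idx) for a folder name, or None if it does not match.

    idx is None when the optional collision suffix is absent.
    """
    match = DOC_DIR_RE.match(name)
    if match is None:
        return None
    date, hhmm, ss, idx = match.groups()
    try:
        when = datetime.strptime(date + hhmm + ss, "%Y%m%d%H%M%S")
    except ValueError:
        # Right shape, but not a valid date/time (e.g. month 13)
        return None
    return when, (int(idx) if idx is not None else None)


def parse_labels(text):
    """Parse the content of a labels file into a list of (label, color) tuples."""
    labels = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma (assumption: the color never contains one)
        label, sep, color = line.rpartition(",")
        if not sep:
            continue  # not a "label,color" line; skipped in this sketch
        labels.append((label, color))
    return labels
```

Folder names are only "usually" dates, so `parse_doc_dirname` returns `None` rather than raising when a name does not fit the pattern.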