Handle spaces and hyphenation when searching PDFs
Submitted by Sven
This is similar to Bug 598759, but I would like to sketch something more complex.
First of all, I have no idea what happens in the background. For example, I have no idea how evince finds out that there is some sort of space between two letters or words. I could imagine that PDF files don't contain any information like "there's a space here" and that some (bad) heuristic is at work.
- Searching for a single word:
When searching for a single word, evince often fails to find it. I don't know why, but copy/pasting the word from the PDF revealed that evince thinks there are spaces between some of the letters. Searching for the same word with some spaces added here and there makes evince find it. But you can imagine that I don't want to guess the locations of spaces in order to find a word.
- Searching across lines:
Evince could ignore a hyphen at the end of a line if a word has been hyphenated there. Of course, it could be a composite word like "in-between" that has been split into "in-" and "between". So just stripping all hyphens at the end of a line won't do.
- Searching multiple words:
When I enter "the king is dead", I guess what evince does is search for that exact string in the PDF. If it is spread across multiple lines, evince won't find it. If the PDF reports two spaces between "the" and "king", evince won't find it either.
Well, Adobe Reader implements all of the above and probably much more.
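To make the three cases above concrete, here is a rough sketch in Python of what a more forgiving match could look like. This is not how evince works internally (I don't know that); it assumes the page text is available as a plain string with "\n" between lines, and the names GAP, build_pattern and find_all are made up for illustration. The case-insensitive matching is an extra assumption. The idea is to allow, between any two characters of the query, either a run of whitespace (possibly empty) or a hyphen followed by a line break.

```python
import re

# Between two query characters, tolerate either a hyphen followed by a line
# break (a word split across lines) or any run of whitespace, including none.
GAP = r"(?:-[ \t]*\n\s*|\s*)"


def build_pattern(query):
    """Turn a query into a regex that ignores whitespace differences and
    end-of-line hyphenation. Whitespace in the query itself is dropped."""
    chars = [re.escape(c) for c in query if not c.isspace()]
    if not chars:
        return None
    # Case-insensitive matching is an assumption, not something requested above.
    return re.compile(GAP.join(chars), re.IGNORECASE)


def find_all(query, page_text):
    """Return (start, end) offsets of every tolerant match in page_text."""
    pattern = build_pattern(query)
    if pattern is None:
        return []
    return [m.span() for m in pattern.finditer(page_text)]


if __name__ == "__main__":
    text = "The k ing is\ndead. Long live the hyphen-\nated in-\nbetween."
    for query in ("king", "the king  is dead", "hyphenated", "in-between"):
        for start, end in find_all(query, text):
            print(repr(query), "->", repr(text[start:end]))
```

Because a hyphen at the end of a line is treated as optional rather than stripped, both "inbetween" and "in-between" find the split "in-" / "between", which sidesteps the composite-word problem. The price is that the match is quite lenient: "the king" would also match "theking", so a real implementation would probably want something stricter.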