Wrong word boundary detection around MidLetter characters defined by Unicode UAX #29 (TR29)
Submitted by jmo..@..la.org
Link to original bug (#700103)
Description
1.- Open gedit, GIMP, or any other GNOME application with an editable text field.
2.- Type any string containing "l·l", like "goril·les" or "paral·leles".
3.- Double-click on the word.
Expected result: the whole word should be selected ("goril·les" or "paral·leles")
Obtained: the word is segmented around the "·" character, so only the first or second part of the word is selected ("goril" or "les", if you typed "goril·les").
Tested in Windows (GIMP) and GNOME (gedit and GIMP).
AFAIK, Pango follows [1] the Unicode Text Segmentation algorithm [2].
According to [2], the "·" character (U+00B7) is a MidLetter, and rules WB6 and WB7 forbid a word break on either side of it when it sits between two letters.
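To illustrate what WB6 and WB7 imply for the example words, here is a minimal Python sketch. It is not Pango's code and not a full UAX #29 implementation: it covers only WB5, WB6 and WB7, uses `str.isalpha()` as a stand-in for the ALetter property, and lists just a few MidLetter code points by hand. The names `MID_LETTER`, `is_letter` and `word_at` are hypothetical.

```python
# Simplified sketch of UAX #29 rules WB5, WB6 and WB7 only.
# Assumption: a small hand-picked subset of the MidLetter property.
MID_LETTER = {"\u00B7", "\u0387", "\u2027"}


def is_letter(ch):
    # Assumption: str.isalpha() approximates the ALetter property.
    return ch.isalpha()


def word_at(text, index):
    """Return the segment a double-click at text[index] would select."""

    def joins(i):
        # True when there is no word break between text[i] and text[i + 1].
        a, b = text[i], text[i + 1]
        if is_letter(a) and is_letter(b):
            return True  # WB5: letter x letter
        if is_letter(a) and b in MID_LETTER:
            # WB6: letter x MidLetter letter
            return i + 2 < len(text) and is_letter(text[i + 2])
        if a in MID_LETTER and is_letter(b):
            # WB7: letter MidLetter x letter
            return i > 0 and is_letter(text[i - 1])
        return False

    start = index
    while start > 0 and joins(start - 1):
        start -= 1
    end = index
    while end + 1 < len(text) and joins(end):
        end += 1
    return text[start:end + 1]


print(word_at("goril·les", 2))
print(word_at("a paral·leles b", 4))
```

Under these rules a double-click anywhere inside "goril·les" should select the whole word, which is the expected result described above.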
So I don't know where the problem is, but there is one somewhere.
This bug is annoying because it affects not only double-click selection but also spell-checking. For instance, "goril·les" is split and two words are passed to the spell-checker, "goril" and "les", so proper spell-checking is not possible for Catalan. See related bug 610106.
[2] http://www.unicode.org/reports/tr29/