Word break not supported for Chinese ('zh') and/or Unihan in general
I have been looking into fixing Geary bug geary#121 (closed), where a search query entered in Chinese such as "男子去" is not properly tokenised into two words ("男子" and "去"), whereas a search in English such as "test search" would be.
Using Pango seemed to be a good solution for this, since I can use pango_get_log_attrs() to analyse the search text. However, after implementing this, it seems that word-break analysis for Chinese (using "zh" as the language) doesn't work, whereas it does work for both English and Thai.
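For reference, something like the following standalone C program shows the kind of call involved; this is a minimal sketch using the Pango API named above, not the actual Geary code, and the offsets in the comment are the expected result rather than what I observe:

```c
#include <string.h>
#include <glib.h>
#include <pango/pango.h>

int
main (void)
{
    /* The failing query from the report; "zh" is the language hint used there. */
    const char *text = "男子去";
    glong n_chars = g_utf8_strlen (text, -1);
    PangoLogAttr *attrs = g_new0 (PangoLogAttr, n_chars + 1);

    /* Ask Pango for break information over the whole string. */
    pango_get_log_attrs (text, strlen (text), -1,
                         pango_language_from_string ("zh"),
                         attrs, n_chars + 1);

    /* If Chinese word breaking were supported, one would expect word starts
     * at character offsets 0 (男子) and 2 (去); in practice only the start
     * of the string is flagged. */
    for (glong i = 0; i <= n_chars; i++)
        if (attrs[i].is_word_start)
            g_print ("word starts at character offset %ld\n", i);

    g_free (attrs);
    return 0;
}
```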
So this is a request to have word-break analysis implemented for Chinese; perhaps the Unihan database could be used as a basis for implementing it for the Han-unification languages in general?