    ImapDb.Database: Register new ICU-based tokeniser for FTS · 7e381982
    Michael Gratton authored and committed
    The SQLite tokeniser does not handle scripts that do not use spaces
    to separate words (CJK, Thai, etc.), so searching in those languages
    does not work well.
    
    This adds a custom SQLite tokeniser based on ICU that breaks words for
    all languages supported by that library, and uses NFKC_Casefold
    normalisation to handle compatibility normalisation, case folding, and
    the dropping of ignorable characters.
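
As a rough illustration of what NFKC_Casefold-style normalisation does to a token, here is a minimal Python sketch using only the standard library. It is not the code from this commit and does not call ICU: the function name `approx_nfkc_casefold` is hypothetical, and dropping characters of category `Cf` is only an approximation of Unicode's default-ignorable set. It also does not attempt ICU's word breaking, which needs dictionary data for scripts such as CJK and Thai.

```python
import unicodedata


def approx_nfkc_casefold(text: str) -> str:
    """Approximate NFKC_Casefold with the standard library (hypothetical helper).

    Assumption: format characters (category Cf) stand in for the
    default-ignorable code points that the real mapping removes.
    """
    # Compatibility-normalise, fold case, then normalise again because
    # case folding can produce sequences that are not in NFKC form.
    folded = unicodedata.normalize(
        "NFKC", unicodedata.normalize("NFKC", text).casefold()
    )
    # Drop format characters such as the soft hyphen (U+00AD).
    stripped = "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")
    return unicodedata.normalize("NFKC", stripped)


if __name__ == "__main__":
    for sample in ("Ｇｅａｒｙ", "Straße", "co\u00adoperate", "\ufb01le"):
        print(repr(sample), "->", repr(approx_nfkc_casefold(sample)))
```

With this kind of mapping applied to both indexed text and search terms, a query typed in a different case or width form can still match the stored token.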
    
    Fixes #121