    libtracker-data: new 'tracker:unaccent' method · b3fb86ea
    Aleksander Morgado authored
    https://bugzilla.gnome.org/show_bug.cgi?id=722254
    
    This method allows removing combining diacritical marks (accents) from strings
    used in SPARQL queries. It expects a single argument, the string to be
    unaccented.
    
    Note that the output string will also be NFKD-normalized.
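
    As a rough illustration of what "unaccenting" means here, the following
    Python sketch decomposes a string with NFKD and then drops every character
    that has a non-zero canonical combining class. It is only an approximation
    written for this note, not the libtracker-data implementation; the helper
    name unaccent() is hypothetical, and the exact set of code points removed by
    tracker:unaccent may differ.

```python
import unicodedata


def unaccent(text: str) -> str:
    """Approximate unaccenting: NFKD-normalize, then drop combining marks.

    Hypothetical helper for illustration only; this is not Tracker's code.
    """
    # NFKD splits 'é' (U+00E9) into 'e' (U+0065) + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns 0 for base characters and a non-zero
    # canonical combining class for marks such as U+0301.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


title = "\u00e9cole"  # 'école' in NFC
print(unaccent(title))
print(unaccent(title).encode("utf-8").hex(" "))
```

    Because the string is decomposed before filtering, the result stays in NFKD
    form, which matches the note above about the output normalization.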
    
    Example:
    
    1) First, insert a new element which has accents in the nie:title. In the
    example we insert the word 'école' which in UTF-8 NFC looks like
    "0xC3 0xA9 0x63 0x6F 0x6C 0x65":
    
        $ tracker-sparql -u -q "
            INSERT { <abc> a         nie:InformationElement .
                     <abc> nie:title 'école' }"
    
    2) Second, query nie:title and pipe the output through hexdump. We should
    get the original string back, in UTF-8 with NFC normalization:
    
        $ tracker-sparql -q "
            SELECT ?title
            WHERE { <abc> nie:title ?title }" | hexdump
        0000000 6552 7573 746c 3a73 200a c320 63a9 6c6f
        0000010 0a65 000a
        0000013
    
    Or, without the hexdump...
    
        $ tracker-sparql -q "
            SELECT ?title
            WHERE { <abc> nie:title ?title }"
        Results:
          école
    
    3) Last, apply the unaccenting method. The expected string should look like
    "0x65 0x63 0x6F 0x6C 0x65" (i.e. without the combining diacritical mark):
    
        $ tracker-sparql -q "
            SELECT tracker:unaccent(?title)
            WHERE { <abc> nie:title ?title }" | hexdump
        0000000 6552 7573 746c 3a73 200a 6520 6f63 656c
        0000010 0a0a
        0000012
    
    Or, without the hexdump...
    
        $ tracker-sparql -q "
            SELECT tracker:unaccent(?title)
            WHERE { <abc> nie:title ?title }"
        Results:
          ecole
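
    The hexdump output in steps 2 and 3 can be hard to compare with the byte
    sequences quoted in the text, because the default hexdump format prints
    two-byte words in host byte order (so "6552" is the bytes 0x52 0x65, "Re").
    The sketch below converts those dumps back into bytes, assuming they were
    taken on a little-endian machine; the helper name hexdump_words_to_bytes()
    is made up for this note.

```python
def hexdump_words_to_bytes(lines, length):
    """Rebuild the byte stream from default `hexdump` output lines.

    Assumes 16-bit words printed on a little-endian host. `length` is the
    total byte count (the final offset line of the dump), used to drop the
    padding byte when the input has an odd number of bytes.
    """
    data = bytearray()
    for line in lines:
        fields = line.split()
        # fields[0] is the offset column; the rest are 16-bit words.
        for word in fields[1:]:
            data += int(word, 16).to_bytes(2, "little")
    return bytes(data[:length])


# Dump from step 2 (original title); final offset 0x13.
accented = hexdump_words_to_bytes(
    ["0000000 6552 7573 746c 3a73 200a c320 63a9 6c6f",
     "0000010 0a65 000a"],
    0x13,
)
# Dump from step 3 (tracker:unaccent applied); final offset 0x12.
unaccented = hexdump_words_to_bytes(
    ["0000000 6552 7573 746c 3a73 200a 6520 6f63 656c",
     "0000010 0a0a"],
    0x12,
)

print(accented.hex(" "))
print(unaccented.hex(" "))
```

    In the first dump the title starts with the bytes c3 a9 (the NFC 'é'); in
    the second it starts with a plain 65 ('e').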