backspacing doesn't work properly for Arabic
Submitted by Roozbeh Pournader
Assigned to Behdad Esfahbod
Link to original bug (#350132)
Description
Backspacing doesn't work as expected for the Arabic script. Currently, the functions gtk_entry_backspace and gtk_text_buffer_backspace normalize the string to NFD and then remove the last character from the string. This is not intuitive, because the canonical combining classes of Arabic non-spacing marks (NSMs), which determine their order in NFD, are essentially arbitrary, and because the Hamza forms are usually considered a single letter by readers of the languages written in the Arabic script.
The current behavior is especially bad when a Hamza form is involved, or when two NSMs appear on one letter (usually one of them is a Shadda).
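The behavior described above can be approximated in a few lines of Python. This is only a sketch of the description in this report, not the GTK code itself: nfd_backspace is a hypothetical helper name, and it works on a whole string rather than on the cluster before the cursor.

```python
import unicodedata


def nfd_backspace(text: str) -> str:
    """Approximate the reported behavior: decompose to NFD, drop the last code point.

    Hypothetical helper for illustration; the real logic lives in
    gtk_entry_backspace and gtk_text_buffer_backspace.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed[:-1]


def code_points(text: str) -> list[str]:
    """Render a string as a list of four-digit hex code points."""
    return [f"{ord(ch):04X}" for ch in text]


# The user types the single letter YEH WITH HAMZA ABOVE (U+0626), then backspaces.
print(code_points(nfd_backspace("\u0626")))  # ['064A']: a bare Arabic Yeh remains
```

A user who typed one letter would expect one backspace to leave nothing behind, but here the base letter 064A survives.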
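Examples that result in non-intuitive behavior. Each entry gives the code points in NFD order, followed in parentheses by the NFD sequence and the sequence users intuitively expect: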
0646 064E 0651 (NOON FATHA SHADDA, NOON SHADDA FATHA): Backspace removes the Shadda, but native users think of the Shadda as coming before the Fatha in this case, so they expect the Fatha to be removed.
0627 0653 (ALEF MADDA, ALEF-MADDA): Backspace removes Madda, while natives think of Alef-Madda as a single unit (0622).
064A 0654 (YEH HAMZA, YEH-HAMZA): This is the same as the common letter YEH-HAMZA (0626). After pressing backspace, only the HAMZA is removed and an Arabic Yeh remains, which is unacceptable in languages like Persian and Urdu, which use 0626 but not 064A.
064A 064E 0654 (YEH FATHA HAMZA, YEH-HAMZA FATHA): This may be among the worst-case scenarios for Persian. A user first presses the key for Yeh-Hamza and then the key for Fatha, but when she backspaces, an Arabic Yeh (not used in Persian) remains with a Fatha over it.
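The NFD orderings in the examples above can be checked with Python's unicodedata module, which exposes each character's canonical combining class. The sketch below is illustrative only; nfd_with_classes is a hypothetical helper name, and the input strings are the sequences a user would plausibly type.

```python
import unicodedata


def nfd_with_classes(typed: str) -> list[tuple[str, int]]:
    """Return (hex code point, canonical combining class) pairs for the NFD form."""
    nfd = unicodedata.normalize("NFD", typed)
    return [(f"{ord(ch):04X}", unicodedata.combining(ch)) for ch in nfd]


# Sequences as typed: the user's order, not necessarily NFD order.
typed_examples = {
    "NOON SHADDA FATHA": "\u0646\u0651\u064E",
    "ALEF-MADDA": "\u0622",
    "YEH-HAMZA": "\u0626",
    "YEH-HAMZA FATHA": "\u0626\u064E",
}

for label, typed in typed_examples.items():
    print(label, nfd_with_classes(typed))

# For "YEH-HAMZA FATHA" this prints
# [('064A', 0), ('064E', 30), ('0654', 230)]:
# the Fatha (class 30) sorts before the Hamza (class 230), so dropping the
# last code point removes the Hamza instead of the Fatha the user typed last.
```

Because canonical reordering sorts combining marks by class, the last code point in NFD is often not the last thing the user typed, which is the root of the behavior reported here.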