g_unichar_iszerowidth does not handle Prepended_Concatenation_Mark correctly

Submitted by Mike Frysinger

Description

GLib currently treats every Cf (format) character as zero width, but this ignores the Prepended_Concatenation_Mark code points. I guess gen-unicode-tables.pl should be consulting PropList.txt from the Unicode releases.

Specifically, these should all return false with g_unichar_iszerowidth:

0600..0605 ; Prepended_Concatenation_Mark # Cf ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
06DD ; Prepended_Concatenation_Mark # Cf ARABIC END OF AYAH
070F ; Prepended_Concatenation_Mark # Cf SYRIAC ABBREVIATION MARK
08E2 ; Prepended_Concatenation_Mark # Cf ARABIC DISPUTED END OF AYAH
110BD ; Prepended_Concatenation_Mark # Cf KAITHI NUMBER SIGN
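
To make the expected behavior concrete, here is a rough sketch of the exclusion. It is not GLib's actual implementation. The names pcm_ranges, is_prepended_concatenation_mark, and cf_is_zero_width are made up for illustration. The is_cf parameter stands in for whatever General_Category lookup the real code performs. The ranges are copied by hand from the list above, whereas a real fix would presumably generate them from PropList.txt. Only the Cf case is modeled, not the other categories g_unichar_iszerowidth handles.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: the Prepended_Concatenation_Mark code points
 * listed in this report (Unicode 10.0.0), copied by hand. */
const struct { uint32_t first, last; } pcm_ranges[] = {
    { 0x0600,  0x0605  },  /* ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE */
    { 0x06DD,  0x06DD  },  /* ARABIC END OF AYAH */
    { 0x070F,  0x070F  },  /* SYRIAC ABBREVIATION MARK */
    { 0x08E2,  0x08E2  },  /* ARABIC DISPUTED END OF AYAH */
    { 0x110BD, 0x110BD },  /* KAITHI NUMBER SIGN */
};

/* Returns true if c falls inside one of the ranges above. */
bool is_prepended_concatenation_mark(uint32_t c)
{
    for (size_t i = 0; i < sizeof pcm_ranges / sizeof pcm_ranges[0]; i++) {
        if (c >= pcm_ranges[i].first && c <= pcm_ranges[i].last)
            return true;
    }
    return false;
}

/* Hypothetical helper covering only the Cf branch: is_cf is assumed to be
 * the result of a General_Category == Cf check done by the caller. */
bool cf_is_zero_width(uint32_t c, bool is_cf)
{
    return is_cf && !is_prepended_concatenation_mark(c);
}
```

With this shape, cf_is_zero_width(0x06DD, true) is false, while an ordinary format character outside the table still comes back as zero width.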

Unicode 10.0.0, chapter 9, section 2, pages 377-378 [1] states:

Signs Spanning Numbers. Several other special signs are written in association with numbers in the Arabic script. All of these signs can span multiple-digit numbers, rather than just a single digit. They are not formally considered combining marks in the sense used by the Unicode Standard, although they clearly interact graphically with their associated sequence of digits. In the text representation they precede the sequence of digits that they span, rather than follow a base character, as would be the case for a combining mark. Their General_Category value is Cf (format character). Unlike most other format characters, however, they should be rendered with a visible glyph, even in circumstances where no suitable digit or sequence of digits follows them in logical order. The characters have the Bidi_Class value of Arabic_Number to make them appear in the same run as the numbers following them.

A few similar signs spanning numbers or letters are associated with scripts other than Arabic. See the discussion of U+070F syriac abbreviation mark in Section 9.3, Syriac, and the discussion of U+110BD kaithi number sign in Section 15.2, Kaithi. All of these prefixed format controls, including the non-Arabic ones, are given the property value Prepended_Concatenation_Mark=True, to identify them as a class. They also have special behavior in text segmentation. (See Unicode Standard Annex #29, “Unicode Text Segmentation.”)

[1] http://unicode.org/versions/Unicode10.0.0/ch09.pdf