G_REGEX_RAW isn't truly "raw" in the PCRE sense
Submitted by Hadriel Kaplan
Link to original bug (#725072)
Description
It's debatable whether this is a bug or an enhancement request, but from the perspective of a user of the GLib library I think of it as a bug.
Currently, GLib always sets the PCRE_UCP (Unicode properties) compile flag in g_regex_new(), whether or not G_REGEX_RAW is set. That has side effects when searching strings with chars > 127, because escape sequences like '\w' and '\b', as well as POSIX character classes like '[:alpha:]' and '[:lower:]', will match some of the characters in the > 127 range. (In fact, they match a lot of such characters.)
I think most users would expect G_REGEX_RAW to actually mean "normal PCRE mode", which PCRE_UCP is not. (PCRE_UCP is also slower, FWIW.)
Since it's been this way forever, I'm not recommending changing G_REGEX_RAW behavior. Instead, I suggest adding a new GRegexCompileFlags enum value, 'G_REGEX_NO_UCP', which, if set, stops g_regex_new() from setting the PCRE_UCP flag.
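A rough sketch of the option logic this would imply. It is not GLib's actual code: every SKETCH_* name and bit value below is a stand-in invented for illustration, and sketch_pcre_options() is a hypothetical function that only models how g_regex_new() might translate caller flags into PCRE options.

```c
/* Stand-in bits for this sketch only. These are NOT the real
 * GRegexCompileFlags or PCRE option values. */
#define SKETCH_REGEX_RAW     (1u << 0)  /* models G_REGEX_RAW */
#define SKETCH_REGEX_NO_UCP  (1u << 1)  /* models the proposed G_REGEX_NO_UCP */

#define SKETCH_PCRE_UTF8     (1u << 0)  /* models PCRE_UTF8 */
#define SKETCH_PCRE_UCP      (1u << 1)  /* models PCRE_UCP */

/* Hypothetical: maps caller flags to the PCRE options the pattern
 * would be compiled with under the proposal. */
static unsigned
sketch_pcre_options (unsigned flags)
{
  unsigned options = 0;

  /* UTF-8 mode is enabled unless the caller asked for raw bytes. */
  if (!(flags & SKETCH_REGEX_RAW))
    options |= SKETCH_PCRE_UTF8;

  /* UCP stays on by default (today's behavior); the new flag opts out. */
  if (!(flags & SKETCH_REGEX_NO_UCP))
    options |= SKETCH_PCRE_UCP;

  return options;
}
```

Because the new flag is opt-out, existing callers that pass only G_REGEX_RAW keep the behavior they have today; only callers that add G_REGEX_NO_UCP would get plain, non-UCP matching.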
Version: 2.38.x