This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
On Sat, 15 Jan 2005, Robert Dewar wrote:
> Well I really don't understand the implementation of iswalpha. For
> example, it yields false for "FEMININE ORDINAL INDICATOR" (16#AA#)
> even though the definition in the database is:
glibc's iswalpha works for me, provided the program has called setlocale
before iswalpha and is running under a suitable locale whose definition
copies the i18n file's LC_CTYPE data (e.g. en_GB.UTF-8, not C / POSIX).
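
A minimal sketch of that calling sequence follows. The helper names select_ctype_locale and is_alpha_in_current_locale are hypothetical, and which locales are installed varies from system to system, so the candidate list is only an assumption.

```c
#include <locale.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: install the first candidate locale that the
   system accepts for LC_CTYPE.  Returns the accepted name, or NULL
   if none of the candidates is available.  */
const char *select_ctype_locale(const char *const *candidates, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}

/* Hypothetical wrapper: normalise the iswalpha result to 0 or 1.
   The answer depends on the LC_CTYPE locale in effect at the time
   of the call.  */
int is_alpha_in_current_locale(wint_t ch)
{
    return iswalpha(ch) != 0;
}
```

A caller would first try something like { "en_GB.UTF-8", "C" } and only then ask about 0xAA. If only the "C" fallback was accepted, a false result for non-ASCII characters should not be surprising.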
> At first, it looked to me like it was just testing LETTER in the
> name of the symbol, but that is disproved by:
The presence of LETTER in the name is one thing gen-unicode-ctype.c looks
at, in addition to the character's general category. To quote from CVS
glibc, localedata/gen-unicode-ctype.c:
  return (unicode_attributes[ch].name != NULL
          && ((unicode_attributes[ch].category[0] == 'L'
               /* Theppitak Karoonboonyanan <thep@links.nectec.or.th> says
                  <U0E2F>, <U0E46> should belong to is_punct. */
               && (ch != 0x0E2F) && (ch != 0x0E46))
              /* Theppitak Karoonboonyanan <thep@links.nectec.or.th> says
                 <U0E31>, <U0E34>..<U0E3A>, <U0E47>..<U0E4E> are is_alpha. */
              || (ch == 0x0E31)
              || (ch >= 0x0E34 && ch <= 0x0E3A)
              || (ch >= 0x0E47 && ch <= 0x0E4E)
              /* Avoid warning for <U0345>. */
              || (ch == 0x0345)
              /* Avoid warnings for <U2160>..<U217F>. */
              || (unicode_attributes[ch].category[0] == 'N'
                  && unicode_attributes[ch].category[1] == 'l')
              /* Avoid warnings for <U24B6>..<U24E9>. */
              || (unicode_attributes[ch].category[0] == 'S'
                  && unicode_attributes[ch].category[1] == 'o'
                  && strstr (unicode_attributes[ch].name, " LETTER ")
                     != NULL)
              /* Consider all the non-ASCII digits as alphabetic.
                 ISO C 99 forbids us to have them in category "digit",
                 but we want iswalnum to return true on them. */
              || (unicode_attributes[ch].category[0] == 'N'
                  && unicode_attributes[ch].category[1] == 'd'
                  && !(ch >= 0x0030 && ch <= 0x0039))));
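
To see how the quoted rule plays out on individual characters, here is a self-contained restatement that works on one record at a time. The struct sample_attr and the function sample_is_alpha are hypothetical names for illustration; the real generator indexes a global unicode_attributes table built from UnicodeData.txt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down record: code point, character name and
   two-letter general category, as found in UnicodeData.txt.  */
struct sample_attr {
    unsigned int ch;
    const char *name;
    const char *category;
};

/* Same tests as the quoted expression, applied to a single record.  */
bool sample_is_alpha(const struct sample_attr *a)
{
    const char *cat = a->category;

    if (a->name == NULL)
        return false;
    /* Any letter category, except the two Thai punctuation marks.  */
    if (cat[0] == 'L' && a->ch != 0x0E2F && a->ch != 0x0E46)
        return true;
    /* Thai marks explicitly treated as alphabetic, plus U+0345.  */
    if (a->ch == 0x0E31
        || (a->ch >= 0x0E34 && a->ch <= 0x0E3A)
        || (a->ch >= 0x0E47 && a->ch <= 0x0E4E)
        || a->ch == 0x0345)
        return true;
    /* Letter numbers such as the Roman numerals.  */
    if (cat[0] == 'N' && cat[1] == 'l')
        return true;
    /* "Symbol, other" counts only if the name contains " LETTER ".  */
    if (cat[0] == 'S' && cat[1] == 'o'
        && strstr(a->name, " LETTER ") != NULL)
        return true;
    /* Decimal digits outside ASCII 0..9.  */
    if (cat[0] == 'N' && cat[1] == 'd'
        && !(a->ch >= 0x0030 && a->ch <= 0x0039))
        return true;
    return false;
}
```

Under this rule a record { 0x00AA, "FEMININE ORDINAL INDICATOR", "Lo" } is alphabetic because its category begins with 'L' (it is Ll in older versions of the database and Lo in newer ones). { 0x24B6, "CIRCLED LATIN CAPITAL LETTER A", "So" } is alphabetic only because of the name test, and { 0x00A9, "COPYRIGHT SIGN", "So" } is not alphabetic.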
If what you require is a specific definition in terms of (maybe a specific
version of) the Unicode Character database rather than something
locale-dependent and so system-dependent, then indeed the system library
may be unsuitable.
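
If a database-based definition is what you want, one possible approach (only a sketch, not something glibc or GCC provides) is to read the general category straight from a pinned copy of UnicodeData.txt. The function names below are hypothetical, "letter" is assumed here to mean "category begins with L", and the First/Last range records used for large blocks such as CJK are not handled.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pull the code point (field 0) and the
   two-letter general category (field 2) out of one ';'-separated
   UnicodeData.txt record.  Returns false on a malformed line.  */
bool parse_ucd_line(const char *line, unsigned long *code_point,
                    char category[3])
{
    char *end;
    unsigned long cp = strtoul(line, &end, 16);

    if (end == line || *end != ';')
        return false;

    /* Skip field 1, the character name.  */
    const char *name_end = strchr(end + 1, ';');
    if (name_end == NULL)
        return false;

    /* Field 2 must be exactly two characters followed by ';'.  */
    const char *cat = name_end + 1;
    if (cat[0] == '\0' || cat[1] == '\0' || cat[2] != ';')
        return false;

    category[0] = cat[0];
    category[1] = cat[1];
    category[2] = '\0';
    *code_point = cp;
    return true;
}

/* Assumed definition of "letter": general category L*.  */
bool ucd_line_is_letter(const char *line)
{
    unsigned long cp;
    char cat[3];

    return parse_ucd_line(line, &cp, cat) && cat[0] == 'L';
}
```

Because the answer comes from the data file shipped with the program, it does not change with the locale or with the C library installed on the system.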
--
Joseph S. Myers http://www.srcf.ucam.org/~jsm28/gcc/
jsm@polyomino.org.uk (personal mail)
joseph@codesourcery.com (CodeSourcery mail)
jsm28@gcc.gnu.org (Bugzilla assignments and CCs)