This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Implementing Universal Character Names in identifiers
- From: kaih at khms dot westfalen dot de (Kai Henningsen)
- To: gcc at gcc dot gnu dot org
- Date: 04 Nov 2002 08:51:00 +0200
- Subject: Re: Implementing Universal Character Names in identifiers
- Comment: Unsolicited commercial mail will incur an US$100 handling fee per received mail.
- Organization: Organisation? Me?! Are you kidding?
- References: <20021028183910.GC24090@codesourcery.com> <Pine.LNX.4.33.0210281844370.4320-100000@kern.srcf.societies.cam.ac.uk>
jsm28@cam.ac.uk (Joseph S. Myers) wrote on 28.10.02 in <Pine.LNX.4.33.0210281844370.4320-100000@kern.srcf.societies.cam.ac.uk>:
> On Mon, 28 Oct 2002, Zack Weinberg wrote:
>
> > What you wrote in response to this is interesting but doesn't address
> > the issue of Unicode normalization of identifiers. It sounds more
> > like an extended discussion of the previous point. I'm talking about
> > the process described in UAX 15
> > (http://www.unicode.org/unicode/reports/tr15/) and in particular
> > annex 7 of that document ("Programming Language Identifiers").
>
> I don't think there's anything in the language standards to permit
> normalization to NFC as described there.
I don't think doing without normalization can even be justified. If you
do Unicode, you normalize. Anything else is insane.
> not for UCNs. And do we really want to build in the large character
> tables required for normalization?)
IIRC, if you implement them as doubly-indirect, tables of *all* the bit-
attributes of the current Unicode version (over the complete 20.x bit
range) take up something like 20 KB or so. (The trick is that you first
unify stretches of equal bit sequences - mainly all-ones and all-zeroes -
and then unify stretches of equal pointers [or indices] to those.)
That's not really big for something like gcc.
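
To make the trick concrete, here is a rough sketch of such a doubly-indirect
table for a single bit attribute. It is an illustration only, not anything
that exists in gcc: the block sizes (LEAF_BITS, MID_BITS), the function names
build and lookup, and the choice of "has a non-zero combining class" as the
example attribute are all my assumptions. It is in Python for brevity; in C
each leaf would be a 32-byte bitmap and the two upper levels small index
arrays.

```python
import unicodedata

# Assumed block sizes, chosen only for illustration:
# 256 code points per leaf bitmap, 64 leaf indices per middle chunk.
LEAF_BITS = 8
MID_BITS = 6
MAX_CP = 0x10FFFF


def build(pred):
    """Build (top, chunks, leaves) for a boolean attribute pred(cp)."""
    leaves, leaf_ids = [], {}    # unique 256-bit bitmaps, stored as ints
    chunks, chunk_ids = [], {}   # unique runs of 64 leaf indices
    top = []                     # one chunk index per 16384 code points
    leaf_size = 1 << LEAF_BITS
    chunk_span = leaf_size << MID_BITS
    for chunk_start in range(0, MAX_CP + 1, chunk_span):
        chunk = []
        for leaf_start in range(chunk_start, chunk_start + chunk_span, leaf_size):
            bits = 0
            for off in range(leaf_size):
                if pred(leaf_start + off):
                    bits |= 1 << off
            # First unification: identical bit sequences share one leaf.
            if bits not in leaf_ids:
                leaf_ids[bits] = len(leaves)
                leaves.append(bits)
            chunk.append(leaf_ids[bits])
        # Second unification: identical runs of leaf indices share one chunk.
        key = tuple(chunk)
        if key not in chunk_ids:
            chunk_ids[key] = len(chunks)
            chunks.append(key)
        top.append(chunk_ids[key])
    return top, chunks, leaves


def lookup(table, cp):
    """Test the attribute for 0 <= cp <= MAX_CP with two indirections."""
    top, chunks, leaves = table
    chunk = chunks[top[cp >> (LEAF_BITS + MID_BITS)]]
    leaf = leaves[chunk[(cp >> LEAF_BITS) & ((1 << MID_BITS) - 1)]]
    return bool((leaf >> (cp & ((1 << LEAF_BITS) - 1))) & 1)


def is_combining(cp):
    """Example attribute: non-zero canonical combining class."""
    return unicodedata.combining(chr(cp)) != 0


TABLE = build(is_combining)
```

With this, lookup(TABLE, cp) answers the attribute question for any code
point. How small the result gets depends on the attribute and on the block
sizes; the sketch makes no attempt to reproduce the 20 KB figure, it only
shows where the sharing comes from.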
Regards, Kai