This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Implementing Universal Character Names in identifiers


jsm28@cam.ac.uk (Joseph S. Myers)  wrote on 28.10.02 in <Pine.LNX.4.33.0210281844370.4320-100000@kern.srcf.societies.cam.ac.uk>:

> On Mon, 28 Oct 2002, Zack Weinberg wrote:
>
> > What you wrote in response to this is interesting but doesn't address
> > the issue of Unicode normalization of identifiers.  It sounds more
> > like an extended discussion of the previous point.  I'm talking about
> > the process described in UAX 15
> > (http://www.unicode.org/unicode/reports/tr15/) and in particular
> > annex 7 of that document ("Programming Language Identifiers").
>
> I don't think there's anything in the language standards to permit
> normalization to NFC as described there.

I don't think doing without normalization can even be considered, let
alone justified. If you do Unicode, you normalize. Anything else is insane.
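
To illustrate why, here is a minimal sketch (in Python, using its standard unicodedata module rather than anything in gcc) of two spellings of what a user would consider the same identifier. The helper name same_identifier is made up for this example; it only shows the effect of NFC and is not a proposal for how cpplib should do it.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Compare two identifier spellings after normalizing both to NFC.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# "café" written with a precomposed U+00E9 ...
precomposed = "caf\u00e9"
# ... and with a plain "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "cafe\u0301"

if __name__ == "__main__":
    # Without normalization these are different code point sequences.
    print(precomposed == decomposed)
    # After NFC they compare equal.
    print(same_identifier(precomposed, decomposed))
```

Without normalization, the two spellings look identical on screen yet name different symbols.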

> not for UCNs.  And do we really want to build in the large character
> tables required for normalization?)

IIRC, if you implement them as doubly-indirect tables, *all* the bit
attributes of the current Unicode version (over the complete 20.x-bit
range) take up something like 20 KB or so. (The trick is that you first
unify stretches of equal bit sequences, mainly all-ones and all-zeroes,
and then unify stretches of equal pointers [or indices] to those.)
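
A rough sketch of that trick, in Python for brevity, for a single bit attribute. The block sizes (BLOCK, GROUP), the function names, and the choice of "is a combining character" as the attribute are all assumptions made for the example; the sizes you actually get depend on the attribute, the block sizes, and the Unicode version, so no particular figure is claimed here.

```python
import unicodedata

MAX_CP = 0x110000  # one past the last Unicode code point
BLOCK = 256        # code points per leaf bit block (assumed size)
GROUP = 64         # leaf indices per mid-level block (assumed size)


def is_combining(cp: int) -> bool:
    """Example bit attribute: nonzero canonical combining class."""
    return unicodedata.combining(chr(cp)) != 0


def build_table(pred):
    """Build a doubly-indirect table for one bit attribute."""
    leaves, leaf_ids = [], {}    # unique bit blocks, stored as int bitmasks
    groups, group_ids = [], {}   # unique runs of indices into `leaves`
    leaf_refs, top = [], []

    # Step 1: unify equal bit blocks (mostly all-zeroes or all-ones).
    for start in range(0, MAX_CP, BLOCK):
        mask = 0
        for offset in range(BLOCK):
            if pred(start + offset):
                mask |= 1 << offset
        if mask not in leaf_ids:
            leaf_ids[mask] = len(leaves)
            leaves.append(mask)
        leaf_refs.append(leaf_ids[mask])

    # Step 2: unify equal stretches of indices to those blocks.
    for start in range(0, len(leaf_refs), GROUP):
        group = tuple(leaf_refs[start:start + GROUP])
        if group not in group_ids:
            group_ids[group] = len(groups)
            groups.append(group)
        top.append(group_ids[group])

    return top, groups, leaves


def lookup(table, cp: int) -> bool:
    """Two index hops plus one bit test."""
    top, groups, leaves = table
    leaf_no, bit = divmod(cp, BLOCK)
    group_no, slot = divmod(leaf_no, GROUP)
    leaf = leaves[groups[top[group_no]][slot]]
    return (leaf >> bit) & 1 == 1


if __name__ == "__main__":
    table = build_table(is_combining)
    top, groups, leaves = table
    print(len(top), len(groups), len(leaves))
    print(lookup(table, 0x0301), lookup(table, ord("A")))
```

Because most of the code space is unassigned or uniform, the number of distinct leaf blocks and index groups is far smaller than the number of slots in a flat table, which is where the saving comes from.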

That's not really big for something like gcc.

Regards, Kai

