This is the mail archive of the
mailing list for the GCC project.
Re: Universal Character Names, v2
- From: Geoff Keating <geoffk at geoffk dot org>
- To: "Martin v. Löwis" <martin at v dot loewis dot de>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: 29 Nov 2002 23:30:54 -0800
- Subject: Re: Universal Character Names, v2
- References: <200211282334.gASNYdTA004058@mira.informatik.hu-berlin.de>
"Martin v. Löwis" <email@example.com> writes:
> - Support UCNs in numbers. In the internal representation, such
> a number still has the UCN in it, i.e. no conversion to UTF-8
> takes place. Such numbers will only be valid if they are pasted
> with an identifier.
Is this intended to be an extension to C99? I think it is; C99
doesn't support things that start with digits and then contain
non-digits (as specified by C99) in them. I believe that support for
more kinds of digit was explicitly considered and rejected by the C
standards committee, on the grounds that (a) it provides no useful
functionality, and (b) it makes it harder to process C source files
without having a full C tokenizer, because now even recognizing the
start of a number requires full UCN processing.
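
To illustrate point (b), here is a minimal sketch in C, not taken from the patch or from cpplib. All function names are hypothetical, and the table of "extra digits" is only an illustrative subset (Arabic-Indic and Devanagari digits) chosen for the example. It contrasts the ASCII-only check with what a tool would need if UCN digits could start a number.

```c
#include <stdbool.h>
#include <stddef.h>

/* ASCII-only rule: a number starts with a digit, or with '.' then a digit. */
static bool starts_number_c99(const char *s, size_t len)
{
    if (len >= 1 && s[0] >= '0' && s[0] <= '9')
        return true;
    return len >= 2 && s[0] == '.' && s[1] >= '0' && s[1] <= '9';
}

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse \uXXXX or \UXXXXXXXX at s.  Returns the number of characters
   consumed and stores the code point in *cp, or returns 0 on failure. */
static size_t parse_ucn(const char *s, size_t len, unsigned long *cp)
{
    size_t ndigits, i;
    unsigned long value = 0;

    if (len < 2 || s[0] != '\\')
        return 0;
    if (s[1] == 'u')
        ndigits = 4;
    else if (s[1] == 'U')
        ndigits = 8;
    else
        return 0;
    if (len < 2 + ndigits)
        return 0;
    for (i = 0; i < ndigits; i++) {
        int d = hex_value(s[2 + i]);
        if (d < 0)
            return 0;
        value = value * 16 + (unsigned long)d;
    }
    *cp = value;
    return 2 + ndigits;
}

/* ASSUMPTION: illustrative subset of non-ASCII digits, not a real list. */
static bool is_extended_digit(unsigned long cp)
{
    return (cp >= 0x0660 && cp <= 0x0669)    /* Arabic-Indic digits */
        || (cp >= 0x0966 && cp <= 0x096F);   /* Devanagari digits */
}

/* Hypothetical extended rule: a UCN naming a digit may also start a
   number, so the check now needs UCN parsing plus a table lookup. */
static bool starts_number_extended(const char *s, size_t len)
{
    unsigned long cp;

    if (starts_number_c99(s, len))
        return true;
    return parse_ucn(s, len, &cp) != 0 && is_extended_digit(cp);
}
```

With the ASCII-only rule, a simple tool can find numbers by looking at one or two bytes. With the extended rule, the same tool has to decode `\u0661` and consult a digit table before it knows whether a number has started.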
> - I have not decided to deviate from the C and C++ standards for
> character tests. Reviewers commented that they dislike the approach
> taken by the standards committees, and that the relevant Unicode
> specification should be taken into account instead. I disagree, as I
> consider the approach of giving explicit lists quite reasonable.
> More importantly, I think that standards conformance should be
> valued quite highly unless specific user demands require to
> ignore or extend the standards; this is not the case in the
> specific issue.
I support this approach. The C standards committee considered the
matter and decided that it was better to have a fixed, limited set
of characters that could be allowed in identifiers, rather than a
language definition that varies as ISO 10646 is updated, causing some
implementations to be incorrect simply because they didn't have the
latest ISO 10646 code tables.
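
As a rough sketch of what the fixed-list approach looks like in code (an illustration, not the patch's implementation): the names below are hypothetical, and the table holds only a few ranges, which I believe match the start of the Latin entry in C99 Annex D. Check them against the standard before relying on them.

```c
#include <stdbool.h>
#include <stddef.h>

/* An inclusive range of code points. */
struct ucn_range {
    unsigned long first;
    unsigned long last;
};

/* ASSUMPTION: short excerpt for illustration, sorted by code point.
   A real table would carry the complete list from the standard. */
static const struct ucn_range ident_ranges[] = {
    { 0x00AA, 0x00AA },
    { 0x00BA, 0x00BA },
    { 0x00C0, 0x00D6 },
    { 0x00D8, 0x00F6 },
    { 0x00F8, 0x01F5 },
};

/* Binary search of the fixed table.  The answer depends only on the
   table compiled into the tool, not on any Unicode version. */
static bool ucn_valid_in_identifier(unsigned long cp)
{
    size_t lo = 0;
    size_t hi = sizeof ident_ranges / sizeof ident_ranges[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (cp < ident_ranges[mid].first)
            hi = mid;
        else if (cp > ident_ranges[mid].last)
            lo = mid + 1;
        else
            return true;
    }
    return false;
}
```

Because the table is part of the language definition, two conforming implementations give the same answer for any code point, whichever revision of the character database they were built against.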
- Geoffrey Keating <firstname.lastname@example.org>