Implementing Universal Character Names in identifiers

Neil Booth neil@daikokuya.co.uk
Thu Nov 7 11:40:00 GMT 2002


Martin v. Löwis wrote:-

> I think UCNs are rightfully different from nearly everything else;
> they are quite similar to multi-byte characters. If you have an
> escaped newline in the middle of a multi-byte character, you would not
> expect concatenation to create a new multi-byte character, either,
> would you?

It's not analogous.  The idiom is that escaped newlines between
characters in the source character set are invisible.  You are
proposing to break that idiom.  Multibyte chars are single chars
in the source character set, so your counterexample does not apply.
UCNs are really a phase 3 matter; escaped newlines are removed in
phase 2.
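To make the phase ordering concrete, here is a minimal C sketch of
phase 2 run as a separate pass over the source text.  It is not GCC
code: splice_lines() is a hypothetical helper, and it assumes plain
'\n' line endings.  A UCN split across physical lines comes out of
it whole, ready for phase 3 to interpret.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not GCC code: translation phase 2 as its own pass.
   Copies SRC to DST, deleting every backslash that is immediately
   followed by a newline, so physical lines join into logical lines.
   Assumes '\n' line endings.  DST needs room for strlen(SRC) + 1 bytes. */
char *splice_lines(char *dst, const char *src)
{
    char *out = dst;

    assert(dst != NULL && src != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;           /* drop the escaped newline entirely */
            continue;
        }
        *out++ = *src++;
    }
    *out = '\0';
    return dst;
}
```

For example, the text \u00 followed by backslash-newline and then c0
splices to \u00c0, which phase 3 then sees as a single UCN.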

I really want this implemented in whatever patch goes in.  It's not
hard to do: instead of reading chars directly through a pointer,
call get_effective_char(), as other parts of cpplex.c already do.
It handles skipping any escaped newlines.
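A rough sketch of the same idea done lazily, in the style described
above.  struct reader, next_effective_char() and read_ucn() are
hypothetical names; the real get_effective_char() in cpplex.c has a
different signature and works on cpplib's own buffers.  Because every
character of the UCN is fetched through next_effective_char(), escaped
newlines anywhere inside the UCN are skipped with no extra code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexer state; cpplib's real reader is far richer. */
struct reader {
    const char *cur;            /* next unread byte, NUL-terminated */
};

/* Return the next character after silently skipping any
   backslash-newline pairs (phase 2 done on the fly).  Returns '\0'
   at end of input without advancing.  Name and signature are made up. */
int next_effective_char(struct reader *r)
{
    while (r->cur[0] == '\\' && r->cur[1] == '\n')
        r->cur += 2;
    if (*r->cur == '\0')
        return '\0';
    return (unsigned char) *r->cur++;
}

static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Read \uXXXX or \UXXXXXXXX, allowing escaped newlines anywhere inside.
   On success store the code point in *OUT and return 1; return 0 if
   the input is not a well-formed UCN.  Range checks on the resulting
   code point are deliberately omitted from this sketch. */
int read_ucn(struct reader *r, unsigned long *out)
{
    int c, digits;
    unsigned long value = 0;

    assert(r != NULL && out != NULL);
    if (next_effective_char(r) != '\\')
        return 0;
    c = next_effective_char(r);
    if (c == 'u')
        digits = 4;
    else if (c == 'U')
        digits = 8;
    else
        return 0;

    while (digits-- > 0) {
        int d = hex_value(next_effective_char(r));
        if (d < 0)
            return 0;
        value = value * 16 + (unsigned long) d;
    }
    *out = value;
    return 1;
}
```

Note that a backslash consumed by a line splice never starts a UCN:
backslash-newline followed by u00c0 reads as plain u00c0, exactly as
it would after a separate phase 2 pass.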

> I cannot see any important use cases for such a
> feature. Implementations are allowed to reject this case, and it
> simplifies the implementation to reject it, so I can see really no
> reason to make life more complicated than necessary. Producing an
> error now still gives the opportunity to provide an extension later.

EDG accepts escaped newlines in UCNs; I've just tried it, so it's not
without precedent.

> > A backslash is a token; so is u00c0.  Your example is indeed an
> > error, but was not what I had in mind.  I suspect pasting just works,
> > anyway.
> 
> Can you please give an example for what you had in mind?

#define f(x, y) x ## y
f(\, u00c0)

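Which reminds me: the code that guards against accidental pastes
might need an extra line or two, so that a \ token followed by an
identifier beginning with u or U is not output as a UCN.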
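As an illustration of the kind of extra line meant here, a toy
avoid-paste check in C.  The function needs_space() and its
spelling-based interface are hypothetical; a real preprocessor would
more likely decide this from token types, and this sketch ignores
operator pastes such as + followed by +.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static int is_ident_char(int c)
{
    return isalnum((unsigned char) c) || c == '_';
}

/* Hypothetical, heavily simplified avoid-paste check.  Returns 1 if
   writing PREV immediately followed by NEXT could re-lex as a
   different token sequence, so a separating space is needed. */
int needs_space(const char *prev, const char *next)
{
    size_t len;

    assert(prev != NULL && next != NULL);
    len = strlen(prev);
    if (len == 0 || next[0] == '\0')
        return 0;

    /* The extra case: a lone backslash followed by an identifier that
       starts with u or U would re-lex as a UCN, e.g. \ then u00c0
       becomes \u00c0.  Conservatively add a space whether or not
       enough hex digits follow. */
    if (strcmp(prev, "\\") == 0 && (next[0] == 'u' || next[0] == 'U'))
        return 1;

    /* The usual case: two identifier-like tokens would merge. */
    return is_ident_char(prev[len - 1]) && is_ident_char(next[0]);
}
```

Being conservative costs at most a redundant space in the output,
whereas missing the case silently turns two tokens into one.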

Neil.



More information about the Gcc-patches mailing list