Implementing Universal Character Names in identifiers
Neil Booth
neil@daikokuya.co.uk
Thu Nov 7 00:09:00 GMT 2002
Martin v. Löwis wrote:-
> This patch implements UCNs in cpplib. It does so by converting the
> UCN to UTF-8, putting the UTF-8 bytes into the internal
> representation of the identifier.
>
> The back-ends will transparently output the UTF-8 identifiers into the
> assembler file. If GNU as is used (or any other assembler supporting
> non-ASCII identifiers), these UTF-8 strings will be copied transparently
> into the object file. If the assembler does not support UTF-8, it
> will produce a diagnostic.
>
> As a result of this strategy, UCNs are now allowed in all places
> mandated by the relevant standards, i.e. both in C99 and C++, and in
> all identifiers, including macro names.
>
> Regards,
> Martin
>
> 2002-10-27 Martin v. Löwis <loewis@informatik.hu-berlin.de>
>
> * c-lex.c (is_extended_char, utf8_extend_token): Remove.
> * cpplex.c (identifier_ucs_p, utf8_extend_token,
> utf8_to_char): New functions.
> (parse_slow): Add utf8 parameter. Parse UCS names.
> (parse_identifier, parse_number): Adjust.
> (_cpp_lex_direct): Parse UCS names.
> (cpp_output_token): Print UCS names.
> * cpplib.h (NODE_UTF8): New flag.
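The conversion the patch relies on, turning a UCN spelling into the UTF-8 bytes stored in the identifier, can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not code from the patch or from cpplib: the names `hex_value`, `utf8_encode` and `ucn_to_utf8` are hypothetical. Beyond requiring well-formed hex digits, the only checks are C99's rule that a UCN may not name a character below U+00A0 (other than `$`, `@` and `` ` ``) or a surrogate, plus rejecting values above U+10FFFF. It does not check whether the character is one of those permitted in identifiers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read exactly N hex digits from S into *VALUE.
   Returns 1 on success, 0 if a non-hex character (including the
   terminating NUL) appears first, so it never reads past the string.  */
static int hex_value(const char *s, size_t n, uint32_t *value)
{
  uint32_t v = 0;
  for (size_t i = 0; i < n; i++)
    {
      char c = s[i];
      uint32_t d;
      if (c >= '0' && c <= '9') d = (uint32_t) (c - '0');
      else if (c >= 'a' && c <= 'f') d = (uint32_t) (c - 'a' + 10);
      else if (c >= 'A' && c <= 'F') d = (uint32_t) (c - 'A' + 10);
      else return 0;
      v = v * 16 + d;
    }
  *value = v;
  return 1;
}

/* Hypothetical helper: encode CP as UTF-8 into OUT (room for 4 bytes).
   Returns the number of bytes written, or 0 for surrogates and for
   values above U+10FFFF.  */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* surrogates are not characters */
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}

/* Hypothetical entry point: convert the UCN spelled at S ("\uXXXX" or
   "\UXXXXXXXX") to UTF-8.  Returns the number of bytes written to OUT,
   or 0 if S is not a well-formed UCN or names a forbidden character.  */
size_t ucn_to_utf8(const char *s, unsigned char *out)
{
  size_t ndigits;
  uint32_t cp;

  if (s[0] != '\\')
    return 0;
  if (s[1] == 'u')
    ndigits = 4;
  else if (s[1] == 'U')
    ndigits = 8;
  else
    return 0;
  if (!hex_value(s + 2, ndigits, &cp))
    return 0;
  /* C99: no UCN below U+00A0 except $, @ and `.  Surrogates are
     rejected by utf8_encode.  */
  if (cp < 0xA0 && cp != 0x24 && cp != 0x40 && cp != 0x60)
    return 0;
  return utf8_encode(cp, out);
}
```

For example, `\u00e9` yields the two bytes 0xC3 0xA9, which is how é would appear in the identifier's internal representation under this scheme.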
It would be nice if you could handle escaped newlines (backslash-newline
line splices) inside UCNs; I don't think your patch does that. I think
it's a bit painful, and is one of the reasons I'd not added support for
them yet. It would be easier if there were a prescan of translation
phases 1 and 2 (a logical line at a time), which Zack and I keep
wondering whether to do.
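A phase-2 prescan of the kind described above might look roughly like this sketch. It is hypothetical: the name `splice_logical_line` is illustrative, and it deliberately ignores phase-1 trigraphs (so `??/` before a newline is not treated as a splice) and `\r\n` line endings. It shows why such a prescan helps: once line splices are removed, a UCN split as `\u00\` at the end of one physical line and `e9` at the start of the next reaches the lexer as the contiguous `\u00e9`.

```c
/* Hypothetical sketch of a phase-2 prescan: copy one logical line from
   IN to OUT, deleting every backslash-newline pair, so later phases
   (including UCN parsing) never see a line splice.  Trigraphs and
   "\r\n" endings are not handled.  OUT needs room for strlen (IN) + 1
   bytes.  Returns a pointer to the start of the next logical line.  */
const char *splice_logical_line(const char *in, char *out)
{
  while (*in != '\0')
    {
      if (in[0] == '\\' && in[1] == '\n')
        {
          in += 2;              /* line splice: drop both characters */
          continue;
        }
      if (*in == '\n')
        {
          in++;                 /* unescaped newline ends the logical line */
          break;
        }
      *out++ = *in++;
    }
  *out = '\0';
  return in;
}
```

The lexer would then scan each spliced logical line, so code such as the UCN conversion never has to look past a backslash for a newline.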
Also, as a QOI issue I'd like token pasting to work for UCNs,
though the standard does not require it. Does your patch handle
that?
Thanks,
Neil.