This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug c/67224] UTF-8 support for identifier names in GCC
- From: "ejolson at unr dot edu" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 20 Aug 2015 22:14:46 +0000
- Subject: [Bug c/67224] UTF-8 support for identifier names in GCC
- Auto-submitted: auto-generated
- References: <bug-67224-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224
--- Comment #20 from Eric <ejolson at unr dot edu> ---
I've been looking at the code in lex_identifier as well as what goes on in
forms_identifier_p and so forth. At some point each identifier needs to be
stored in the symbol table using ht_lookup_with_hash. Proper functioning
requires that UTF-8 and UCN representations of the same Unicode characters are
treated as the same symbol. Thus, there needs to be some point at which the
identifiers are regularized to be either all UTF-8 or all UCN-escaped ASCII.
As gcc is working with UCNs right now, the obvious implementation allocates
temporary memory to hold the UCN-escaped ASCII version of a UTF-8 identifier
and then frees it again after calling ht_lookup. Any comments would be
appreciated.
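
To make the idea concrete, here is a rough standalone sketch of the
regularization step described above. It is not libcpp code: the names
decode_utf8 and identifier_to_ucn are made up for illustration, and the choice
of uppercase hex digits and of \u versus \U by code point size is only an
assumption about what the canonical spelling might look like. It rewrites every
non-ASCII character of a UTF-8 identifier as a UCN in a freshly allocated
buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libcpp.  Decode one UTF-8 sequence
   starting at S with N bytes available.  Store the code point in *CP and
   return the number of bytes consumed, or 0 if the sequence is malformed,
   truncated, overlong, a surrogate, or above U+10FFFF.  */
static size_t
decode_utf8 (const unsigned char *s, size_t n, unsigned long *cp)
{
  size_t len, i;
  unsigned long c, min;

  if (n == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)
    len = 2, c = s[0] & 0x1F, min = 0x80;
  else if ((s[0] & 0xF0) == 0xE0)
    len = 3, c = s[0] & 0x0F, min = 0x800;
  else if ((s[0] & 0xF8) == 0xF0)
    len = 4, c = s[0] & 0x07, min = 0x10000;
  else
    return 0;

  if (len > n)
    return 0;
  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      c = (c << 6) | (s[i] & 0x3F);
    }
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;
  *cp = c;
  return len;
}

/* Hypothetical helper, not part of libcpp.  Return a malloc'd,
   NUL-terminated copy of the LEN-byte identifier ID in which every
   non-ASCII character is spelled as \uXXXX or \UXXXXXXXX.  Return NULL
   on malformed UTF-8 or allocation failure.  The caller frees the
   result.  */
static char *
identifier_to_ucn (const char *id, size_t len)
{
  const unsigned char *s = (const unsigned char *) id;
  char *out, *p;
  size_t i = 0;

  assert (id != NULL);
  /* Worst case is a 2-byte sequence growing to the 6 bytes of \uXXXX,
     so 3 * LEN plus the terminating NUL is always enough.  */
  out = malloc (3 * len + 1);
  if (out == NULL)
    return NULL;
  p = out;
  while (i < len)
    {
      unsigned long cp;
      size_t used = decode_utf8 (s + i, len - i, &cp);

      if (used == 0)
        {
          free (out);
          return NULL;
        }
      if (cp < 0x80)
        *p++ = (char) cp;
      else if (cp <= 0xFFFF)
        p += sprintf (p, "\\u%04lX", cp);
      else
        p += sprintf (p, "\\U%08lX", cp);
      i += used;
    }
  *p = '\0';
  return out;
}
```

In the lexer, the returned buffer would be the string handed to the hash table
lookup and then freed right afterwards. For the two spellings to land on the
same symbol, UCNs that appear literally in the source would presumably have to
be brought into the same canonical spelling (same hex case, same \u or \U
form) before the lookup as well.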