This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug c/67224] UTF-8 support for identifier names in GCC

From: "ejolson at unr dot edu" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Thu, 20 Aug 2015 22:14:46 +0000
Subject: [Bug c/67224] UTF-8 support for identifier names in GCC
Auto-submitted: auto-generated
References: <bug-67224-4 at http dot gcc dot gnu dot org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224

--- Comment #20 from Eric <ejolson at unr dot edu> ---
I've been looking at the code in lex_identifier as well as what goes on in
forms_identifier_p and so forth.  At some point each identifier needs to be
stored in the symbol table using ht_lookup_with_hash.  Proper functioning
requires that the UTF-8 and UCN representations of the same Unicode characters
are treated as the same symbol.  Thus, there needs to be some point at which
identifiers are normalized to be either all UTF-8 or all UCN-escaped ASCII.
As gcc is working with UCNs right now, the obvious implementation allocates
temporary memory to hold the UCN-escaped ASCII version of a UTF-8 identifier
and then frees it again after calling ht_lookup.  Any comments would be
appreciated.
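
To make the idea concrete, here is a rough standalone sketch of the conversion
step.  It is not code from libcpp: the names decode_utf8 and utf8_to_ucn are
hypothetical, and always emitting the eight-digit \UXXXXXXXX form with
uppercase hex digits is an assumption made only to keep the example short.
The caller would pass the returned buffer to the symbol table lookup and then
free it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (len bytes left).
   Stores the code point in *cp and returns the number of bytes consumed,
   or 0 if the sequence is malformed, overlong, a surrogate, or too large.  */
static size_t decode_utf8(const unsigned char *s, size_t len, unsigned long *cp)
{
    size_t n, i;
    unsigned long c, min;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
    else
        return 0;                       /* stray continuation or invalid byte */

    if (len < n)
        return 0;                       /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                       /* overlong, out of range, surrogate */
    *cp = c;
    return n;
}

/* Hypothetical helper: return a malloc'd, NUL-terminated copy of the
   identifier id[0..len) with every non-ASCII character replaced by
   \UXXXXXXXX.  Returns NULL on malformed input or allocation failure.  */
char *utf8_to_ucn(const char *id, size_t len)
{
    /* A non-ASCII character takes at least 2 input bytes and expands to
       10 output bytes, so 10 * len + 1 is a safe upper bound.  */
    char *out = malloc(len * 10 + 1);
    size_t i = 0, o = 0;

    if (out == NULL)
        return NULL;
    while (i < len) {
        unsigned long cp;
        size_t n = decode_utf8((const unsigned char *) id + i, len - i, &cp);

        if (n == 0) {
            free(out);
            return NULL;
        }
        if (cp < 0x80)
            out[o++] = (char) cp;       /* ASCII passes through unchanged */
        else
            o += (size_t) sprintf(out + o, "\\U%08lX", cp);
        i += n;
    }
    out[o] = '\0';
    return out;
}
```

With this sketch, the UTF-8 bytes CF 80 followed by "r" and the source
spelling \U000003C0r both produce the same key.  A real implementation would
also have to map the short \uXXXX form and differing hex digit case onto one
canonical spelling, which this example does not attempt.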

References:
- [Bug c/67224] New: UTF-8 support for identifier names in GCC
  - From: ejolson at unr dot edu
