This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug c/67224] UTF-8 support for identifier names in GCC


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224

--- Comment #20 from Eric <ejolson at unr dot edu> ---
I've been looking at the code in lex_identifier as well as what goes on in
forms_identifier_p and so forth.  As some point each identifier needs to be
stored in the symbol table using ht_lookup_with_hash.  Proper functioning
requires that UTF-8 and UCN representations of the same unicode characters are
treated as the same symbol.  Thus, there needs to be some point at which the
identifiers are regularized to be either all UTF-8 or all UCN escaped ASCII. 
As gcc is working with UCNs right now, the obvious implementation allocates
temporary memory to hold the UCN escaped ASCII version of an UTF-8 identifier
and then frees it again after calling ht_lookup.  Any comments would be
appreciated.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]