[Bug c/67224] UTF-8 support for identifier names in GCC
joseph at codesourcery dot com
gcc-bugzilla@gcc.gnu.org
Mon Aug 17 20:18:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224
--- Comment #11 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
Sorry, glibc iconv (not libiconv) doesn't handle "C99". So your patch
would not work on any GNU host in normal configurations of GCC (libiconv
is a completely separate package and is only likely to be used on non-GNU
hosts such as Windows; on GNU hosts, iconv from glibc is normally used,
although it's possible to use libiconv there).
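
A quick way to check which charset names the host's iconv accepts is to
probe it with iconv_open. The following is only a minimal sketch:
charset_supported is a hypothetical helper name, not something in GCC or
in the patch, and it only reports whether the conversion descriptor can be
opened.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of GCC): returns 1 if the host iconv can
   open a conversion from charset `from` to charset `to`, and 0 if not.  */
int charset_supported(const char *to, const char *from)
{
    assert(to != NULL && from != NULL);

    /* iconv_open takes the target charset first, then the source.  */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return 0;            /* conversion not available on this host */

    iconv_close(cd);
    return 1;
}
```

Calling charset_supported("C99", "UTF-8") on the build host would show
whether a patch relying on the "C99" charset name can work there.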
You need tests for cases such as: if a macro is defined twice, once with
a UCN in its expansion and once with the equivalent character written in
UTF-8, the difference in the expansion is diagnosed (whichever of the
valid UCNs for that character is used). And that the original spelling
appears on the right-hand side of a definition output with -dD.
And that if (in C but not, properly, C++) a string contains a backslash
followed by an extended character, this is properly diagnosed as an
invalid escape sequence rather than being treated as \u<something> or
\U<something>. See the tests in my spelling preservation patch
<https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00548.html>. (Stringizing
isn't necessarily an issue here because of the special C rules about
stringizing UCNs together with the C++ rule about converting to UCNs in
phase 1 - the effect is that for C it's always OK to stringize as the
extended character, though you can't stringize as a UCN if the extended
character was originally written, while for C++ you have to stringize as a
UCN.) And then you need tests of C++ programs with extended characters
inside raw strings (like c-c++-common/raw-string-*.c, but none of those
cover extended characters at present). And the patch needs to add all
these tests to the testsuite.
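
To illustrate why spelling matters for the macro redefinition case: one
character has several valid UCN spellings (\uXXXX or \UXXXXXXXX, with hex
digits in either case), all naming the same code point but differing as
text. The sketch below is an illustration only, not code from GCC or from
the patch; ucn_value is a hypothetical helper, and it does not check the
C rules about which code points a UCN may name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the code point named by a UCN spelling of
   the form \uXXXX or \UXXXXXXXX, or -1 if the spelling is malformed.  */
long long ucn_value(const char *s)
{
    size_t digits;
    long long value = 0;

    assert(s != NULL);
    if (s[0] != '\\')
        return -1;
    if (s[1] == 'u')
        digits = 4;          /* short form: exactly four hex digits */
    else if (s[1] == 'U')
        digits = 8;          /* long form: exactly eight hex digits */
    else
        return -1;
    if (strlen(s) != 2 + digits)
        return -1;

    for (size_t i = 0; i < digits; i++) {
        unsigned char c = (unsigned char)s[2 + i];
        if (!isxdigit(c))
            return -1;
        value = value * 16
                + (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    return value;
}
```

For example, the spellings \u00e9, \u00E9 and \U000000e9 all yield 0xE9
here, yet they compare unequal as strings, so a preprocessor that
preserves spelling has to treat them as different expansions.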