This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: thoughts on martin's proposed patch for GCC and UTF-8
- To: martin at mira dot isdn dot cs dot tu-berlin dot de
- Subject: Re: thoughts on martin's proposed patch for GCC and UTF-8
- From: Ian Lance Taylor <ian at cygnus dot com>
- Date: Thu, 10 Dec 1998 10:57:10 -0500
- CC: eggert at twinsun dot com, brolley at cygnus dot com, gcc2 at gnu dot org, egcs at cygnus dot com
   Date: Thu, 10 Dec 1998 08:12:20 +0100
   From: Martin von Loewis <martin@mira.isdn.cs.tu-berlin.de>
   > If the object-code standard is to use UTF-8 names, then I suppose the
   > assembler can convert to UTF-8.
   No. The gas people made it very clear that they consider character sets
   somebody else's problems (i.e. ours).
That is too strong. For hand-coded assembler, I can see that there
may be a need for gas to do some character set conversions. Also, if
it is ever possible for an identifier name to include a byte value
which gas will consider to be an operator, then it is clearly
necessary for gas to permit quoting that byte value, and perhaps to do
more general character set conversions.
In general, though, if gcc needs to understand character set issues,
which appears to be the case, and if it can emit identifiers in a
manner which will not confuse gas, then I think it is reasonable for
gcc to emit identifiers as uninterpreted byte sequences, and for gas
to simply pass those identifiers straight through into the object
file.
I can't claim to understand many of the issues here, though.
Several people have mentioned the linker as an issue. To the best of
my knowledge, the linker will permit any byte value except 0 to appear
in an identifier. I don't see why the linker has to change at all for
any character set issues.
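Why 0 is the one excluded byte can be sketched with a toy string table modelled on an ELF `.strtab` section, where each symbol records only the offset of its name and the name runs up to the next NUL. This is an illustration assuming an ELF-like layout, not GNU ld code; `struct strtab` and the `strtab_*` functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy string table in the style of an ELF .strtab section: names are
   stored back to back, each followed by a NUL byte, and a symbol
   records only the offset of its name.  The fixed size is arbitrary. */
struct strtab {
    char data[256];
    size_t used;
};

/* As in ELF, offset 0 holds the empty string. */
void strtab_init(struct strtab *t)
{
    t->data[0] = '\0';
    t->used = 1;
}

/* Copy LEN raw bytes of NAME into the table, append the terminating
   NUL, and return the name's offset, or (size_t) -1 if it does not fit.
   No byte is interpreted, so any UTF-8 sequence is stored unchanged. */
size_t strtab_add(struct strtab *t, const char *name, size_t len)
{
    size_t off = t->used;

    if (len + 1 > sizeof t->data - t->used)
        return (size_t) -1;
    memcpy(t->data + off, name, len);
    t->data[off + len] = '\0';
    t->used += len + 1;
    return off;
}

/* Return the name stored at OFF.  A reader finds the end of a name only
   by scanning for NUL, which is why 0 is the one byte value a name
   cannot contain. */
const char *strtab_get(const struct strtab *t, size_t off)
{
    assert(off < t->used);
    return t->data + off;
}
```

Adding the 7-byte UTF-8 name `"z\xc3\xa4" "hler"` gives it back intact, whereas a name containing a 0 byte, such as the bytes `a`, `\0`, `b`, comes back as just `"a"`.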
Ian