This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: thoughts on martin's proposed patch for GCC and UTF-8


   Date: Wed, 09 Dec 1998 23:02:40 -0800
   From: Per Bothner <bothner@cygnus.com>

   This is a somewhat hypothetical problem, as we have no experience
   of the extent, if any, to which people need to be able to use
   non-ASCII characters in their source files.

I have some experience; we sometimes use gcc that way here.

   But I assume they will want to do that in their locale's text
   encoding - which need not be a "UTF-8" locale.

Yes; this is already widespread practice for C strings.

   In that case, jc1 (or a pre-processor for jc1) has to translate the
   locale's character set into Unicode.

It could also be done by a postprocessor for jc1.

   The locale for assembler files should probably also be UTF-8.

This disagrees with existing practice for C strings.  I don't think
it's wise to commit now to UTF-8 for all assembler files.  Among other
things, it'd mean you couldn't look at the files with Emacs (as the
latest Emacs doesn't support UTF-8).

I have misgivings about having GCC support multiple locales
simultaneously.  Multilingual applications are the province of fancy
text editors like Emacs; simple translators like GCC shouldn't have to
worry about handling multiple locales in the same program execution.
I've dealt with programs like that, and they are a pain to configure
and maintain.  For GCC it's cleaner to add a separate pass to
translate the assembler input, if this is needed.
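
As a rough illustration of what such a separate pass might look like,
here is a minimal sketch.  It is not part of GCC or gas; the function
name `transcode`, its parameters, and the choice of Latin-1 as the
source encoding are all assumptions made for the example.  It simply
copies a byte stream from one encoding to another in chunks, so the
compiler and assembler proper never need to know about locales.

```python
import codecs
import io


def transcode(src, dst, src_encoding, dst_encoding="utf-8", chunk_size=4096):
    """Copy bytes from src to dst, converting the character encoding.

    Hypothetical helper: src and dst are binary file-like objects.
    """
    # Incremental codecs cope with multibyte characters that happen to
    # straddle a chunk boundary.
    decoder = codecs.getincrementaldecoder(src_encoding)(errors="strict")
    encoder = codecs.getincrementalencoder(dst_encoding)(errors="strict")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(encoder.encode(decoder.decode(chunk)))
    # Flush whatever is still buffered; an incomplete trailing sequence
    # raises UnicodeDecodeError here instead of being dropped silently.
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))


# Example: an assembler string directive in Latin-1, converted to UTF-8.
src = io.BytesIO('.string "caf\u00e9"\n'.encode("latin-1"))
dst = io.BytesIO()
transcode(src, dst, "latin-1")
```

Keeping the conversion in its own filter means each run handles
exactly one source encoding and one target encoding, which is the
property argued for above.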

To some extent this is an ``after you, Alphonse'' situation.  The gas
people don't want to worry about translating codes, and I don't blame
them.  I don't want cpp to worry about it either.  Or cc1.
