This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: thoughts on martin's proposed patch for GCC and UTF-8


    If you *don't* do the translation, all your other tools (emacs,
    less, grep, etc) need to understand the #pragma locale statement,

On the contrary, most other programs have no need to understand it.
Most of these "tools" pay no attention to the character encoding.
Only Emacs does, and it has its own way for you to specify the
encoding if it guesses wrong.

It is ok for Emacs to guess, since the results are shown to you
straightaway; if it guessed wrong, you will see that on the screen.

In many cases, you don't need to care.  If you visit a file, change
some text at the beginning and save it, and if some part later in the
file (which you did not look at) contained some Latin-N characters, it
makes no difference to you whether Emacs thought they were Latin-1 or
Latin-2.  All that matters is that they are saved the same as they
were before.
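
To illustrate the point, here is a minimal sketch in Python (not
anything Emacs actually runs; the function name edit_and_save is made
up for this example).  It decodes the same bytes under two different
guesses, changes only the first character, and re-encodes.  The
untouched bytes come back identical either way:

```python
# Sample bytes: 0xA3 and 0xF5 are different characters in Latin-1 and Latin-2.
raw = b"caf\xe9 \xa3 \xf5"

def edit_and_save(data: bytes, encoding: str) -> bytes:
    """Decode with a guessed encoding, edit the start, and re-encode.

    Hypothetical helper standing in for "visit, edit, save" in an editor.
    """
    text = data.decode(encoding)
    edited = "X" + text[1:]          # change only the first character
    return edited.encode(encoding)   # save with the same guessed encoding

saved_latin1 = edit_and_save(raw, "iso8859_1")
saved_latin2 = edit_and_save(raw, "iso8859_2")

# Both guesses write back the same bytes for the part that was not edited.
print(saved_latin1 == saved_latin2)
```

This works because both encodings map each of these bytes to exactly
one character and back, so a wrong guess is harmless as long as the
text is not interpreted.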

But if GCC gets this wrong, you will get errors or incorrect behavior
later on, and it may take some time for you to even notice, let alone
figure out the cause.
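
A small sketch of why a wrong guess matters once the bytes are
interpreted.  It assumes, purely for illustration, a tool that
converts source bytes to UTF-8; it does not describe what the proposed
patch does:

```python
# One source byte, two possible meanings.
raw = b"\xa3"

as_latin1 = raw.decode("iso8859_1")   # U+00A3 POUND SIGN
as_latin2 = raw.decode("iso8859_2")   # U+0141 LATIN CAPITAL LETTER L WITH STROKE

# After conversion to UTF-8 the two guesses no longer agree, so the
# compiled program would contain different data depending on the guess.
utf8_from_latin1 = as_latin1.encode("utf-8")
utf8_from_latin2 = as_latin2.encode("utf-8")

print(utf8_from_latin1, utf8_from_latin2)
```

Nothing fails at conversion time; the difference only shows up later,
in the output of the program.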

    Another problem is that switching character encoding
    in-band may be difficult.  Many libraries do not support it.
    The Java FileReader class requires you to specify the encoding
    at *open* time.

GCC is not written in Java and does not use this class,
so this limitation is not a factor for us.
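
For a program that reads raw bytes itself, switching in-band can be
sketched as below.  The pragma syntax shown is hypothetical (the
thread mentions "#pragma locale" but not its exact form), and the
sketch assumes ASCII-compatible encodings in which a newline is always
the byte 0x0A:

```python
import re

# Hypothetical declaration syntax, e.g.:  #pragma locale "iso8859_2"
PRAGMA = re.compile(rb'^#pragma locale "([A-Za-z0-9_-]+)"\s*$')

def decode_source(data: bytes, default: str = "ascii") -> str:
    """Decode line by line, switching encoding when a pragma line is seen."""
    encoding = default
    out = []
    for line in data.split(b"\n"):
        match = PRAGMA.match(line)
        if match:
            # The pragma line itself is plain ASCII; it affects later lines.
            encoding = match.group(1).decode("ascii")
            out.append(line.decode("ascii"))
        else:
            out.append(line.decode(encoding))
    return "\n".join(out)

source = b'#pragma locale "iso8859_2"\nchar c = \'\xa3\';\n'
print(decode_source(source))
```

The file is opened as bytes, so no encoding has to be fixed at open
time; it is chosen as the declarations are encountered.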

      Preferably
    each file should specify its encoding out-of-band,
    just like MIME does.

I would not object to this sort of system, if users were happy with
it.  It would avoid depending on the environment.
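
As a rough sketch of the MIME-style approach, the encoding travels in
a header next to the data rather than inside it.  The helper name
decode_with_header is made up for this example; the header parsing
uses Python's standard email package:

```python
from email.message import Message

def decode_with_header(content_type: str, body: bytes) -> str:
    """Decode body using the charset named in a MIME Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Fall back to US-ASCII when no charset parameter is present.
    charset = msg.get_content_charset(failobj="us-ascii")
    return body.decode(charset)

# The same byte decodes differently depending only on the header.
print(decode_with_header("text/plain; charset=iso-8859-1", b"\xa3"))
print(decode_with_header("text/plain; charset=iso-8859-2", b"\xa3"))
```

The body is never inspected to find its encoding, and nothing is taken
from the environment of whoever reads the file.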

