This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: bumming cycles out of parse_identifier()...


On Tue, Sep 11, 2001 at 07:38:28AM +0100, Neil Booth wrote:
> 
> > Try it and see, sure, but I'm confidently predicting that someone in
> > Russia will write a library with headers with comments in KOI8-R, and
> > someone in Japan will try to use it from their program with comments
> > in SJIS, and we'll get the bug report when it doesn't Just Work.  (For
> > arbitrary values of country and character set, of course.)
> 
> I don't think SJIS is used outside limited areas - mainly for e-mail
> and Windows file names.  Here's what Markus said when I brought up a
> similar point; I hope he doesn't mind my quoting a large chunk of one
> of his mails:

SJIS was just an example.  Pick any two incompatible character sets in
common use.

I'm a lot less sanguine about Unicode than Markus is.  I predict that
this

M> (until UTF-8 can be considered ubiquitous).

will not happen for at least ten years, and this dictum

M> system-wide plaintext files (such as /usr/include/* or /etc/*) should
M> remain pure ASCII for the foreseeable future

has already been violated by the header files of at least one library
in common use.  (Data points, anyone?)

And we can have problems even if both of the conflicting encodings are
strict supersets of ASCII.  Consider string literals.
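
A minimal sketch of the string-literal problem, assuming a header
written in KOI8-R and a program compiled as ISO-8859-1.  The helper
names are hypothetical, and the KOI8-R table is deliberately partial: it
covers only the two bytes used here.  The bytes 0xC4 0xC1 spell "да" in
KOI8-R but read as "ÄÁ" in ISO-8859-1, even though both encodings agree
on every ASCII byte.

```c
#include <stddef.h>

/* Hypothetical helper: ISO-8859-1 maps each byte to the code point
   with the same value. */
static unsigned latin1_to_ucs(unsigned char b)
{
    return b;
}

/* Hypothetical helper: partial KOI8-R decoder.  ASCII bytes map to
   themselves; only the two high bytes used in this example are
   filled in, everything else becomes U+FFFD. */
static unsigned koi8r_to_ucs(unsigned char b)
{
    if (b < 0x80)
        return b;
    switch (b) {
    case 0xC1: return 0x0430;   /* CYRILLIC SMALL LETTER A  */
    case 0xC4: return 0x0434;   /* CYRILLIC SMALL LETTER DE */
    default:   return 0xFFFD;   /* not covered by this sketch */
    }
}

/* Returns 1 if every byte of s decodes to the same code point under
   both encodings, 0 otherwise. */
static int decodes_identically(const char *s, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        unsigned char b = (unsigned char) s[i];
        if (latin1_to_ucs(b) != koi8r_to_ucs(b))
            return 0;
    }
    return 1;
}
```

A pure-ASCII literal such as "abc" passes decodes_identically(), while
"\xC4\xC1" does not: the compiler sees the same bytes either way, but
what they mean depends on which encoding it assumes for the file they
came from.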

> I'm just concerned that this has the potential to become really
> complex and ugly, and don't really want to go there :-)

Without doubt it is complex and ugly.  I think that we will have to go
there eventually, though, and I also think that if we do a half-assed
job we will be stuck with its consequences for years.  So we should
take care and do it right the first time.

I don't think there's any particular _hurry_ to implement this; it's
not as if we have users clamoring for native-language identifiers on
the mailing lists.

zw

