This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Query on UTF-32 encodings for letters



On 2005-01-18, at 07:20, Robert Dewar wrote:


Marcin Dalecki wrote:

Look, the problem isn't that somebody wishes to support international
encodings for symbols in code. That may even be helpful to some.
However, I still stand by the opinion that declaring them case insensitive
is some kind of weird idiocy that can only be the result of some
politburo. What are the good reasons to make them so in the first place?

OK, but the fight over whether Ada should have case-insensitive identifier
names was discussed and decided 20 years ago, with almost no controversy.
Pretty much everyone agreed this was the way to go. There are many good
reasons, which are not worth rehashing here, since this thread is not
about replaying that old chestnut! And if you start off thinking it is
weird idiocy, and have not bothered to read up on the issue, I think
it is unlikely to be fruitful to discuss it in any case!

Reading up on the issue or not is not the point here. Simply because I
regularly use up to four different languages with different character set
encodings (ASCII, Latin-1, Latin-2, KOI8-R, KOI8-U), I have already suffered
enough from such "good" "i18n" ideas.
Bah... Even in the simple case where I had to read some code commented in German,
I nearly always had, for some reason, to adjust to yet another weird
pseudo-standard encoding that the system I had to use didn't support.


Based on this experience, I just think that you are indeed wasting your time
extending such a facility. There is simply no such thing as a well-defined
equivalence relation R such that x R y holds if and only if the two strings
x and y are equal when case is disregarded.
Such a relation depends on too many external variables. There isn't
even a standard way to tell which encoding a file is written in.
With Cyrillic, for example, it nearly always turns out to be
a guessing game... ALT, KOI8-R, ISO-8859-5, KOI8-U, CP1259?
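To make the two "external variables" above concrete, here is a minimal Python 3 sketch. It is not from the thread and only illustrates the argument. It shows that the same bytes read as different Cyrillic text depending on the assumed encoding, and that three plausible definitions of "equal ignoring case" already disagree on one German word. Treating ALT as DOS code page 866 (`cp866`) is an assumption, and the helpers `eq_lower`, `eq_upper` and `eq_fold` are hypothetical names chosen for this example.

```python
# Illustrative sketch only (not from the original thread).
# Python 3 str methods apply Unicode's default, locale-independent case mappings.

# 1. Encoding: one byte sequence, read back under different Cyrillic encodings.
#    "ALT" is approximated here by DOS code page 866 (cp866), an assumption.
raw = "привет".encode("koi8_r")  # bytes as a KOI8-R editor would save them
decodings = {enc: raw.decode(enc)
             for enc in ("koi8_r", "iso8859_5", "cp1251", "cp866")}
for enc, text in decodings.items():
    print(f"{enc:10} -> {text}")
# The lowercase KOI8-R letters of this word come out as uppercase letters
# when read as CP1251, so even "is this lowercase?" depends on the guess.

# 2. Case mapping: three plausible definitions of "equal ignoring case".
#    These helper names are hypothetical.
def eq_lower(a: str, b: str) -> bool:
    return a.lower() == b.lower()

def eq_upper(a: str, b: str) -> bool:
    return a.upper() == b.upper()

def eq_fold(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()

pair = ("STRASSE", "straße")
for rel in (eq_lower, eq_upper, eq_fold):
    print(rel.__name__, rel(*pair))
# eq_lower disagrees with the other two, because "ß".lower() is still "ß".
```

None of these mappings applies language-specific rules such as the Turkish dotted and dotless i, which is one more variable an identifier-equivalence rule has to pin down.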


