This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Query on UTF-32 encodings for letters


> Uppercase letters aren't accented in France

That is a (very commonly held) myth. Even many French people think this, but
it is wholly false. The true situation is that in classical typography,
upper case letters were always accented. Then typewriters came along and
it became customary to omit the accents. So widespread did this custom
become that many French schools taught that this was the preferred rule.
However, formally typeset material continued to use accents on upper
case letters, and omitting them was never official usage. In fact I had a friend
Pascal Cleve (there is a grave accent over the first e), whose father
was denied some government benefit on the grounds that his name was
spelled wrong in his passport (without the accent). He bounced back
and forth between government departments until finally the passport department
got the first typewriter in France that could put accents on upper case
letters.

I first learned about this from Alfred Strohmeier, working together on
the Ada 9X CRG (Character Rapporteur Group -- we had banned all
discussion of characters from the main language group, so I chaired
the CRG to which all such discussions were consigned).

When I told Jean Ichbiah about this, he was adamant that I was wrong.
Luckily we were at his home which has an extensive French library, so
I sent him off to look for a typeset example of a missing accent.
After excursions through many examples (e.g. Journal Ecole ... with
an acute accent on the E, of course), he could not find ONE example to
back his point of view, and we found dozens that confirmed mine.

> Uppercase letters aren't accented in France, but they are in Quebec.
> (That doesn't affect folding to lowercase, of course, but it does
> affect case-insensitive equality).

No, it jolly well does not :-)
Not for identifiers, at least.
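A minimal Python sketch of why accented capitals leave identifier equality intact, assuming (for illustration only) that two identifiers match when their upper-case forms, as produced by Python's built-in `str.upper()`, are equal. The helper name `same_identifier` is hypothetical. Because é upper-cases to É, the accent survives folding, so an accented name never collapses onto an unaccented one.

```python
def same_identifier(a: str, b: str) -> bool:
    """Hypothetical identifier rule: names match if their upper-case forms are equal."""
    return a.upper() == b.upper()


# U+00E9 (e with acute) upper-cases to U+00C9 (E with acute): the accent survives.
print("école".upper() == "ÉCOLE")          # True
print(same_identifier("école", "ÉCOLE"))   # True
# An unaccented spelling remains a different identifier.
print(same_identifier("école", "ECOLE"))   # False
```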

> An example that affects folding to lowercase: I folds to i-without-dot
> in Turkish.  Those aren't in Latin-1, but they are in the Latin
> section of 10646.

Yes, but for Ada we can define identifier matching purely in terms of
folding to upper case, which takes care of the dotless i, since it
folds to upper case I.
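A small sketch of the dotless i point, assuming identifier matching means comparing the upper-case forms produced by Python's locale-independent `str.upper()`; the helpers `fold_upper` and `fold_lower` are hypothetical names. Upper-case folding sends both ı (U+0131) and i to I, so all spellings agree, whereas lower-case folding sends I to dotted i and leaves ı alone, splitting them into two distinct keys.

```python
DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I


def fold_upper(name: str) -> str:
    """Hypothetical identifier key: fold to upper case."""
    return name.upper()


def fold_lower(name: str) -> str:
    """Alternative key, for contrast: fold to lower case."""
    return name.lower()


names = ["f" + DOTLESS_I + "le", "file", "FILE"]

# Upper-case folding: both the dotless and the dotted i map to I.
print({fold_upper(n) for n in names})       # {'FILE'}

# Lower-case folding: I becomes dotted i, but the dotless i stays as is.
print(len({fold_lower(n) for n in names}))  # 2
```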

I know about this latter case, and I dealt with the French accent case
above. Do you know of any other cases?

OK, fair enough; I was thinking more of the runtime case in my
comments.

At runtime, there may be many conventions, and indeed it is up to the
programmer to follow the rules appropriate to the particular application
domain.
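As an illustration of one such runtime convention, here is a sketch of Turkish-style upper-casing, where dotted i maps to dotted capital İ (U+0130) and dotless ı maps to plain I. The function `turkish_upper` is a hypothetical helper written for this example, not a standard-library API; Python's own `str.upper()` applies the default, locale-independent Unicode mapping.

```python
CAPITAL_I_WITH_DOT = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"     # LATIN SMALL LETTER DOTLESS I


def turkish_upper(text: str) -> str:
    """Hypothetical Turkish-tailored upper-casing.

    i -> dotted capital I and dotless i -> I; every other character
    falls through to the default Unicode mapping of str.upper().
    """
    tailored = text.replace("i", CAPITAL_I_WITH_DOT).replace(DOTLESS_SMALL_I, "I")
    return tailored.upper()


print("file".upper() == "FILE")                                 # True: default rule
print(turkish_upper("file") == "F" + CAPITAL_I_WITH_DOT + "LE")  # True: Turkish rule
print(turkish_upper("file") == "file".upper())                  # False: the conventions differ
```

The same input folds differently under the two conventions, which is why a program working with user text has to pick the rule that fits its domain.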

What makes Ada different is the requirement for absolutely defined
legality rules about what is allowed in identifiers and when they
compare equal.


