This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Query on UTF-32 encodings for letters

From: Robert Dewar <dewar at adacore dot com>
To: Marc Espie <espie at quatramaran dot ens dot fr>
Cc: gcc at gcc dot gnu dot org
Date: Sun, 30 Jan 2005 00:27:49 -0500
Subject: Re: Query on UTF-32 encodings for letters
References: <41E3E28D.6050506@adacore.com> <Pine.LNX.4.61.0501161942070.29730@digraph.polyomino.org.uk> <41EACFCA.7070506@adacore.com> <16875.56569.286000.776285@gargle.gargle.HOWL> <41EC0798.5020303@adacore.com> <16876.2932.32855.8813@gargle.gargle.HOWL> <41EC0D78.50201@adacore.com> <16876.4226.859818.910262@gargle.gargle.HOWL> <20050127211742.8FD8AD114@quatramaran.ens.fr>

Marc Espie wrote:

> To corroborate your point, there is one fairly known set of printed
> typographic rules, _les regles de l'imprimerie nationale_ (accents
> omitted) that does mention this, among other things. Omitting accents
> over uppercase letters is in fact a spelling mistake. One fairly common
> thanks to cheap typography and uneducated people.


Actually it is more likely to be educated people who have this
misconception since typically French schools used to teach that
this was proper style (omitting accents on upper case letters).
I do not know if this is still the case.

One interesting point (to bring it a bit back to topic) is
that if French programmers expect case folding, they tend
to expect e-acute to be folded to capital E without an acute
accent. We had a furious argument about this during the Ada 95
design. Jean wanted to apply the crossword criterion. That says
that if in a crossword puzzle you can cross two letters, then
they should be regarded as equivalent in identifiers. So he
argued that e-acute and e should be considered equivalent.
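
As a rough illustration of the crossword criterion (not the Ada 95
proposal itself, and not GNAT code), here is a minimal Python sketch.
The function name crossword_fold is made up for this example, and it
assumes that "can be crossed in a puzzle" means "same base letter once
accents are dropped and case is ignored":

```python
import unicodedata


def crossword_fold(identifier: str) -> str:
    """Fold an identifier to upper case with accents removed (hypothetical helper)."""
    # NFD splits a precomposed letter such as e-acute into the base
    # letter 'e' followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", identifier)
    # Keep only the base letters by dropping the combining marks.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.upper()


# Under this criterion e-acute and plain e name the same identifier.
print(crossword_fold("été") == crossword_fold("ETE"))
```

Under this rule two identifiers that differ only in their accents would
collide, which is exactly what the argument was about.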

The main (and effective) argument against this was that case
folding of this kind is locale dependent, so we could only do
approximate case folding anyway. I have come to think that it is
a mistake for the Ada standard to mandate locale independent
case equivalence for wide characters in identifiers, but it
looks like it's too late to change people's minds on this.
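
To make the locale point concrete, here is a small Python sketch. The
Turkish dotted and dotless i is not something raised in this thread; it
is just a commonly cited case where the right upper case form depends on
the language. Both function names are invented for the example:

```python
def upper_default(text: str) -> str:
    """Locale-independent mapping: Python's str.upper() ignores the locale."""
    return text.upper()


def upper_turkish(text: str) -> str:
    """Hypothetical Turkish-aware mapping, shown only for contrast."""
    # In Turkish, dotted i pairs with dotted capital I (U+0130) and
    # dotless i (U+0131) pairs with the plain capital I.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


# The same identifier folds differently under the two rules.
print(upper_default("title") == upper_turkish("title"))
```

A compiler that applies one fixed table to every source file can match
at most one of these conventions.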

Oh well, not too bad, it's not that hard to implement (you
will see the updated widechar unit checked in very soon that
supports the case equivalence stuff), and in practice I
think Ada programmers will follow the excellent style rule
of spelling a given identifier consistently anyway.

To me, case equivalence for identifiers in a language is
not about being able to spell a given identifier as Ada
in one place and ADA in another place, but rather it is
about preventing a program from having distinct identifiers
Ada and ADA, which makes programs hard to talk about.
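
A sketch of that view of case equivalence, again in Python rather than
Ada and with an invented function name: instead of accepting any
spelling, a checker can report identifiers that differ only in case.
Using str.casefold() as the equivalence is an assumption of the example,
not what the Ada standard specifies:

```python
from collections import defaultdict


def case_clashes(identifiers):
    """Return groups of distinct spellings that are equal up to case."""
    groups = defaultdict(set)
    for name in identifiers:
        # casefold() is Python's locale-independent caseless key.
        groups[name.casefold()].add(name)
    # Only keys reached by more than one spelling are clashes.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


print(case_clashes(["Ada", "ADA", "Count"]))
```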

> It's likely we are waging a lost battle though, since quite a few people
> have now seen more documents without accents over uppercase letters than
> correct documents, and like lemmings, they will go on typesetting stuff
> the wrong way---usually with Word and Microsoft's comic sans typeface
> for greater effect...


Well, stuff that is "typeset" using Word is hardly serious. My
experience is that formally published material still always
correctly uses the accents.

Of course you may think that formally published material will
disappear and be replaced by junk on the net. But if that happens
we will lose more elements of style than just the upper case
accents in French :-)

Follow-Ups:
- Re: Query on UTF-32 encodings for letters
  - From: Gabriel Dos Reis

References:
- Query on UTF-32 encodings for letters
  - From: Robert Dewar
- Re: Query on UTF-32 encodings for letters
  - From: Joseph S. Myers
- Re: Query on UTF-32 encodings for letters
  - From: Robert Dewar
- Re: Query on UTF-32 encodings for letters
  - From: Paul Koning
- Re: Query on UTF-32 encodings for letters
  - From: Robert Dewar
- Re: Query on UTF-32 encodings for letters
  - From: Paul Koning
- Re: Query on UTF-32 encodings for letters
  - From: Robert Dewar
- Re: Query on UTF-32 encodings for letters
  - From: Paul Koning
- Re: Query on UTF-32 encodings for letters
  - From: Marc Espie
