This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Query on UTF-32 encodings for letters

From: Robert Dewar <dewar at adacore dot com>
To: Paul Koning <pkoning at equallogic dot com>
Cc: joseph at codesourcery dot com, gcc at gcc dot gnu dot org
Date: Mon, 17 Jan 2005 14:09:44 -0500
Subject: Re: Query on UTF-32 encodings for letters
References: <41E3E28D.6050506@adacore.com> <Pine.LNX.4.61.0501161942070.29730@digraph.polyomino.org.uk> <41EACFCA.7070506@adacore.com> <16875.56569.286000.776285@gargle.gargle.HOWL> <41EC0798.5020303@adacore.com> <16876.2932.32855.8813@gargle.gargle.HOWL>

Paul Koning wrote:

> But that is nowhere near sufficient.  The issue is that case folding
> rules are different for different languages/locales that use the SAME
> character set.  For example, there are a whole bunch of different
> folding rules for Latin-1.


Well, in practice the folding rules for Latin-1 have been part of the
standard for ten years, so they are not about to change.

It would be interesting to know an example of what you state above.
Certainly people have been using Latin-1 to write Ada in countries
all over the world, and no one has ever found the folding rules
for identifiers to be in any way inconvenient.

There was a point in the discussion early on when JDI wanted upper
case E and lower case E-acute to match in identifiers.

The decision in Ada is that you do not want the meaning of a program
or its legality to change in a locale-dependent way. This is really
a fundamental starting point. Note that this is a radically different
issue from folding at run-time in a manner that makes sense to an
application program.
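
A minimal sketch of what a locale-independent identifier comparison looks
like, written in Python. The helper name same_identifier is hypothetical, and
the fold used is Python's Unicode default case folding, which is an assumption
for illustration and not the rule set the Ada standard specifies.

```python
def same_identifier(a: str, b: str) -> bool:
    """Hypothetical helper: compare identifiers using one fixed fold."""
    # str.casefold() uses Unicode default case folding and never
    # consults the process locale, so the answer is the same everywhere.
    return a.casefold() == b.casefold()


upper_e_acute = "\u00c9t\u00e9"   # E-acute, t, e-acute
lower_e_acute = "\u00e9t\u00e9"   # e-acute, t, e-acute
plain_e = "Ete"                   # no accents at all

print(same_identifier(upper_e_acute, lower_e_acute))  # True
print(same_identifier(plain_e, lower_e_acute))        # False
```

Under this particular fold, upper case E-acute matches lower case E-acute,
while unaccented E does not match E-acute. The result depends only on the
strings themselves, never on the environment the comparison runs in.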

> If 10646 defines a single set of rules, then it's part of the problem,
> not part of the solution.


Well, the 10646 definition provides a framework from which an acceptable
locale-independent set of folding rules can be obtained. Note that acceptable
here means acceptable to at least the ISO P-members. Indeed, when it comes
to such issues in the Ada standard, this is an area where the
non-English-speaking member countries take the lead.
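
One way to picture "obtaining" a fixed set of folding rules from a character
standard is to derive a single table once and then use it everywhere. The
sketch below does this for the Latin-1 range in Python. It assumes Python's
built-in Unicode case data as a stand-in for the 10646-derived rules, and the
function name latin1_upper_table is hypothetical. It is not the table any Ada
compiler actually uses.

```python
def latin1_upper_table() -> dict[str, str]:
    """Hypothetical helper: build one fixed lower-to-upper table for Latin-1."""
    table = {}
    for code_point in range(0x100):
        ch = chr(code_point)
        up = ch.upper()  # Unicode case data, independent of the locale
        # Keep only simple one-to-one mappings that stay inside Latin-1.
        if up != ch and len(up) == 1 and ord(up) < 0x100:
            table[ch] = up
    return table


table = latin1_upper_table()
print(table["\u00e9"] == "\u00c9")  # True: e-acute maps to E-acute
print("\u00df" in table)            # False: sharp s has no single upper case
print("\u00ff" in table)            # False: upper case of y-diaeresis is outside Latin-1
```

Because the table is computed from the character data alone, every
implementation that builds it from the same data gets the same answer,
whatever locale it runs in.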

Follow-Ups:
- Re: Query on UTF-32 encodings for letters
  - From: Robert Dewar
- Re: Query on UTF-32 encodings for letters
  - From: Paul Koning
- Re: Query on UTF-32 encodings for letters
  - From: Marcin Dalecki

References:
- Query on UTF-32 encodings for letters
  - From: Robert Dewar
- Re: Query on UTF-32 encodings for letters
  - From: Joseph S. Myers
- Re: Query on UTF-32 encodings for letters
  - From: Robert Dewar
- Re: Query on UTF-32 encodings for letters
  - From: Paul Koning
- Re: Query on UTF-32 encodings for letters
  - From: Robert Dewar
- Re: Query on UTF-32 encodings for letters
  - From: Paul Koning
