This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
Paul Koning wrote:
"Robert" == Robert Dewar <dewar@adacore.com> writes:
Robert> Paul Koning wrote:
>> Then take i, which upcases to I with dot. Turkish has i with and
>> without dot, and the dot is preserved when you change case (in
>> either direction).
Robert> Yes, and that's fine: both lower case i with dot and lower
Robert> case i without dot fold to upper case capital I (without
Robert> dot), and so all three are equivalent in identifiers.
That's wrong for Turkish.
This does indeed show that case conversion is locale dependent.
But case equivalence in Ada identifiers cannot be locale dependent.
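
A minimal Python sketch of the point, assuming Python's str.upper() as a stand-in for a locale-independent mapping (it applies the Unicode default case mappings and ignores the locale). The turkish_upper helper is hypothetical and only illustrates what a Turkish-aware mapping would do; neither function is taken from GNAT or from the Ada standard.

```python
# Sketch: Unicode default (locale-independent) upcasing versus a
# Turkish-aware upcasing. Python's str.upper() applies the default
# Unicode mappings and does not consult the locale.

DOTLESS_LOWER = "\u0131"  # LATIN SMALL LETTER DOTLESS I
DOTTED_UPPER = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE


def default_upper(text: str) -> str:
    """Locale-independent upcasing, the only kind a fixed identifier rule can use."""
    return text.upper()


def turkish_upper(text: str) -> str:
    """Hypothetical Turkish-aware upcasing: dotted i keeps its dot."""
    # Map dotted i to capital I with dot first; the default mapping then
    # sends dotless i to plain capital I and leaves the dotted capital alone.
    return text.replace("i", DOTTED_UPPER).upper()


if __name__ == "__main__":
    for ch in ("i", DOTLESS_LOWER):
        print(repr(ch), "default:", default_upper(ch), "turkish:", turkish_upper(ch))
```

Under the default mapping, i, dotless i, and I all land on the same capital letter, so they are equivalent in identifiers; under the Turkish-aware mapping the dotted and dotless letters stay distinct.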
So Ada is wrong for Turkish, and there is no practical way to make
it right. Of course there can be a local character set available
for Turkish Ada programmers; GNAT already implements several
localized identifier character sets:

  1  ISO 8859-1 (Latin-1) identifiers
  2  ISO 8859-2 (Latin-2) letters allowed in identifiers
  3  ISO 8859-3 (Latin-3) letters allowed in identifiers
  4  ISO 8859-4 (Latin-4) letters allowed in identifiers
  5  ISO 8859-5 (Cyrillic) letters allowed in identifiers
  9  ISO 8859-15 (Latin-9) letters allowed in identifiers
  p  IBM PC letters (code page 437) allowed in identifiers
  8  IBM PC letters (code page 850) allowed in identifiers
  f  Full upper-half codes allowed in identifiers
  n  No upper-half codes allowed in identifiers
  w  Wide-character codes (that is, codes greater than 255)
     allowed in identifiers

But the point is that the standard rules cannot be locale
dependent, so some choice has to be made. Basically there
were two approaches:
1. Don't allow any case equivalence for wide characters used in
identifiers (this is the way the -gnatiw switch in GNAT Ada 95
mode works).
2. Allow best-possible case mapping, understanding that it will
not be quite right in some cases.
I would have chosen 1 (as I said, this is what I did choose :-)
The Ada Committee (in all its wisdom) has chosen approach 2.
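
The two approaches can be contrasted in a short Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the threshold of 255 follows the "codes greater than 255" wording above, and Python's str.upper() stands in for a "best-possible" mapping. It is not GNAT's implementation, nor the exact rule the Ada Committee adopted.

```python
# Sketch of two identifier-equivalence policies (names are hypothetical).

def _fold_narrow(ch: str) -> str:
    """Upcase a character only if it and its upper-case form fit in 0..255."""
    if ord(ch) > 255:
        return ch  # wide character: no case equivalence at all
    up = ch.upper()
    # Skip mappings that expand (e.g. sharp s) or leave the 0..255 range.
    return up if len(up) == 1 and ord(up) <= 255 else ch


def equivalent_narrow_only(a: str, b: str) -> bool:
    """Approach 1: wide characters must match exactly."""
    return [_fold_narrow(c) for c in a] == [_fold_narrow(c) for c in b]


def equivalent_best_effort(a: str, b: str) -> bool:
    """Approach 2: apply a locale-independent mapping to every character."""
    return a.upper() == b.upper()


if __name__ == "__main__":
    pairs = [
        ("count", "COUNT"),
        ("\u0394elta", "\u03b4elta"),   # Greek capital vs. small delta
        ("s\u0131n\u0131f", "sinif"),   # Turkish dotless i vs. dotted i
    ]
    for a, b in pairs:
        print(a, b, equivalent_narrow_only(a, b), equivalent_best_effort(a, b))
```

The last pair is the "not quite right" case: under approach 2 two identifiers that a Turkish reader would consider different are treated as the same, while approach 1 keeps them apart at the cost of giving wide characters no case equivalence.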