This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Query on UTF-32 encodings for letters


Paul Koning wrote:
"Robert" == Robert Dewar <dewar@adacore.com> writes:


 Robert> Paul Koning wrote:
 >> Then take i, which upcases to I with dot.  Turkish has i with and
 >> without dot, and the dot is preserved when you change case (in
 >> either direction).

 Robert> Yes, and that's fine, both lower case i with dot and lower
 Robert> case i without dot fold upper case to capital I (without
 Robert> dot), and so all three are equivalent in identifiers.

That's wrong for Turkish.

This does indeed show that case conversion is locale dependent. But
case equivalence in Ada identifiers cannot be locale dependent. So
Ada is wrong for Turkish, and there is no practical way to make it
right. Of course there can be a local character set available for
Turkish Ada programmers (GNAT already implements several localized
identifier character sets:

  1   ISO 8859-1 (Latin-1) identifiers
  2   ISO 8859-2 (Latin-2) letters allowed in identifiers
  3   ISO 8859-3 (Latin-3) letters allowed in identifiers
  4   ISO 8859-4 (Latin-4) letters allowed in identifiers
  5   ISO 8859-5 (Cyrillic) letters allowed in identifiers
  9   ISO 8859-15 (Latin-9) letters allowed in identifiers
  p   IBM PC letters (code page 437) allowed in identifiers
  8   IBM PC letters (code page 850) allowed in identifiers
  f   Full upper-half codes allowed in identifiers
  n   No upper-half codes allowed in identifiers
  w   Wide-character codes (that is, codes greater than 255)
      allowed in identifiers)
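
To make the Turkish problem concrete, here is a small Python sketch
(Python only because it is easy to run; it is not how GNAT does it).
The names upper_default and upper_turkish and the TURKISH_UPPER table
are hypothetical, written by hand for this illustration. Python's
built-in str.upper() stands in for a locale-independent mapping.

```python
# Hand-written table of the two Turkish exceptions (illustrative only).
TURKISH_UPPER = {
    "i": "\u0130",   # i (with dot)  -> capital I with dot above
    "\u0131": "I",   # dotless i     -> capital I (no dot)
}

def upper_default(s):
    """Locale-independent uppercasing via Python's built-in mapping."""
    return s.upper()

def upper_turkish(s):
    """Uppercasing that applies the Turkish exceptions before the default."""
    return "".join(TURKISH_UPPER.get(ch, ch.upper()) for ch in s)

dotted, dotless = "i", "\u0131"

# Default rules: both lower case letters land on the same capital I.
print(upper_default(dotted) == upper_default(dotless))

# Turkish rules: the dot is preserved, so the two results differ.
print(upper_turkish(dotted) == upper_turkish(dotless))
```

Under the default mapping the two letters collapse to one upper case
form; under the Turkish table they stay distinct, which is the locale
dependence being discussed.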

But the point is that the standard rules cannot be locale
dependent, so some choice has to be made. Basically there
were two approaches:

1. Don't allow any case equivalence for wide characters used in
identifiers (this is the way the -gnatiw switch in GNAT Ada 95
mode works).

2. Allow best-possible case mapping, understanding that it will
be not quite right in some cases.

I would have chosen 1 (as I said, this is what I did choose :-)
The Ada Committee (in all its wisdom) has chosen approach 2.
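
The difference between the two approaches can be sketched as two
ways of computing a comparison key for an identifier. This is only
an illustration in Python under stated assumptions, not the GNAT or
Ada Reference Manual algorithm: the function names are hypothetical,
approach 1 is simplified to fold ASCII letters only (the handling of
codes 128 to 255 is not modeled), and Python's str.upper() stands in
for "best-possible case mapping".

```python
def key_no_wide_folding(name):
    """Approach 1 (simplified): fold ASCII letters, compare the rest exactly."""
    return "".join(ch.upper() if ord(ch) < 128 else ch for ch in name)

def key_best_effort(name):
    """Approach 2 (stand-in): one locale-independent uppercase mapping."""
    return name.upper()

def same_identifier(a, b, key):
    """Two spellings name the same identifier if their keys are equal."""
    return key(a) == key(b)

# Cyrillic lower case versus upper case spelling of the same word.
lower_cyr, upper_cyr = "\u0434\u043e\u043c", "\u0414\u041e\u041c"

# Turkish: "sira" written with dotless i versus with dotted i.
dotless_word, dotted_word = "s\u0131ra", "sira"

for a, b in [(lower_cyr, upper_cyr), (dotless_word, dotted_word)]:
    print(same_identifier(a, b, key_no_wide_folding),
          same_identifier(a, b, key_best_effort))
```

With approach 1 the wide characters never match across case, so both
pairs stay distinct. With approach 2 the Cyrillic pair matches as one
would hope, but the two Turkish spellings are also merged, which is
the "not quite right" case.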

