This is the mail archive of the
mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
Paul Koning wrote:
Then take i, which upcases to I with dot. Turkish has i with and
without dot, and the dot is preserved when you change case (in either
And AFAICT, the dot can be quite important, because when spoken,
the difference between ı and i can mean quite different things,
much like the distinction between "year" and "your".
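
To make the dotted/dotless point concrete, here is a minimal Python sketch.
The helper names tr_upper and tr_lower and the two tables are hypothetical,
made up for illustration; they are not part of any library. The sketch
assumes Python 3, whose built-in str.upper() and str.lower() are locale
independent and therefore map both i and ı to plain I.

```python
# Hypothetical Turkish-aware case mapping (illustration only).
# Turkish pairs: i <-> İ (dotted), ı <-> I (dotless).
TR_UPPER = {"i": "\u0130", "\u0131": "I"}   # i -> İ, ı -> I
TR_LOWER = {"\u0130": "i", "I": "\u0131"}   # İ -> i, I -> ı


def tr_upper(text):
    """Upcase text, keeping the dot on i (assumed Turkish rules)."""
    return "".join(TR_UPPER.get(ch, ch.upper()) for ch in text)


def tr_lower(text):
    """Downcase text, keeping I dotless (assumed Turkish rules)."""
    return "".join(TR_LOWER.get(ch, ch.lower()) for ch in text)


if __name__ == "__main__":
    # The default mapping loses the distinction: both give "I".
    print("i".upper(), "\u0131".upper())
    # The Turkish-aware sketch keeps it.
    print(tr_upper("iki"), tr_upper("\u0131l\u0131k"))
```

A real program would take the rule from the locale rather than hard-code
it; the tables above only show why a single locale-free mapping cannot be
right for both English and Turkish text.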
Would you map eszet (in German) to ss? Or to sz? Or neither? Modern
usage does the former; 1930-ish usage the latter.
Not very often even in the 30s.
Some more things into the pit: Almost never was there
an s followed by a z representing a sharp s in German.
You can go back to the middle ages (1100 or so) and find some
interesting spellings. But then you could also argue that
we should consider matching p with b and d with th
(as in English). See da consonant? :-)
There have been some debates about ß, e.g. when
Switzerland discussed the issue in the 1960s. Technically,
it's not an eszet, and the Unicode database doesn't say
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
In the 1930s printers (at least in science) mostly used what is now
again the official spelling: two s for a sharp s (now: when the preceding
vowel is short). Swiss printers always use two s, which is
one of the reasons why you will hardly ever find ß in Wirth's
In books around 1900 you can see the origin of sharp s,
long s followed by small s:
Most typographers and experts from related
professions will explain that sharp s has its origin
in this combination: a (then) normal s, long shape, same as you
can find in older English texts, followed
by a "Schluss-S" (final s, "normal" shape, ending a word.
Exceptional details omitted.)
Connect the upper end of the long s to the upper
end of the small s and you get sharp s. It's a ligature. (I will
omit the story about how handwriting has created the notion
of an "eszet".) This explains why "Straße" matches "STRASSE".
"STRAßE" is kind of silly computerese. (Straße is German (de_DE)
for street, so I think it is a common name in computer programs.)
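
As a rough illustration of that matching, here is a minimal Python sketch.
It assumes Python 3, where str.casefold() applies the full Unicode case
folding (the F entry for 00DF quoted earlier in this message); the function
name same_ignoring_case is made up for this example.

```python
def same_ignoring_case(a, b):
    """Compare two strings caselessly using full Unicode case folding."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # Full case folding maps sharp s to "ss", so these two match.
    print(same_ignoring_case("Stra\u00dfe", "STRASSE"))
    # Plain lowercasing leaves sharp s alone, so this comparison fails.
    print("Stra\u00dfe".lower() == "STRASSE".lower())
    # Upcasing produces two capital S, not a capital sharp s.
    print("Stra\u00dfe".upper())
```

Note that comparing with lower() alone would not find the match, which is
one reason to prefer case folding over case conversion for comparisons.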
For a nice view, see