This is the mail archive of the mailing list for the GCC project.


Re: Query on UTF-32 encodings for letters

Paul Koning wrote:

Then take i, which upcases to I with dot. Turkish has i with and
without dot, and the dot is preserved when you change case (in either direction).

And AFAICT, the dot can be quite important: in speech, the difference between ı and i can change the meaning entirely, much like the distinction between "year" and "your".
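A minimal Python sketch makes the round-trip problem concrete. Python's `str.upper()` and `str.lower()` apply Unicode's default, locale-independent case mappings. Under those mappings "i" uppercases to a dotless "I", so the dot distinction is lost. The helpers `turkish_upper` and `turkish_lower` are hypothetical and exist only for this illustration. Real code would more likely rely on a locale-aware library such as ICU.

```python
# Default Unicode case mapping (Python str methods) vs. Turkish rules.
# turkish_upper/turkish_lower are hypothetical helpers for illustration only.

DOTTED_CAPITAL_I = "\u0130"  # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # ı  LATIN SMALL LETTER DOTLESS I


def turkish_upper(s: str) -> str:
    """Uppercase using Turkish rules: i -> İ and ı -> I."""
    # Handle the two Turkish i's explicitly, then apply the default mapping.
    return s.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()


def turkish_lower(s: str) -> str:
    """Lowercase using Turkish rules: I -> ı and İ -> i."""
    # Handle the two Turkish I's explicitly, then apply the default mapping.
    return s.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i").lower()


if __name__ == "__main__":
    # Default mapping: both i and ı uppercase to plain I ...
    print("i".upper(), DOTLESS_SMALL_I.upper())   # I I
    # ... so a round trip turns dotless ı into dotted i.
    print(DOTLESS_SMALL_I.upper().lower())        # i
    # Default lowercasing of İ yields i + COMBINING DOT ABOVE (two code points).
    print(len(DOTTED_CAPITAL_I.lower()))          # 2
    # The Turkish-aware helpers keep the dot distinction in both directions.
    print(turkish_upper("istanbul"))              # İSTANBUL
    print(turkish_lower(turkish_upper("ılık")))   # ılık
```

The helpers rewrite only the four Turkish i letters and leave everything else to the default mapping. That is the essence of the Turkish tailoring described in Unicode's SpecialCasing data.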

Would you map eszet (in German) to ss?  Or to sz?  Or neither?  Modern
usage does the former; 1930-ish usage the latter.

Not very often even in the 30s.

A few more things to throw into the pit: there was almost never
an s followed by a z representing a sharp s in German.
You can go back to the Middle Ages (1100 or so) and find some
interesting spellings. But then you could also argue that
we should consider matching p with b and d with th
(as in English). See da consonant? :-)

There have been some debates about ß, e.g. when Switzerland discussed the issue in the 1960s. Technically,
it's not an eszet, and the Unicode database doesn't say


In the 1930s, printers (at least of scientific texts) mostly used what is
again the official spelling today: two s for a sharp s (today: when the
preceding vowel is short). Swiss printers always use two s, which is
one of the reasons why you will hardly ever find ß in Wirth's

In books from around 1900 you can see the origin of the sharp s:
a long s followed by a small s.
Most typographers and experts from related
professions will explain that the sharp s originated
in this combination: a (then) normal s in its long shape, the same as you
can find in older English texts, followed
by a "Schluss-S" (final s, the "normal" shape that ends a word;
exceptional details omitted).

Connect the upper end of the long s to the upper
end of the small s and you get the sharp s. It's a ligature. (I will
omit the story of how handwriting created the notion
of an "eszet".) This explains why "Straße" matches "STRASSE".
"STRAßE" is kind of silly computerese. (Straße is German (de_DE)
for street, so I think it is a common name in computer programs.)
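Unicode's full case folding reflects the ligature view directly. In Python, `str.casefold()` maps ß to "ss" and the long s (ſ, U+017F) to "s". As a result, "Straße", "STRASSE", and "STRAßE" all compare equal after folding, while plain `lower()` does not equate them. The following is a minimal sketch in which `caseless_equal` is a hypothetical helper name. It omits the normalization step that Unicode's canonical caseless matching would also apply, and it does not apply the Turkish-specific folding rules.

```python
# Caseless comparison via Unicode full case folding (str.casefold).
# caseless_equal is a hypothetical helper for illustration only.

LONG_S = "\u017f"  # ſ  LATIN SMALL LETTER LONG S


def caseless_equal(a: str, b: str) -> bool:
    """Return True if a and b are equal after full case folding."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    print("Straße".upper())                        # STRASSE
    print(caseless_equal("Straße", "STRASSE"))     # True
    print(caseless_equal("STRAßE", "strasse"))     # True
    # Long s followed by s folds to the same "ss" as the sharp s ligature.
    print(caseless_equal(LONG_S + "s", "ß"))       # True
    # lower() keeps ß, so it does not equate Straße with STRASSE.
    print("Straße".lower() == "STRASSE".lower())   # False
```

Folding answers only "do these match?" It does not settle whether ss or ß should be printed.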

For a nice view, see

