This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Ada.Characters.{Wide_}Latin_9 should be deleted

From: starner at okstate dot edu
To: gcc at gcc dot gnu dot org
Date: Sun, 19 May 2002 18:51:40 -0500 (CDT)
Subject: Re: Ada.Characters.{Wide_}Latin_9 should be deleted

> We are not in the business of telling people how to write code. 

I seem to remember you telling someone that GNAT would never have
goto labels, because they were the wrong solution, trading safety for
speed. We don't tell people how to code, but we are in the business of
giving them the best tools to do so.

> but it is quite inappropriate to
> remove useful features in an attempt to force programmers to program the way
> you think they should. 

Every feature is useful to someone. Surely we should remove poorly thought-
out features before they are released and become used.

> and the vast majority of Ada programs use Latin-1

No, they don't. The vast majority of Ada programs, when they use characters,
read some 8-bit characters in, possibly examine some ASCII characters, and 
emit those characters. These programs don't care which character set is 
being used; most of them don't even care if a multi-byte character set is 
being used, as long as it never uses byte values in the ASCII range as part
of a non-ASCII character.
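The condition on multi-byte character sets can be illustrated with a short sketch. Python is used here only for illustration, and `split_ascii` is a hypothetical helper. UTF-8 meets the condition, because every byte of a multi-byte sequence is 0x80 or above; Shift_JIS does not, because the second byte of a double-byte character can fall in the ASCII range.

```python
def split_ascii(data: bytes, sep: int) -> list[bytes]:
    """Split raw bytes on an ASCII separator without decoding them first."""
    assert sep < 0x80, "separator must be an ASCII byte"
    return data.split(bytes([sep]))

# UTF-8: all bytes of a multi-byte sequence are >= 0x80, so an ASCII comma
# byte in the data can only ever be a real comma.
utf8 = "naïve,Zażółć,€10".encode("utf-8")
print([field.decode("utf-8") for field in split_ascii(utf8, ord(","))])

# Shift_JIS: KATAKANA LETTER SO (U+30BD) encodes as 0x83 0x5C, and 0x5C is
# the ASCII backslash, so byte-level scanning for '\' gives false matches.
sjis = "ソ".encode("shift_jis")
print(sjis, bytes([0x5C]) in sjis)
```

A program that only inspects ASCII bytes can therefore pass UTF-8 through untouched, but not Shift_JIS.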

> But Latin-1 is now obsolete and for practical purposes replaced by Latin-9.

Latin-9 obsoletes Latin-1 no more than C++ obsoletes C. HTML assumes Latin-1 
as its base character set; CP-1252 is based on Latin-1, not Latin-9. These 
things will never change. In some ways, Latin-1 is _the_ 8-bit character set;
Latin-9 is just another 8-bit character set, no more important than any 
other 8-bit character set.

> What you seem to want to do is to use the leverage of the introduction of
> the Euro symbol to force people to move to 16 (or 32!) bit character sets,
> but that's definitely not helpful to most people writing programs in 
> Europe today. 

Latin-9 is definitely not helpful to at least a third of the people 
writing programs in Europe today, as their native languages aren't
included in Latin-9. EUR works just fine for the Euro symbol in most
cases; I see no reason to add something just for one character when
many more people need to use their native languages.

> Looking at wide_character, a basic assumption is that the first 256
> positions of Wide_Character correspond to Character. A lot of code
> depends on this, 

Then it's buggy and needs to be fixed, just like any code that assumes
that every system is big-endian. 
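The assumption about the first 256 positions can be made concrete with a short sketch (Python, for illustration only). It holds only when Character contains Latin-1 text. Latin-9 text stored in Character and widened position by position, for example with Wide_Character'Val (Character'Pos (C)), would turn the Euro sign at 0xA4 into the currency sign U+00A4, because Latin-9 reassigns eight byte values.

```python
# Decode every byte value 0x00..0xFF under Latin-1 and Latin-9 (ISO 8859-15).
latin1 = [bytes([b]).decode("latin-1") for b in range(256)]
latin9 = [bytes([b]).decode("iso8859_15") for b in range(256)]

# Latin-1 maps byte N to code point N, i.e. it matches the first 256
# Unicode (and Wide_Character) positions exactly.
identity = all(ord(ch) == b for b, ch in enumerate(latin1))

# Byte values where Latin-9 assigns a different character than Latin-1.
differing = [b for b in range(256) if latin1[b] != latin9[b]]

print("Latin-1 is the identity mapping:", identity)
for b in differing:
    print(f"0x{b:02X}: Latin-1 U+{ord(latin1[b]):04X} -> "
          f"Latin-9 U+{ord(latin9[b]):04X} ({latin9[b]})")
```

Code that relies on the identity mapping is thus correct for Latin-1 but silently wrong for Latin-9 at these eight positions.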

> The idea that all programs
> should be able to use the local character set is a wish you have as part of
> your multi-cultural mission, but it is unrealistic, 

It's no more unrealistic than writing code portable to other systems. It's
not feasible in some situations, but it's a goal that proper compiler and
library support can encourage and make much simpler.

> Ada is NOT designed to force people in this
> direction, and you have no business distorting Ada to do this.

Poland is now part of NATO; I think it's about time that people designing Ada 
for Western Europe über alles rethink their goals. It's a wide
world, and almost anything (Ada.Characters.KOI8_R, 
Ada.Characters.Latin_Greek, Ada.Characters.Latin_10) would go further in
making Ada part of it.

> So anyway, rather than just produce a patch, I think the first thing is to
> propose a design for discussion (we never implement new features without
> first discussing the design, especially if they are actually or in effect
> language extensions).

I'm all ears, provided you're actually willing to let someone outside ACT in
on the discussion.

The code I have was designed as part of an add-on library, so it doesn't change
GNAT at all. (Ngeadal was designed as a Unicode library for Ada; I gave up at 
some point upon realizing that a binding to ICU would be more useful. Having 
some basic stuff, like 32-bit characters in the GNAT library itself, would be
useful, though.) Wide_Wide_Character, or whatever it should be called, is an
opaque type that is internally an integer from 0 to 16#10FFFF#. For a start,
the interfaces would parallel those of Ada.Strings.*. Comments, or should I
come up with the full interfaces for discussion?
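A rough analogue of the opaque type described above, written in Python purely as an illustration and assuming only what the paragraph states: the value is an integer from 0 to 16#10FFFF#. The name `WideWideCharacter` and the `from_pos`/`pos` methods (modelled loosely on Ada's 'Val and 'Pos attributes) are hypothetical and are not Ngeadal's actual interface. Surrogate code points are not excluded, matching the stated range.

```python
from dataclasses import dataclass

# Upper bound of the Unicode code space (16#10FFFF# in Ada notation).
MAX_CODE_POINT = 0x10FFFF

@dataclass(frozen=True, order=True)
class WideWideCharacter:
    """Hypothetical opaque 32-bit character holding a code point 0 .. 16#10FFFF#."""
    _code: int

    def __post_init__(self) -> None:
        # Reject values outside the Unicode code space, as a range
        # constraint on the underlying integer would in Ada.
        if not 0 <= self._code <= MAX_CODE_POINT:
            raise ValueError(f"code point out of range: {self._code:#x}")

    @classmethod
    def from_pos(cls, pos: int) -> "WideWideCharacter":
        # Loose analogue of Wide_Wide_Character'Val (Pos).
        return cls(pos)

    def pos(self) -> int:
        # Loose analogue of Wide_Wide_Character'Pos (C).
        return self._code

euro = WideWideCharacter.from_pos(0x20AC)
print(hex(euro.pos()))
```

Keeping the integer private and exposing only conversion functions is what makes the type opaque; string operations paralleling Ada.Strings.* would then be built on sequences of these values.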
