This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Ada.Characters.{Wide_}Latin_9 should be deleted

From: dewar at gnat dot com (Robert Dewar)
To: gcc at gcc dot gnu dot org, starner at okstate dot edu
Date: Sun, 19 May 2002 20:08:02 -0400 (EDT)
Subject: Re: Ada.Characters.{Wide_}Latin_9 should be deleted

> I seem to remember you telling someone that GNAT would never have
> goto labels, because they were the wrong solution - speed over
> safety. We don't tell people how to code, but we are in the
> business of giving them the best tools to do so.

I never like to say "Huh?", but it's tempting in this case :-). No, you never
heard me say something so ludicrous, and indeed I am not goto-allergic.
I am happy to use a goto when it makes sense (e.g. to simulate continue
in Ada, or to implement an FSM), and you will find labels and gotos in
the GNAT code (not many, but some). Certainly the above wrong-headed attitude
is not one I would ever have tolerated, let alone espoused!

> Every feature is useful to someone. Surely we should remove poorly
> thought-out features before they are released and become used.

This is a feature that Ada programmers want and need. European programmers
most certainly want to stay with 8-bit characters, and it is just whistling
in the wind to demand that they change. I don't share your viewpoint at
all, but that's irrelevant. The point is that Ada encourages the use of
8-bit characters, and you have no business trying to change this.

>> and the vast majority of Ada programs use Latin-1

> No, they don't. The vast majority of Ada programs, when they use characters,
> read some 8-bit characters in, possibly examine some ASCII characters, and
> emit those characters. These programs don't care which character set is
> being used; most of them don't even care if a multi-byte character set is
> being used, as long as it doesn't put stuff in the ASCII range.

That's just wrong; there are lots of Ada programs that have identifiers
written in Latin-1 using upper half characters, and I can most certainly
see that these days someone who wants the Euro symbol would prefer to use
the definition in Latin-9:

  Ada.Characters.Latin_9.Euro_Sign

rather than using its absolute code, or writing some character literal
that may not look right on all equipment. Remember that providing such
definitions is the main function of this package.

> Latin-9 obsoletes Latin-1 no more than C++ obsoletes C. HTML assumes Latin-1
> as the base characters; CP-1252 is based on Latin-1, not Latin-9. These
> things will never change. In some ways, Latin-1 is _the_ 8-bit character set;
> Latin-9 is just another 8-bit character set, no more important than any
> other 8-bit character set.

The sense in which I mean this is that in practice throughout Europe the
graphic representation of type Character will correspond to Latin-9, not
Latin-1, so the names of characters in the package may as well reflect
the reality.

>> Looking at wide_character, a basic assumption is that the first 256
>> positions of Wide_Character correspond to Character. A lot of code
>> depends on this,
>
> Then it's buggy and needs to be fixed, just like any code that assumes
> that every system is big-endian.

Well, the Ada standard has this to say:

47  To_Wide_Character
                Returns the Wide_Character X such that Character'Pos(Item) =
                Wide_Character'Pos(X).

which pretty strongly implies the connection, and indeed programs
rely on this. There is no point in telling people this is wrong; I certainly
refuse to join you in that endeavor. You are just one person pushing one
point of view, and there is no need for GNAT to enforce either your point
of view or mine. It would be inappropriate.

> It's no more unrealistic than writing code portable to other systems. It's
> not feasible in some situations, but it's a goal that proper compiler and
> library support can encourage and make much simpler.

It is one thing to provide tools that encourage this, quite another to
deliberately cripple the language in an attempt to force people to do this.

> I'm all ears, provided you're actually willing to let someone outside ACT in
> on the discussion.

You have completely the wrong idea. I don't think that this list is the
right venue, or that ACT should have any particular inside track on this
discussion. You should submit your proposal to ada-comment; I assume you
know the procedure.

> The code I have was designed as part of an add-on library, so it doesn't change
> GNAT at all. (Ngeadal was designed as a Unicode library for Ada; I gave up at
> some point upon realizing that a binding to ICU would be more useful. Having
> some basic stuff, like 32-bit characters in the GNAT library itself, would be
> useful, though.) Wide_Wide_Character, or whatever it should be called, is an
> opaque type that is internally an integer from 0 to 16#10FFFF#. For a start,
> the interfaces would parallel those of Ada.Strings.*. Comments, or should I
> come up with the full interfaces for discussion?

Well, such an add-on library is a perfectly reasonable and useful addition.
I am always in favor of giving the programmer additional tools, but I hate
it when the additional tool providers demand that I get rid of some of my
existing tools first :-)

I am not sure what you mean by having the basic stuff in the GNAT library.
As I mentioned before, I would be dubious about adding the type 
Wide_Wide_Character to Standard, but your new package should presumably
be in the GNAT library.

Actually, we were thinking of introducing GNAT.Strings to hold a common
definition of string access types, etc., so that might be a good place to put it.
Of course, if the ARG felt this was an interesting enough addition to bless,
it would end up in the Ada hierarchy. You can't add things to the
Ada hierarchy without that blessing :-)
