This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Ada.Characters.{Wide_}Latin_9 should be deleted

From: David Starner <starner at okstate dot edu>
To: gcc at gcc dot gnu dot org
Date: Tue, 30 Apr 2002 03:22:54 -0500
Subject: Ada.Characters.{Wide_}Latin_9 should be deleted

Ada.Characters.Latin_9 and Ada.Characters.Wide_Latin_9 are new packages
added to GCC 3.2. In my opinion, both should be removed before release,
with proper internationalization added in their place if necessary. The
reason is that hard coding a character set into the code is a bad idea,
and that it's unreasonable and unhelpful to add one arbitrary new 8-bit
character set to the Ada library. (Ada.Characters.Wide_Latin_9 has
additional problems, as it creates a pseudo-Unicode.)

Standard Ada specifies that type Character holds Latin 1, though
implementations may, of course, offer other character sets. Since most
of the systems that GNAT runs on (Unix, Linux, Windows, Mac OS X) let
the user change which character set is the native one, hard coding the
meaning of Character at compile time is a bad idea. Any current program
that handles characters properly on those systems already treats the
upper half of Character (positions 128 through 255) opaquely.

Given this, what does Ada.Characters.Latin_9 give us? It gives us another
character set we can hard code; one chosen, I can only guess, so that
Western European programmers can continue to ignore proper character set
handling. There's no technical reason to special-case Latin 9 here; it's
no different from the 15 other 8-bit locale character sets on GNU/Linux,
or from the 14 other parts of ISO 8859. There is no reason good code
can't handle all of those character sets without problems, but it won't
get there through Ada.Characters.Latin_9.
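
As a rough illustration of handling 8-bit data without hard coding one
character set, here is a small sketch in Python rather than Ada. The
function name decode_8bit is hypothetical, and using the locale's
preferred encoding as the default is an assumption about what "the
native character set" means on a given system. It shows the same byte
meaning different characters under different ISO 8859 parts, with the
choice of charset made at run time:

```python
import locale
import unicodedata
from typing import Optional


def decode_8bit(data: bytes, charset: Optional[str] = None) -> str:
    """Decode 8-bit text with a charset chosen at run time.

    Hypothetical helper: if no charset is given, fall back to the
    user's locale instead of a value fixed when the program was built.
    """
    if charset is None:
        charset = locale.getpreferredencoding(False)
    return data.decode(charset)


# One byte, three different characters depending on the ISO 8859 part.
sample = b"\xbd"
for cs in ("latin-1", "iso8859_5", "iso8859_15"):
    ch = decode_8bit(sample, cs)
    print(cs, hex(ord(ch)), unicodedata.name(ch))
```

Nothing in the decoding logic is specific to Latin 9; it is just one
more value for the charset parameter.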

Ada.Characters.Wide_Latin_9 is one step worse. It basically creates a
16-bit character set in which characters 0 through 255 correspond to
the Latin 9 values. What the value of this is, I don't know;
Wide_Character can already hold every character of Latin 9 at its
proper Unicode code point. Given that this is put into a character type
defined as holding Unicode, with input and output done as if it were
Unicode, someone will inevitably output it as Unicode, and we will have
Latin 9 formatted and labeled as Unicode running around the world.
Unicode's goal of world conquest has been bedeviled by these types of
problems; let's not add another one. Even if Ada.Characters.Latin_9 is
kept, Wide_Latin_9 should be removed.
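
To make the mislabeling concrete, here is a Python analogue (not GNAT's
actual package; both function names are hypothetical). naive_widen
stores each Latin 9 byte value directly as a code point, which is what
a 16-bit set with Latin 9 in positions 0 through 255 amounts to, while
proper_widen maps each byte to its real Unicode character. Any consumer
that trusts the "Unicode" label reads the naive result as Latin 1:

```python
import unicodedata


def naive_widen(data: bytes) -> str:
    # Hypothetical: use each Latin 9 byte value as a Unicode code point.
    # This silently turns Latin 9 data into mislabeled Latin 1.
    return "".join(chr(b) for b in data)


def proper_widen(data: bytes) -> str:
    # Hypothetical: map each Latin 9 byte to its real Unicode character.
    return data.decode("iso8859_15")


# Byte values where the two interpretations disagree.
differing = [b for b in range(256)
             if naive_widen(bytes([b])) != proper_widen(bytes([b]))]
print([hex(b) for b in differing])
# prints ['0xa4', '0xa6', '0xa8', '0xb4', '0xb8', '0xbc', '0xbd', '0xbe']

text = b"Price: 10 \xa4"  # "10 euros" written in Latin 9
print(unicodedata.name(naive_widen(text)[-1]))   # CURRENCY SIGN
print(unicodedata.name(proper_widen(text)[-1]))  # EURO SIGN
```

Only eight positions differ, which is exactly why such data can travel
a long way looking correct before anyone notices the euro signs are
wrong.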

Ack. That was a little more verbose than I intended. In summary:
handling a huge variety of character data for users worldwide - good.
Adding a patch to handle one new character set for a small set of
users - bad.

-- 
David Starner - starner@okstate.edu
"It's not a habit; it's cool; I feel alive. 
If you don't have it you're on the other side." 
- K's Choice (probably referring to the Internet)