This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



RE: [OT] char should be signed by default


> -----Original Message-----
> From: Gabriel Paubert [mailto:paubert@iram.es]
> Sent: Thursday, January 25, 2007 5:43 AM
> To: Paolo Bonzini
> Cc: Meissner, Michael; devils_advocate@austin.rr.com; gcc@gcc.gnu.org
> Subject: Re: [OT] char should be signed by default
> 
> On Thu, Jan 25, 2007 at 10:29:29AM +0100, Paolo Bonzini wrote:
> >
> > >>A given program is written in one or the other of these two dialects.
> > >>The program stands a chance to work on most any machine if it is
> > >>compiled with the proper dialect. It is unlikely to work at all if
> > >>compiled with the wrong dialect.
> > >
> > >It depends on the program, and whether or not chars in the user's
> > >character set are sign extended (i.e., in the USA, you likely won't
> > >notice a difference between the two if chars just hold character
> > >values).
> >
> > You might notice if a -1 (EOF) becomes a 255 and you get an infinite
> > loop in return (it did bite me).  Of course, this is a bug in that
> > outside the US a 255 character might become an EOF.
> 
> That's a common bug with getchar() and similar functions because people
> put the result into a char before testing it, like:
> 
> 	char c;
> 	while ((c=getchar())!=EOF) {
> 		...
> 	}
> 
> while the specification of getchar is that it returns an unsigned char
> cast to an int or EOF, and therefore this code is incorrect
> independently of whether char is signed or not:
> - infinite loop when char is unsigned
> - incomplete processing of a file because of early detection of EOF
>   when char is signed and you hit a 0xFF char.

Yep.  This was discussed in the ANSI X3J11 committee in the 80's, and it
is a problem (and the program is broken, because getchar does return the
one out-of-band value).  Another logical problem occurs on a system
where char and int are the same size: there is no out-of-band value that
can be returned, and in theory the only correct way is to use feof and
ferror, which few people do.
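
As a rough sketch of what that looks like (count_bytes is a made-up
helper name, not anything from the standard library), the result of getc
is kept in an int, and an EOF return is only trusted once feof or ferror
confirms it.  That second check is what would keep the loop correct even
on a machine where char and int are the same size:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts the bytes in a stream.
 * Returns the count, or -1 if a read error occurred. */
long count_bytes(FILE *fp)
{
    long n = 0;

    assert(fp != NULL);
    for (;;) {
        /* int, not char: it must hold every unsigned char value plus EOF. */
        int c = getc(fp);

        /* Where char and int have the same size, a data byte could compare
         * equal to EOF, so confirm with the stream's own indicators. */
        if (c == EOF && (feof(fp) || ferror(fp)))
            break;
        n++;
    }
    return ferror(fp) ? -1 : n;
}
```

On the usual machines the extra feof/ferror test in the loop costs
nothing in correctness terms and only runs when getc returns EOF; a file
containing a 0xFF byte is counted in full either way.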

> I've been bitten by both (although the second one is less frequent now
> since 0xff is invalid in UTF-8).
> 
> BTW, I'm of the very strong opinion that char should have been
> unsigned by default because the name itself implies that it is used
> as an enumeration of symbols, specialized to represent text. When you
> step from one enum value to the following one (staying within the
> range of valid values), you don't expect the new value to become lower
> than the preceding one.

And then there is EBCDIC, where there are 7 code points between 'I'
(0xC9) and 'J' (0xD1).  Plus the usual problem in the ASCII-derived
national character sets that the alphabetic national characters aren't
grouped with the A-Z, a-z characters.
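
For what it's worth, code can sidestep both ordering problems without
caring how char is signed or which character set is in use.  A minimal
sketch, with byte_less and count_letters as invented names for
illustration: compare chars through unsigned char, and ask <ctype.h>
whether something is a letter instead of testing a range like 'A'..'Z':

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: orders two chars by byte value, so that 0x80
 * sorts after 0x7F whether plain char is signed or unsigned. */
int byte_less(char a, char b)
{
    return (unsigned char)a < (unsigned char)b;
}

/* Hypothetical helper: counts alphabetic characters in s without
 * assuming the letters are contiguous (they are not in EBCDIC). */
size_t count_letters(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* The cast matters: passing a negative char to isalpha is
         * undefined behavior. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```

Whether isalpha accepts the national characters depends on the current
locale, so this is only as good as the locale setup of the program.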
 
> Things would be very different if it had been called "byte" or
> "short short int" instead.
> 
> 	Gabriel
> 


--
Michael Meissner
AMD, MS 83-29
90 Central Street
Boxborough, MA 01719



