This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [OT] char should be signed by default


On Thu, Jan 25, 2007 at 10:29:29AM +0100, Paolo Bonzini wrote:
> 
> >>A given program is written in one or the other of these two dialects.
> >>The program stands a chance to work on most any machine if it is
> >>compiled with the proper dialect. It is unlikely to work at all if
> >>compiled with the wrong dialect.
> >
> >It depends on the program, and whether or not chars in the user's
> >character set is sign extended (ie, in the USA, you likely won't notice
> >a difference between the two if chars just hold character values).
> 
> You might notice if a -1 (EOF) becomes a 255 and you get an infinite 
> loop in return (it did bite me).  Of course, this is a bug in that 
> outside the US a 255 character might become an EOF.

That's a common bug with getchar() and similar functions, because people
put the result into a char before testing it, like:

	char c;
	while ((c=getchar())!=EOF) {
		...
	}

while the specification of getchar is that it returns either an unsigned
char converted to an int, or EOF (a negative int). This code is therefore
incorrect regardless of whether char is signed or not:
- infinite loop when char is unsigned, since c can then never compare
  equal to EOF
- incomplete processing of a file because of early detection of EOF 
  when char is signed and you hit a 0xFF char.
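
For comparison, here is a minimal sketch of the corrected loop, which keeps
the result in an int. The helper name count_bytes is made up for
illustration, and getc(in) is used instead of getchar() only so that the
function works on any stream (getchar() is equivalent to getc(stdin)):

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in a stream.
 * The result of getc() is kept in an int, so the byte 0xFF (returned
 * as 255) stays distinct from EOF (a negative value). */
static long count_bytes(FILE *in)
{
    long n = 0;
    int c;                      /* int, not char */

    while ((c = getc(in)) != EOF) {
        n++;                    /* c is in the range 0..UCHAR_MAX here */
    }
    return n;
}
```

Calling count_bytes(stdin) should then behave the same whether char is
signed or unsigned, and a 0xFF byte in the input no longer ends the loop
early.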

I've been bitten by both (although the second one is less frequent now
since 0xff is invalid in UTF-8).

BTW, I'm of the very strong opinion that char should have been unsigned
by default because the name itself implies that it is used as an
enumeration of symbols, specialized to represent text. When you step
from one enum value to the following one (staying within the range of
valid values), you don't expect the new value to become lower than the 
preceding one.
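
A small sketch of that ordering argument, using explicit signed char and
unsigned char so the result does not depend on the compiler's default.
The helper names are hypothetical. Note also that converting a value
above SCHAR_MAX to signed char is implementation-defined in ISO C (GCC
documents that it wraps), so the signed case assumes a typical two's
complement target with 8-bit chars:

```c
/* Hypothetical helpers: does the character code `lo` compare below
 * the character code `hi` once both are stored in the given type? */

static int ordered_as_unsigned(unsigned char lo, unsigned char hi)
{
    return lo < hi;
}

static int ordered_as_signed(unsigned char lo, unsigned char hi)
{
    /* Implementation-defined for values above SCHAR_MAX;
     * assumed here to wrap, as GCC does (0x80 becomes -128). */
    signed char a = (signed char)lo;
    signed char b = (signed char)hi;
    return a < b;
}
```

Under those assumptions, ordered_as_unsigned(0x7F, 0x80) is 1, while
ordered_as_signed(0x7F, 0x80) is 0: stepping to the next code makes the
value smaller.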

Things would be very different if it had been called "byte" or 
"short short int" instead.

	Gabriel

