cpplib: Multichar charconst changes

Neil Booth neil@daikokuya.demon.co.uk
Sun May 5 17:50:00 GMT 2002


Zack Weinberg wrote:-

> I am mildly concerned about breaking things with this, but I do like
> the consistency it brings.  If you document the change, fine by me.

Thanks for looking at it, I've committed it.

The only "sensible" usage that gives different answers is with
signed characters where the 2nd or later character in the multichar
charconst is negative.  IMO the existing way of doing it is daft
and no-one can have been seriously relying on it.  See the docs
below for a concrete example.  [And I'm ignoring the confusion in
existing compilers between host and target precision.]
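
To make the difference concrete, here is a rough sketch of the two valuations. It is not cpplib code: the helper names multichar_new and multichar_old are made up for illustration, it assumes an 8-bit signed target char and an ASCII execution charset, and the "old" function only follows the guess in the docs below ('a' * 256 + (unsigned char) '\234'), so treat it as an approximation of what 3.1 did. Only one- and two-character constants are handled; truncation to the target int is not modelled.

```c
#include <assert.h>

/* Hypothetical helpers, not cpplib code.  Each element of chars[] holds
   one character's value as a signed 8-bit target char would store it.  */

/* New rule: scale the previous result by 256 (a left shift by 8 bits)
   and add the sign-extended value of the next character.  */
static long multichar_new (const signed char *chars, int n)
{
  long result = 0;
  int i;

  assert (n >= 1 && n <= 2);	/* Truncation to int is not modelled.  */
  for (i = 0; i < n; i++)
    result = result * 256 + chars[i];
  return result;
}

/* Guess at the GCC 3.1 behaviour, following the docs: the first
   character is taken as-is, later ones are added zero-extended.  */
static long multichar_old (const signed char *chars, int n)
{
  long result;
  int i;

  assert (n >= 1 && n <= 2);
  result = chars[0];
  for (i = 1; i < n; i++)
    result = result * 256 + (unsigned char) chars[i];
  return result;
}
```

With the values { 97, -100 }, i.e. 'a' followed by '\234' on such a target, multichar_new gives 97 * 256 - 100 = 24732 and multichar_old gives 97 * 256 + 156 = 24988.  For { 97, 98 } ('ab') both give 24930, which is why only negative trailing characters are affected.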

I've also created a testcase, which passes on x86 Linux at least.
It fails, of course, with existing GCC releases.
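
A note on the scale initialiser the testcase uses: converting -1 to unsigned char yields UCHAR_MAX, so adding 1 gives the number of distinct char values, which is the multiplier equivalent to shifting left by one character.  A small sketch of that, with hypothetical function names and assuming int is wider than char (the testcase guards this with its INT_MAX check):

```c
#include <assert.h>
#include <limits.h>

/* The same expression as in the testcase: (unsigned char) -1 is
   UCHAR_MAX, so the result is UCHAR_MAX + 1.  */
static int charconst_scale (void)
{
  return (int) (unsigned char) -1 + 1;
}

/* Hypothetical cross-check: the same value written as a shift.
   Assumes int has more value bits than CHAR_BIT.  */
static int scale_from_char_bit (void)
{
  return 1 << CHAR_BIT;
}
```

On a target with 8-bit chars both functions return 256, so 'a' * scale is the same as shifting 'a' left by 8 bits.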

Neil.

doc:
	* cpp.texi: Update multichar charconst docs.
testsuite:
	* gcc.dg/cpp/charconst-3.c: New test.

Index: doc/cpp.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/cpp.texi,v
retrieving revision 1.31
diff -u -p -r1.31 cpp.texi
--- doc/cpp.texi	26 Mar 2002 03:25:05 -0000	1.31
+++ doc/cpp.texi	5 May 2002 23:36:08 -0000
@@ -3508,17 +3508,25 @@ same column as it did in the original so
 
 @item The numeric value of character constants in preprocessor expressions.
 
-The preprocessor and compiler interpret character constants in the same
-way; escape sequences such as @samp{\a} are given the values they would
-have on the target machine.
+The preprocessor and compiler interpret character constants in the
+same way; i.e.@: escape sequences such as @samp{\a} are given the
+values they would have on the target machine.
 
 Multi-character character constants are interpreted a character at a
 time, shifting the previous result left by the number of bits per
-character on the host, and adding the new character.  For example, 'ab'
-on an 8-bit host would be interpreted as @w{'a' * 256 + 'b'}.  If there
-are more characters in the constant than can fit in the widest native
-integer type on the host, usually a @code{long}, the excess characters
-are ignored and a diagnostic is given.
+target character and adding the sign-extended value of the new
+character.  They have type @code{int}, and are treated as signed
+regardless of whether single characters are signed or not.  If there
+are more characters in the constant than would fit in the target
+@code{int}, a diagnostic is given, and the excess leading characters
+are ignored.  This methodology is not fully compatible with versions
+3.1 and earlier of GCC, which used a confusing and inconsistent
+valuation technique.
+
+For example, 'ab' for a target with an 8-bit @code{char} would be
+interpreted as @w{'a' * 256 + 'b'}, and 'a\234' as @w{'a' * 256 +
+'\234'}.  GCC 3.1 and earlier would give a different value for the
+latter example, probably @w{'a' * 256 + (unsigned char) '\234'}.
 
 @item Source file inclusion.
 
Index: testsuite/gcc.dg/cpp/charconst-3.c
===================================================================
RCS file: testsuite/gcc.dg/cpp/charconst-3.c
diff -N testsuite/gcc.dg/cpp/charconst-3.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ testsuite/gcc.dg/cpp/charconst-3.c	5 May 2002 23:36:08 -0000
@@ -0,0 +1,40 @@
+/* Copyright (C) 2001 Free Software Foundation, Inc.  */
+
+/* { dg-do compile } */
+/* { dg-options -Wno-multichar } */
+
+/* This tests values and signedness of multichar charconsts.
+
+   Neil Booth, 5 May 2002.  */
+
+#include <limits.h>
+
+int main ()
+{
+  /* These tests require at least 2-byte ints.  8-)  */
+#if INT_MAX > 127
+  int scale = (int) (unsigned char) -1 + 1;
+
+  if ('ab' != ('a' * scale + 'b'))
+    abort ();
+
+  if ('\234b' != ('\234' * scale + 'b'))
+    abort ();
+
+  if ('b\234' != ('b' * scale + '\234'))
+    abort ();
+
+  /* Multichar charconsts have type int and should be signed.  */
+#if INT_MAX == 32767
+  if ('\234a' > 0)
+    abort ();
+#elif INT_MAX == 2147483647
+  if ('\234aaa' > 0)
+    abort ();
+#elif INT_MAX == 9223372036854775807
+  if ('\234aaaaaaa' > 0)
+    abort ();
+#endif
+#endif
+  return 0;
+}


