Re: [PATCH] Handle overlength strings in the C FE

On 08/01/18 18:04, Joseph Myers wrote:
> On Wed, 1 Aug 2018, Bernd Edlinger wrote:
>> On 07/30/18 17:49, Joseph Myers wrote:
>>> On Mon, 30 Jul 2018, Bernd Edlinger wrote:
>>>> Hi,
>>>> this is how I would like to handle the overlength strings issue in the C FE.
>>>> If the string constant is exactly the right length and ends in one explicit
>>>> NUL character, shorten it by one character.
>>> I don't think shortening should be limited to that case.  I think the case
>>> where the constant is longer than that (and so gets an unconditional
>>> pedwarn) should also have it shortened - any constant that doesn't fit in
>>> the object being initialized should be shortened to fit, whether diagnosed
>>> or not, we should define GENERIC / GIMPLE to disallow too-large string
>>> constants in initializers, and should add an assertion somewhere in the
>>> middle-end that no too-large string constants reach it.
>> Okay, there is an update following your suggestion.
> It seems odd to me to have two separate bits of code dealing with reducing
> the length, rather than something like
> if (too long)
>    {
>      /* Decide whether to do a pedwarn_init, or a warn_cxx_compat warning,
>         or neither.  */
>      /* Shorten string, in either case.  */
>    }
> The memcmp with "\0\0\0\0" is introducing a hidden assumption that any
> sort of character in strings is never more than four bytes.  It also seems
> unnecessary, in that ultimately the over-long string should be shortened
> regardless of whether what's being removed is a zero character or not.
>
> It should not be possible to be over-long and fail tree_fits_uhwi_p
> (TYPE_SIZE_UNIT (type)), simply because STRING_CST lengths are stored in
> host int (even if, ideally, they'd use some other type to allow for
> STRING_CSTs over 2GB in size).  (And I don't think GCC can represent
> target type sizes that don't fit in unsigned HOST_WIDE_INT anyway; the
> only way for a target type size in bytes to fail to be representable in
> unsigned HOST_WIDE_INT should be if the size is not constant.)

A new simplified version of the patch is attached.

Bootstrapped and reg-tested as usual.
Is it OK for trunk?
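
For readers who want to see the shortening step in isolation, here is a
minimal stand-alone sketch of what the new block in digest_init does to the
bytes: keep the first `size` bytes of the constant and append one zero
character of `unit` bytes.  The helper name shorten_to_fit and its signature
are hypothetical and exist only for this illustration; the patch itself works
on the STRING_CST via TREE_STRING_POINTER and build_string, and uses xmalloc
rather than malloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of GCC.  P points to LEN bytes of string
   data (including the terminating null character).  SIZE is the size in
   bytes of the array being initialized and UNIT is the size in bytes of
   one (possibly wide) character.  Returns a malloc'ed buffer holding the
   first SIZE bytes of P followed by UNIT zero bytes, and stores the new
   length in *NEW_LEN.  Returns NULL if allocation fails.  */
char *
shorten_to_fit (const char *p, size_t len, size_t size, size_t unit,
		size_t *new_len)
{
  /* Only over-long constants need shortening, which also guarantees
     that the memcpy below stays within the LEN bytes of P.  */
  assert (len > size);

  char *q = malloc (size + unit);
  if (q == NULL)
    return NULL;

  memcpy (q, p, size);		/* Keep what fits in the object.  */
  memset (q + size, 0, unit);	/* Append one null character.  */
  *new_len = size + unit;
  return q;
}
```

For example, with `char a[3] = "abcdef";` the constant has len 7, size is 3
and unit is 1, so the shortened constant is the four bytes "abc\0".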

2018-08-01  Bernd Edlinger  <>

	* c-typeck.c (digest_init): Shorten overlength strings.

diff -pur gcc/c/c-typeck.c gcc/c/c-typeck.c
--- gcc/c/c-typeck.c	2018-06-20 18:35:15.000000000 +0200
+++ gcc/c/c-typeck.c	2018-07-31 18:49:50.757586625 +0200
@@ -7435,19 +7435,17 @@ digest_init (location_t init_loc, tree type, tree
-	  TREE_TYPE (inside_init) = type;
 	  if (TYPE_DOMAIN (type) != NULL_TREE
 	      && TYPE_SIZE (type) != NULL_TREE
 	      && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
 	      unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init);
+	      unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT;
 	      /* Subtract the size of a single (possibly wide) character
 		 because it's ok to ignore the terminating null char
 		 that is counted in the length of the constant.  */
-	      if (compare_tree_int (TYPE_SIZE_UNIT (type),
-				    (len - (TYPE_PRECISION (typ1)
-					    / BITS_PER_UNIT))) < 0)
+	      if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0)
 		pedwarn_init (init_loc, 0,
 			      ("initializer-string for array of chars "
 			       "is too long"));
@@ -7456,8 +7454,21 @@ digest_init (location_t init_loc, tree type, tree
 		warning_at (init_loc, OPT_Wc___compat,
 			    ("initializer-string for array chars "
 			     "is too long for C++"));
+	      if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
+		{
+		  unsigned HOST_WIDE_INT size
+		    = tree_to_uhwi (TYPE_SIZE_UNIT (type));
+		  const char *p = TREE_STRING_POINTER (inside_init);
+		  char *q = (char *)xmalloc (size + unit);
+		  memcpy (q, p, size);
+		  memset (q + size, 0, unit);
+		  inside_init = build_string (size + unit, q);
+		  free (q);
+		}
+	  TREE_TYPE (inside_init) = type;
 	  return inside_init;
       else if (INTEGRAL_TYPE_P (typ1))

