[PATCH] fix a couple of bugs in const string folding (PR 86532)

Thu Jul 19 13:23:00 GMT 2018

> @@ -633,12 +642,17 @@ c_strlen (tree src, int only_value)
>  	return ssize_int (0);
>  
>        /* We don't know the starting offset, but we do know that the string
> -	 has no internal zero bytes.  We can assume that the offset falls
> -	 within the bounds of the string; otherwise, the programmer deserves
> -	 what he gets.  Subtract the offset from the length of the string,
> -	 and return that.  This would perhaps not be valid if we were dealing
> -	 with named arrays in addition to literal string constants.  */
> -      return size_diffop_loc (loc, size_int (maxelts * eltsize), byteoff);
> +	 has no internal zero bytes.  If the offset falls within the bounds
> +	 of the string subtract the offset from the length of the string,
> +	 and return that.  Otherwise the length is zero.  Take care to
> +	 use SAVE_EXPR in case the OFFSET has side-effects.  */
> +      tree offsave = TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) : byteoff;
> +      offsave = fold_convert (ssizetype, offsave);
> +      tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
> +				      build_int_cst (ssizetype, len * eltsize));
> +      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
> +      return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
> +			      build_zero_cst (ssizetype));

This computes the length in bytes, but c_strlen is supposed to
return the number of (wide) characters:

/* Compute the length of a null-terminated character string or wide
    character string handling character sizes of 1, 2, and 4 bytes.
    TREE_STRING_LENGTH is not the right way because it evaluates to
    the size of the character array in bytes (as opposed to characters)
    and because it can contain a zero byte in the middle.
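
To make the bytes-versus-characters distinction concrete, here is a
simplified model in plain C. It is not GCC code: the function and
parameter names are made up for illustration, and it only mirrors the
shape of the expression built in the hunk above.

```c
/* Simplified model, not GCC code: NELTS is the number of elements
   before the terminating nul, ELTSIZE the element size in bytes and
   BYTEOFF a non-negative byte offset into the string.  */

/* Remaining length in bytes, or zero if the offset is out of bounds.  */
static long
remaining_bytes (long nelts, long eltsize, long byteoff)
{
  long nbytes = nelts * eltsize;
  return byteoff <= nbytes ? nbytes - byteoff : 0;
}

/* Remaining length in characters: the byte count scaled back down
   by the element size.  */
static long
remaining_chars (long nelts, long eltsize, long byteoff)
{
  return remaining_bytes (nelts, eltsize, byteoff) / eltsize;
}
```

For a three-character string of 4-byte elements and a byte offset
of 4, the first function yields 8 and the second 2. The two only
agree when the element size is 1.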

> @@ -11343,16 +11356,15 @@ string_constant (tree arg, tree *ptr_offset)
>      {
>        if (TREE_CODE (TREE_TYPE (array)) != ARRAY_TYPE)
>  	return NULL_TREE;
> -      if (tree eltsize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array))))
> -	{
> -	  /* Add the scaled variable index to the constant offset.  */
> -	  tree eltoff = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
> -				     fold_convert (sizetype, varidx),
> -				     eltsize);
> -	  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, eltoff);
> -	}
> -      else
> -	return NULL_TREE;
> +
> +      while (TREE_CODE (chartype) != INTEGER_TYPE)
> +	chartype = TREE_TYPE (chartype);
> +
> +      /* Set the non-constant offset to the non-constant index scaled
> +	 by the size of the character type.  */
> +      offset = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
> +			    fold_convert (sizetype, varidx),
> +			    TYPE_SIZE_UNIT (chartype));

Here you fix the computation for wide character strings,
but I see no test cases with wide character strings.

But down here you use a non-wide character function on a
wide character string:

   /* Avoid returning a string that doesn't fit in the array
      it is stored in, like
      const char a[4] = "abcde";
      but do handle those that fit even if they have excess
      initializers, such as in
      const char a[4] = "abc\000\000";
      The excess elements contribute to TREE_STRING_LENGTH()
      but not to strlen().  */
   unsigned HOST_WIDE_INT length
     = strnlen (TREE_STRING_POINTER (init), TREE_STRING_LENGTH (init));
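
A small stand-alone illustration of the problem, outside of GCC. The
byte array below is an assumed little-endian layout of a string with
2-byte elements, and elt_strnlen is a hypothetical helper written for
this example only; neither comes from the patch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for strnlen */
#endif
#include <stddef.h>
#include <string.h>

/* "abc" with 2-byte elements, written out byte by byte in an assumed
   little-endian layout, followed by a 2-byte terminator.  */
static const char wide_abc[8] = { 'a', 0, 'b', 0, 'c', 0, 0, 0 };

/* What a byte-oriented scan sees: it stops at the first zero byte,
   which is the high byte of the first element.  */
static size_t
byte_length (void)
{
  return strnlen (wide_abc, sizeof wide_abc);
}

/* Hypothetical element-aware scan: count whole elements of ELTSIZE
   bytes until one consists only of zero bytes.  */
static size_t
elt_strnlen (const char *p, size_t nbytes, size_t eltsize)
{
  size_t n = 0;
  for (; (n + 1) * eltsize <= nbytes; n++)
    {
      int all_zero = 1;
      for (size_t i = 0; i < eltsize; i++)
        if (p[n * eltsize + i] != 0)
          {
            all_zero = 0;
            break;
          }
      if (all_zero)
        break;
    }
  return n;
}
```

With this layout the byte-oriented scan reports a length of 1, while
the element-aware scan reports 3 elements.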

Actually, I begin to wonder whether all this wide character stuff is
really so common that we have to optimize it.
Same for strlen(&a[0][i]): does this really happen so often that
it is worth the risk?
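
For reference, the kind of source I mean looks roughly like this. It
is a made-up example, not taken from the PR or the testsuite, and the
array contents and the name len_at are only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* A constant two-dimensional array of strings.  */
static const char a[2][4] = { "abc", "de" };

/* strlen of the first row, starting at a variable index I.
   I is assumed to be in the range 0..3.  */
static size_t
len_at (int i)
{
  return strlen (&a[0][i]);
}
```

The question is only whether folding calls like the one in len_at
at compile time is common enough to pay for the added complexity.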

Bernd.