This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)


On 07/19/2018 07:23 AM, Bernd Edlinger wrote:
@@ -633,12 +642,17 @@ c_strlen (tree src, int only_value)
 	return ssize_int (0);

       /* We don't know the starting offset, but we do know that the string
-	 has no internal zero bytes.  We can assume that the offset falls
-	 within the bounds of the string; otherwise, the programmer deserves
-	 what he gets.  Subtract the offset from the length of the string,
-	 and return that.  This would perhaps not be valid if we were dealing
-	 with named arrays in addition to literal string constants.  */
-      return size_diffop_loc (loc, size_int (maxelts * eltsize), byteoff);
+	 has no internal zero bytes.  If the offset falls within the bounds
+	 of the string subtract the offset from the length of the string,
+	 and return that.  Otherwise the length is zero.  Take care to
+	 use SAVE_EXPR in case the OFFSET has side-effects.  */
+      tree offsave = TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) : byteoff;
+      offsave = fold_convert (ssizetype, offsave);
+      tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
+				      build_int_cst (ssizetype, len * eltsize));
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+      return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
+			      build_zero_cst (ssizetype));


This computes the number of bytes.
c_strlen is supposed to return number of (wide) characters:

You're right.  I guess that must mean the function isn't used
for wide character strings.  The original code also computes
a byte offset and so does the new expression.  I have no
problem changing it, but if feels like a change that should
be made independently of this bug fix.


/* Compute the length of a null-terminated character string or wide
    character string handling character sizes of 1, 2, and 4 bytes.
    TREE_STRING_LENGTH is not the right way because it evaluates to
    the size of the character array in bytes (as opposed to characters)
    and because it can contain a zero byte in the middle.


@@ -11343,16 +11356,15 @@ string_constant (tree arg, tree *ptr_offset)
     {
       if (TREE_CODE (TREE_TYPE (array)) != ARRAY_TYPE)
 	return NULL_TREE;
-      if (tree eltsize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array))))
-	{
-	  /* Add the scaled variable index to the constant offset.  */
-	  tree eltoff = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
-				     fold_convert (sizetype, varidx),
-				     eltsize);
-	  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, eltoff);
-	}
-      else
-	return NULL_TREE;
+
+      while (TREE_CODE (chartype) != INTEGER_TYPE)
+	chartype = TREE_TYPE (chartype);
+
+      /* Set the non-constant offset to the non-constant index scaled
+	 by the size of the character type.  */
+      offset = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
+			    fold_convert (sizetype, varidx),
+			    TYPE_SIZE_UNIT (chartype));

here you fix the computation for wide character strings,
but I see no test cases with wide character stings.

The change above corrects the offset when eltsize refers to
the size of an array rather than its character elements.
It wasn't made to fix how wide strings are handled.  I don't
know if the function is ever called for them in a way that
matters (there are no wide character built-ins).  But it's
something to look into.


But down here you use a non-wide character function on a
wide character string:

   /* Avoid returning a string that doesn't fit in the array
      it is stored in, like
      const char a[4] = "abcde";
      but do handle those that fit even if they have excess
      initializers, such as in
      const char a[4] = "abc\000\000";
      The excess elements contribute to TREE_STRING_LENGTH()
      but not to strlen().  */
   unsigned HOST_WIDE_INT length
     = strnlen (TREE_STRING_POINTER (init), TREE_STRING_LENGTH (init));


Actually I begin to wonder, if all this wide character stuff is
really so common that we have to optimize it.

I've replaced the strnlen() call with string_length() in
the updated patch.  The code is entered for wchar_t (and
other) strings but since there are no wide character
built-ns in GCC I suspect that optimizations that apply
to them that don't also apply to other arrays are few
and far between.

At the same time, as wide characters (beyond wchar_t) are
increasingly becoming used in software I wouldn't want to
preclude future optimizations from making use of those that
benefit narrow strings.  In any event, that too is
a discussion to have independently of this bug fix.

Martin


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]