[PATCH] fix a couple of bugs in const string folding (PR 86532)

Martin Sebor msebor@gmail.com
Thu Jul 19 19:51:00 GMT 2018


On 07/19/2018 12:19 AM, Bernd Edlinger wrote:
>           if (TREE_CODE (idx) != INTEGER_CST
>               && TREE_CODE (argtype) == POINTER_TYPE)
>             {
>               /* From a pointer (but not array) argument extract the variable
>                  index to prevent get_addr_base_and_unit_offset() from failing
>                  due to it.  Use it later to compute the non-constant offset
>                  into the string and return it to the caller.  */
>               varidx = idx;
>               ref = TREE_OPERAND (arg, 0);
>
>               tree type = TREE_TYPE (arg);
>               if (TREE_CODE (type) == ARRAY_TYPE
>                   && TREE_CODE (type) != INTEGER_TYPE)
>                 return NULL_TREE;
>             }
>
> the condition TREE_CODE(type) == ARRAY_TYPE
> && TREE_CODE (type) != INTEGER_TYPE looks funny.
> Check for ARRAY_TYPE should imply != INTEGER_TYPE.

Yes, that other test was superfluous.  I've removed it in
the updated patch I just posted.

>
>   else if (DECL_P (arg))
>     {
>       array = arg;
>       chartype = TREE_TYPE (arg);
>     }
>
> chartype is only used in the if (varidx) block, but that is always zero
> in this case.

True.  Chartype could be left uninitialized.  I did it mostly
out of an abundance of caution (I don't like leaving variables
with such big scope uninitialized).  In the latest update
I instead initialize chartype to null.

>
>       while (TREE_CODE (chartype) == ARRAY_TYPE
>              || TREE_CODE (chartype) == POINTER_TYPE)
>         chartype = TREE_TYPE (chartype);
>
> you multiply sizeof(chartype) with varidx but you should probably
> use the type of the  TREE_OPERAND (arg, 0); above instead.

The offset returned to the caller is relative to the character
array so varidx must also be the index into the same array.
Otherwise it cannot be used.  The difference is this:

   const char a[][4] = { "12", "123" };

   int x = strlen (&a[0][i]);   // use i as the offset
   int y = strlen (&a[i][0]);   // bail

>
> this is not in the patch, but I dont like it at all, because it compares the
> size of a single initializer against the full size of the array.  But it should
> be the innermost enclosing array:
>
>   tree array_size = DECL_SIZE_UNIT (array);
>   if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>     return NULL_TREE;
>
>   /* Avoid returning a string that doesn't fit in the array
>      it is stored in, like
>      const char a[4] = "abcde";
>      but do handle those that fit even if they have excess
>      initializers, such as in
>      const char a[4] = "abc\000\000";
>      The excess elements contribute to TREE_STRING_LENGTH()
>      but not to strlen().  */
>   unsigned HOST_WIDE_INT length
>     = strnlen (TREE_STRING_POINTER (init), TREE_STRING_LENGTH (init));
>   if (compare_tree_int (array_size, length + 1) < 0)
>     return NULL_TREE;

The use of strnlen here isn't right for wide strings and needs
to be replaced with a call to string_length() from builtins.c.
I've made that change in the updated patch.

The remaining concern is orthogonal to the changes to fix
the wrong code.  As I mentioned, I opened a few bugs to
improve things in this area: 86434, 86572, 86552.  As
the first step I'm about to post a solution for 86552.

>
> consider the following test case:
> $ cat part.c
> const char a[2][3][8] = { { "a", "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", "ccc"},
>                           { "dddd", "eeeee", "ffffff" } };
>
> int main ()
> {
>   int n = __builtin_strlen (&a[0][1][0]);
>
>   if (n == 30)
>     __builtin_abort ();
> }

With my patch for pr86552 GCC prints:

warning: initializer-string for array of chars is too long
  const char a[2][3][8] = { { "a", "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", "ccc"},
                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: (near initialization for ‘a[0][1]’)
In function ‘main’:
warning: ‘__builtin_strlen’ argument missing terminating nul 
[-Wstringop-overflow=]
    int n = __builtin_strlen (&a[0][1][0]);
            ^~~~~~~~~~~~~~~~
note: referenced argument declared here
  const char a[2][3][8] = { { "a", "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", "ccc"},

We can discuss what value, if any, might be more appropriate
to fold the length to in these undefined cases but  I would
prefer to have that discussion separately from this review.

Martin



More information about the Gcc-patches mailing list