[PATCH] Fix PR 100944 - Array boundary misscalculation

Martin Sebor msebor@gmail.com
Wed Jun 16 00:22:34 GMT 2021


On 6/15/21 12:23 AM, Richard Biener wrote:
> On Mon, Jun 14, 2021 at 7:32 PM Martin Sebor <msebor@gmail.com> wrote:
>>
>> On 6/14/21 12:36 AM, Richard Biener via Gcc-patches wrote:
>>> On Sun, Jun 13, 2021 at 7:00 PM Giuliano Belinassi via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>> This patch proposes a fix to PR 100944 by improving the array boundary
>>>> computation when the flexible array has no clear constructor: if no
>>>> constructor were found in the input code, we compute the size of the
>>>> array as:
>>>>
>>>>     offset(array begin) - offset(next element in RECORD_TYPE)
>>>>
>>>> If no next element is found in the RECORD, simply fall back to:
>>>>
>>>>     offset(array begin) - offset(RECORD_TYPE ends)
>>>>
>>>> We avoid doing this calculation if the RECORD_TYPE is actually an
>>>> UNION_TYPE, once things get very complicated in this case.
>>>
>>> I'm looking at Wzero-length-array-bounds.c and wonder why the patch
>>> makes a difference here.
>>>
>>> struct X { int a[0]; int b, c; };
>>>
>>> a is clearly not a flex-array?
>>
>> The warning issued here shouldn't change.  -Wzero-length-bounds is
>> documented to
>>
>>     Warn about Accesses to elements of zero-length array members that
>>     might overlap other members of the same object.
>>
>> By "the object" the text means the complete object (if one can be
>> determined), not necessarily just the subobject of the enclosing
>> struct.  This is contrast to accesses to zero-length arrays that
>> are out of the bounds of the complete object.  This distinction
>> was made to help the kernel transition to the new warning (I'm
>> not sure if they're done with the cleanup yet).
>>
>>>
>>> The code in component_ref_size is truly weird.  IMHO using
>>> get_ref_base_and_unit_offset for computing the size of a flexarray
>>> member is the wrong utility - instead a much better choice
>>> (also for consistency) would be array_at_struct_end_p itself,
>>> for example by giving it an optional HOST_WIDE_INT *size
>>> argument.
>>
>> array_at_struct_end_p() is permissive and treats all trailing arrays
>> as effectively unbounded if it doesn't know the enclosing object.
>> -Warray-bounds and all other access warnings are more conservative
>> and treat only flexible array members and trailing arrays of zero
>> length as unbounded (except in complete objects where they consider
>> the initializer for the former).
> 
> Sure, but that distinction is easy to fix for the consumer by looking at
> the array field in case array_at_struct_end_p returns true.  What I'm
> saying is that array_at_struct_end_p already performs analysis
> whether the array is the last field in the object and it already performs
> analysis of the object size - meaning it computes the possible extent
> of the array (and it matches that up with the array declaration).  It could
> simply communicate the length [bound] to the caller.
> 
> And if looking at an initializer is really necessary (IMHO that smells like
> the FE issue that fails to adjust DECL_SIZE of the object from it?) then
> even array_at_struct_end_p could be improved by looking at it by
> passing the (optional) initializer to the function.

The two functions serve different purposes but having them share
the same implementation seems like a fine idea as long as
component_ref_size still returns the same amount of detail about
the member.

I don't have much of an opinion on DECL_SIZE reflecting the size
of the whole object, including the flexible array member and its
initializer.  It seems to make sense to me, but it's been that
way for ages so I wouldn't expect changing it to be a slam dunk.

Martin

> 
> Richard.
> 
>> Martin
>>
>>>
>>> Richard.
>>>
>>>> gcc/ChangeLog:
>>>> 2021-13-08  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>>>>
>>>>           PR middle-end/100944
>>>>           * tree-dfa.c (get_ref_base_and_unit_offset): Add option to compute size
>>>>           of next field.
>>>>           * (get_ref_base_and_unit_offset_1): Same as above.
>>>>           * tree-dfa.h (get_ref_base_and_unit_offset): Same as above.
>>>>           (get_ref_base_and_unit_offset_1): Same as above.
>>>>           * tree.c (least_common_record_1): New.
>>>>           (least_common_record): New.
>>>>           (component_ref_size): Improve array size calculation.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>> 2021-13-08  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>>>>
>>>>           PR middle-end/100944
>>>>           * gcc.dg/Wzero-length-array-bounds.c: Update diagnostic.
>>>>           * gcc.dg/Warray-bounds-71.c: New test.
>>>> ---
>>>>    gcc/testsuite/gcc.dg/Warray-bounds-71.c       |  42 +++++++
>>>>    .../gcc.dg/Wzero-length-array-bounds.c        |  18 +--
>>>>    gcc/tree-dfa.c                                |  31 ++++-
>>>>    gcc/tree-dfa.h                                |   6 +-
>>>>    gcc/tree.c                                    | 115 ++++++++++++++----
>>>>    5 files changed, 172 insertions(+), 40 deletions(-)
>>>>    create mode 100644 gcc/testsuite/gcc.dg/Warray-bounds-71.c
>>>>
>>>> diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-71.c b/gcc/testsuite/gcc.dg/Warray-bounds-71.c
>>>> new file mode 100644
>>>> index 00000000000..cc5b083bc77
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/Warray-bounds-71.c
>>>> @@ -0,0 +1,42 @@
>>>> +/* PR middle-end/100944 - missing -Warray-bounds accessing a flexible array
>>>> +   member of a nested struct
>>>> +   { dg-do compile }
>>>> +   { dg-options "-O2 -Wall" } */
>>>> +
>>>> +struct A0
>>>> +{
>>>> +  int i, a[0];
>>>> +};
>>>> +
>>>> +struct B0
>>>> +{
>>>> +  struct A0 a;
>>>> +  long x;
>>>> +} b0;
>>>> +
>>>> +void f0 (int i)
>>>> +{
>>>> +  long t = b0.x;
>>>> +  b0.a.a[i] = 0;    // { dg-warning "\\\[-Warray-bounds" }
>>>> +  if (t != b0.x)    // folded to false
>>>> +    __builtin_abort ();
>>>> +}
>>>> +
>>>> +struct Ax
>>>> +{
>>>> +  int i, a[];
>>>> +};
>>>> +
>>>> +struct Bx
>>>> +{
>>>> +  struct Ax a;
>>>> +  long x;
>>>> +} bx;
>>>> +
>>>> +void fx (int i)
>>>> +{
>>>> +  long t = bx.x;
>>>> +  bx.a.a[i] = 0;    // { dg-warning "\\\[-Warray-bounds" }
>>>> +  if (t != bx.x)    // folded to false
>>>> +    __builtin_abort ();
>>>> +}
>>>> diff --git a/gcc/testsuite/gcc.dg/Wzero-length-array-bounds.c b/gcc/testsuite/gcc.dg/Wzero-length-array-bounds.c
>>>> index 8e880d92dea..117b30ff294 100644
>>>> --- a/gcc/testsuite/gcc.dg/Wzero-length-array-bounds.c
>>>> +++ b/gcc/testsuite/gcc.dg/Wzero-length-array-bounds.c
>>>> @@ -68,21 +68,21 @@ extern struct Y y;
>>>>
>>>>    void access_to_member (int i)
>>>>    {
>>>> -  y.a[0].a[0] = 0;      // { dg-warning "\\\[-Wzero-length-bounds" }
>>>> -  y.a[0].a[1] = 0;      // { dg-warning "\\\[-Wzero-length-bounds" }
>>>> -  y.a[0].a[2] = 0;      // { dg-warning "\\\[-Wzero-length-bounds" }
>>>> +  y.a[0].a[0] = 0;      // { dg-warning "\\\[-Warray-bounds" }
>>>> +  y.a[0].a[1] = 0;      // { dg-warning "\\\[-Warray-bounds" }
>>>> +  y.a[0].a[2] = 0;      // { dg-warning "\\\[-Warray-bounds" }
>>>>      sink (a);
>>>>
>>>> -  y.a[1].a[0] = 0;      // { dg-warning "\\\[-Wzero-length-bounds" }
>>>> -  y.a[1].a[1] = 0;      // { dg-warning "\\\[-Wzero-length-bounds" }
>>>> +  y.a[1].a[0] = 0;      // { dg-warning "\\\[-Warray-bounds" }
>>>> +  y.a[1].a[1] = 0;      // { dg-warning "\\\[-Warray-bounds" }
>>>>      /* Similar to the array case above, accesses to a subsequent member
>>>>         of the "parent" struct seem like a more severe problem than those
>>>>         to the next member of the same struct.  */
>>>> -  y.a[1].a[2] = 0;      // { dg-warning "\\\[-Wzero-length-bounds" }
>>>> +  y.a[1].a[2] = 0;      // { dg-warning "\\\[-Warray-bounds" }
>>>>      sink (a);
>>>>
>>>> -  y.b.a[0] = 0;         // { dg-warning "\\\[-Wzero-length-bounds" }
>>>> -  y.b.a[1] = 0;         // { dg-warning "\\\[-Wzero-length-bounds" }
>>>> -  y.b.a[2] = 0;         // { dg-warning "\\\[-Wzero-length-bounds" }
>>>> +  y.b.a[0] = 0;         // { dg-warning "\\\[-Warray-bounds" }
>>>> +  y.b.a[1] = 0;         // { dg-warning "\\\[-Warray-bounds" }
>>>> +  y.b.a[2] = 0;         // { dg-warning "\\\[-Warray-bounds" }
>>>>      y.b.a[3] = 0;         // { dg-warning "\\\[-Warray-bounds" }
>>>>    }
>>>> diff --git a/gcc/tree-dfa.c b/gcc/tree-dfa.c
>>>> index 1d20de0c400..dc3c15f11d5 100644
>>>> --- a/gcc/tree-dfa.c
>>>> +++ b/gcc/tree-dfa.c
>>>> @@ -772,9 +772,11 @@ get_ref_base_and_extent_hwi (tree exp, HOST_WIDE_INT *poffset,
>>>>
>>>>    tree
>>>>    get_addr_base_and_unit_offset_1 (tree exp, poly_int64_pod *poffset,
>>>> -                                tree (*valueize) (tree))
>>>> +                                tree (*valueize) (tree),
>>>> +                                bool of_next_component /* = false.  */)
>>>>    {
>>>>      poly_int64 byte_offset = 0;
>>>> +  tree next_field = NULL_TREE;
>>>>
>>>>      /* Compute cumulative byte-offset for nested component-refs and array-refs,
>>>>         and find the ultimate containing object.  */
>>>> @@ -797,11 +799,27 @@ get_addr_base_and_unit_offset_1 (tree exp, poly_int64_pod *poffset,
>>>>           case COMPONENT_REF:
>>>>             {
>>>>               tree field = TREE_OPERAND (exp, 1);
>>>> -           tree this_offset = component_ref_field_offset (exp);
>>>> +           tree this_offset;
>>>> +
>>>>               poly_int64 hthis_offset;
>>>>
>>>> +           if (of_next_component && !next_field)
>>>> +             {
>>>> +               /* We are looking for the next component of the record.  */
>>>> +               next_field = TREE_CHAIN (field);
>>>> +               if (!next_field)
>>>> +                 break;
>>>> +
>>>> +               /* We found a next component.  Flag that we found it and
>>>> +                  update the target field.  */
>>>> +               field = next_field;
>>>> +             }
>>>> +
>>>> +           this_offset = component_ref_field_offset (exp);
>>>> +
>>>>               if (!this_offset
>>>>                   || !poly_int_tree_p (this_offset, &hthis_offset)
>>>> +               || TREE_CODE (field) != FIELD_DECL
>>>>                   || (TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (field))
>>>>                       % BITS_PER_UNIT))
>>>>                 return NULL_TREE;
>>>> @@ -904,6 +922,9 @@ get_addr_base_and_unit_offset_1 (tree exp, poly_int64_pod *poffset,
>>>>    done:
>>>>
>>>>      *poffset = byte_offset;
>>>> +
>>>> +  if (of_next_component)
>>>> +    return next_field;
>>>>      return exp;
>>>>    }
>>>>
>>>> @@ -913,9 +934,11 @@ done:
>>>>       is not BITS_PER_UNIT-aligned.  */
>>>>
>>>>    tree
>>>> -get_addr_base_and_unit_offset (tree exp, poly_int64_pod *poffset)
>>>> +get_addr_base_and_unit_offset (tree exp, poly_int64_pod *poffset, bool
>>>> +                              of_next_component /* = false.  */)
>>>>    {
>>>> -  return get_addr_base_and_unit_offset_1 (exp, poffset, NULL);
>>>> +  return get_addr_base_and_unit_offset_1 (exp, poffset, NULL,
>>>> +                                         of_next_component);
>>>>    }
>>>>
>>>>    /* Returns true if STMT references an SSA_NAME that has
>>>> diff --git a/gcc/tree-dfa.h b/gcc/tree-dfa.h
>>>> index b1457ab065c..94e44d9c3f6 100644
>>>> --- a/gcc/tree-dfa.h
>>>> +++ b/gcc/tree-dfa.h
>>>> @@ -34,8 +34,10 @@ extern tree get_ref_base_and_extent (tree, poly_int64_pod *, poly_int64_pod *,
>>>>    extern tree get_ref_base_and_extent_hwi (tree, HOST_WIDE_INT *,
>>>>                                            HOST_WIDE_INT *, bool *);
>>>>    extern tree get_addr_base_and_unit_offset_1 (tree, poly_int64_pod *,
>>>> -                                            tree (*) (tree));
>>>> -extern tree get_addr_base_and_unit_offset (tree, poly_int64_pod *);
>>>> +                                            tree (*) (tree),
>>>> +                                            bool of_next_component = false);
>>>> +extern tree get_addr_base_and_unit_offset (tree, poly_int64_pod *,
>>>> +                                          bool = false);
>>>>    extern bool stmt_references_abnormal_ssa_name (gimple *);
>>>>    extern void replace_abnormal_ssa_names (gimple *);
>>>>    extern void dump_enumerated_decls (FILE *, dump_flags_t);
>>>> diff --git a/gcc/tree.c b/gcc/tree.c
>>>> index 1aa6e557a04..45d7fa2ae92 100644
>>>> --- a/gcc/tree.c
>>>> +++ b/gcc/tree.c
>>>> @@ -12649,6 +12649,47 @@ get_initializer_for (tree init, tree decl)
>>>>      return NULL_TREE;
>>>>    }
>>>>
>>>> +static int
>>>> +least_common_record_1 (tree basetype, tree field1, tree field2,
>>>> +                      tree *least_basetype)
>>>> +{
>>>> +  int ret = 0;
>>>> +
>>>> +  for (tree fld = TYPE_FIELDS (basetype); fld; fld = TREE_CHAIN (fld))
>>>> +    {
>>>> +      if (fld == field1 || fld == field2)
>>>> +       ret++;
>>>> +
>>>> +      if (TREE_CODE (TREE_TYPE (fld)) == UNION_TYPE
>>>> +         || TREE_CODE (TREE_TYPE (fld)) == RECORD_TYPE)
>>>> +       ret += least_common_record_1 (TREE_TYPE (fld), field1, field2,
>>>> +                                     least_basetype);
>>>> +    }
>>>> +
>>>> +  if (ret == 2)
>>>> +    {
>>>> +      *least_basetype = basetype;
>>>> +      ret++; /* Avoid getting in this block again if a common basetype were
>>>> +               found.  */
>>>> +    }
>>>> +
>>>> +  return ret;
>>>> +}
>>>> +
>>>> +/* Find the least common RECORD type common to two FIELDS from base.  */
>>>> +static tree
>>>> +least_common_record (tree basetype, tree field1, tree field2)
>>>> +{
>>>> +  if (!(TREE_CODE (basetype) == RECORD_TYPE
>>>> +       || TREE_CODE (basetype) == UNION_TYPE))
>>>> +    return NULL_TREE;
>>>> +
>>>> +  tree least_basetype = NULL_TREE;
>>>> +  least_common_record_1 (basetype, field1, field2, &least_basetype);
>>>> +
>>>> +  return least_basetype;
>>>> +}
>>>> +
>>>>    /* Determines the size of the member referenced by the COMPONENT_REF
>>>>       REF, using its initializer expression if necessary in order to
>>>>       determine the size of an initialized flexible array member.
>>>> @@ -12768,28 +12809,30 @@ component_ref_size (tree ref, special_array_member *sam /* = NULL */)
>>>>      memsize = NULL_TREE;
>>>>
>>>>      if (typematch)
>>>> -    /* MEMBER is a true flexible array member.  Compute its size from
>>>> -       the initializer of the BASE object if it has one.  */
>>>> -    if (tree init = DECL_P (base) ? DECL_INITIAL (base) : NULL_TREE)
>>>> -      if (init != error_mark_node)
>>>> -       {
>>>> -         init = get_initializer_for (init, member);
>>>> -         if (init)
>>>> -           {
>>>> -             memsize = TYPE_SIZE_UNIT (TREE_TYPE (init));
>>>> -             if (tree refsize = TYPE_SIZE_UNIT (argtype))
>>>> -               {
>>>> -                 /* Use the larger of the initializer size and the tail
>>>> -                    padding in the enclosing struct.  */
>>>> -                 poly_int64 rsz = tree_to_poly_int64 (refsize);
>>>> -                 rsz -= baseoff;
>>>> -                 if (known_lt (tree_to_poly_int64 (memsize), rsz))
>>>> -                   memsize = wide_int_to_tree (TREE_TYPE (memsize), rsz);
>>>> -               }
>>>> -
>>>> -             baseoff = 0;
>>>> -           }
>>>> -       }
>>>> +    {
>>>> +      /* MEMBER is a true flexible array member.  Compute its size from
>>>> +        the initializer of the BASE object if it has one.  */
>>>> +      if (tree init = DECL_P (base) ? DECL_INITIAL (base) : NULL_TREE)
>>>> +       if (init != error_mark_node)
>>>> +         {
>>>> +           init = get_initializer_for (init, member);
>>>> +           if (init)
>>>> +             {
>>>> +               memsize = TYPE_SIZE_UNIT (TREE_TYPE (init));
>>>> +               if (tree refsize = TYPE_SIZE_UNIT (argtype))
>>>> +                 {
>>>> +                   /* Use the larger of the initializer size and the tail
>>>> +                      padding in the enclosing struct.  */
>>>> +                   poly_int64 rsz = tree_to_poly_int64 (refsize);
>>>> +                   rsz -= baseoff;
>>>> +                   if (known_lt (tree_to_poly_int64 (memsize), rsz))
>>>> +                     memsize = wide_int_to_tree (TREE_TYPE (memsize), rsz);
>>>> +                 }
>>>> +
>>>> +               baseoff = 0;
>>>> +             }
>>>> +         }
>>>> +    }
>>>>
>>>>      if (!memsize)
>>>>        {
>>>> @@ -12810,9 +12853,31 @@ component_ref_size (tree ref, special_array_member *sam /* = NULL */)
>>>>             memsize = TYPE_SIZE_UNIT (bt);
>>>>           }
>>>>          else if (DECL_P (base))
>>>> -       /* Use the size of the BASE object (possibly an array of some
>>>> -          other type such as char used to store the struct).  */
>>>> -       memsize = DECL_SIZE_UNIT (base);
>>>> +       {
>>>> +         /* It is possible that the flexible array has no constructor at all.
>>>> +            In such cases, GCC will allocate the array in some weird way.
>>>> +            Assume that the array size is the difference between the start
>>>> +            address of the array and the next component, if it exists.  */
>>>> +
>>>> +         tree lcr;
>>>> +
>>>> +         poly_int64 baseoff2 = 0;
>>>> +         tree next_field = get_addr_base_and_unit_offset (ref, &baseoff2,
>>>> +                                                     /* next_elmnt =  */ true);
>>>> +         if (next_field
>>>> +             && (lcr = least_common_record (basetype, member, next_field))
>>>> +             && TREE_CODE (lcr) == RECORD_TYPE
>>>> +             && known_ge (baseoff2, baseoff))
>>>> +           {
>>>> +               poly_int64 size =  baseoff2 - baseoff;
>>>> +               memsize = wide_int_to_tree (TREE_TYPE (DECL_SIZE_UNIT (base)),
>>>> +                                           size);
>>>> +           }
>>>> +         else
>>>> +           /* Use the size of the BASE object (possibly an array of some
>>>> +              other type such as char used to store the struct).  */
>>>> +           memsize = DECL_SIZE_UNIT (base);
>>>> +       }
>>>>          else
>>>>           return NULL_TREE;
>>>>        }
>>>> --
>>>> 2.32.0
>>>>
>>



More information about the Gcc-patches mailing list