[PATCH] enhance -Warray-bounds to handle strings and excessive indices

Fri Oct 20 08:12:00 GMT 2017

On Fri, Oct 20, 2017 at 1:00 AM, Martin Sebor <msebor@gmail.com> wrote:
> On 10/19/2017 02:34 AM, Richard Biener wrote:
>>
>> On Thu, Oct 19, 2017 at 1:19 AM, Martin Sebor <msebor@gmail.com> wrote:
>>>
>>> On 10/18/2017 04:48 AM, Richard Biener wrote:
>>>>
>>>>
>>>> On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor <msebor@gmail.com> wrote:
>>>>>
>>>>>
>>>>> While testing my latest -Wrestrict changes I noticed a number of
>>>>> opportunities to improve the -Warray-bounds warning.  Attached
>>>>> is a patch that implements a solution for the following subset
>>>>> of these:
>>>>>
>>>>> PR tree-optimization/82596 - missing -Warray-bounds on an out-of
>>>>>   bounds index into string literal
>>>>> PR tree-optimization/82588 - missing -Warray-bounds on an excessively
>>>>>   large index
>>>>> PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
>>>>>   inner indices
>
>
>> I meant to use size_type_node (size_t), not sizetype.  But
>> I just checked that ptrdiff_type_node is initialized in
>> build_common_tree_nodes and thus always available.
>
>
> I see.  Using ptrdiff_type_node is preferable for the targets
> where ptrdiff_t has a greater precision than size_t (e.g., VMS).
> It makes sense now.  I should remember to change all the other
> places where I introduced ssizetype to use ptrdiff_type_node.
>
>>
>>> As an aside, at some point I would like to get away from a type
>>> based limit in all these warnings and instead use one that can
>>> be controlled by an option so that a user can impose a lower limit
>>> on the maximum size of an object and have all size-related warnings
>>> (and perhaps even optimizations) enforce it and benefit from it.
>>
>>
>> You could add a --param that is initialized from ptrdiff_type_node.
>
>
> Yes, that's an option to consider.  Thanks.
>
>
>>
>>>> +      tree arg = TREE_OPERAND (ref, 0);
>>>> +      tree_code code = TREE_CODE (arg);
>>>> +      if (code == COMPONENT_REF)
>>>> +       {
>>>> +         HOST_WIDE_INT off;
>>>> +         if (tree base = get_addr_base_and_unit_offset (ref, &off))
>>>> +           up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype,
>>>> up_bound_p1,
>>>> +                                      TYPE_SIZE_UNIT (TREE_TYPE
>>>> (base)));
>>>> +         else
>>>> +           return;
>>>>
>>>> so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be
>>>> false).
>>>> simply not subtracting anyhing instead of returning would be
>>>> conservatively
>>>> correct, no?  Likewise subtracting the offset of the array for all
>>>> "previous"
>>>> variably indexed components with assuming the lowest value for the
>>>> index.
>>>> But as above I think compensating for the offset of the array within the
>>>> object
>>>> is academic ... ;)
>>>
>>>
>>>
>>> I was going to say yes (it gives up) but on second thought I don't
>>> think it does.  Only the major index can be unbounded and the code
>>> does consider the size of the sub-array when checking the major
>>> index.  So, IIUC, I think this works correctly as is (*).  What
>>> doesn't work is VLAs but those are a separate problem.  Let me
>>> know if I misunderstood your question.
>>
>>
>> get_addr_base_and_unit_offset will return NULL if there's any variable
>> component in 'ref'.  So as written it seems to be dead code (you
>> want to pass 'arg'?)
>
>
> Sorry, I'm not sure I understand what you mean.  What do you think
> is dead code?  The call to get_addr_base_and_unit_offset() is also
> made for an array of unspecified bound (up_bound is null) and for
> an array at the end of a struct.  For those the function returns
> non-null, and for the others (arrays of runtime bound) it returns
> null.  (I passed arg instead of ref but I see no difference in
> my tests.)

If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') it will
return the offset of 'c'.  If you pass a.b[j].c it will still return NULL.
You could use get_ref_base_and_extent which will return the offset
of a.b[0].c in this case and sets max_size != size - but you are only
interested in offset.  The disadvantage of get_ref_base_and_extent
is it returns offset in bits thus if the offset is too large for a HWI
you'll instead get offset == 0 and max_size == -1.

Thus I'm saying this is dead code for variable array accesses
(even for the array you are warning about).  Yes, for constant index
and at-struct-end you'll get sth, but the warning is in VRP because
of variable indexes.

So I suggest to pass 'arg' and use get_ref_base_and_extent
for some extra precision (and possible lossage for very very large
structures).

Thus instead of

+      tree maxbound = TYPE_MAX_VALUE (ptrdiff_type_node);
+
+      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
+
+      tree arg = TREE_OPERAND (ref, 0);
+      tree_code code = TREE_CODE (arg);
+      if (code == COMPONENT_REF)
+       {
+         HOST_WIDE_INT off;
+         if (tree base = get_addr_base_and_unit_offset (arg, &off))
+           {
+             tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));

(not sure why you're subtracting the size of the base here?!  I
thought you subtract off!)

+             if (TREE_CODE (size) == INTEGER_CST)
+               up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);
+           }

do

    if (handled_component_p (arg))
      {
         HOST_WIDE_INT bitoff, bitsize, bitmax_size;
         bool reverse;
         get_ref_base_and_extent (arg, &bitoff, &bitsize,
&bitmax_size, &reverse);
         up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1,
                              build_int_cst (TREE_TYPE (up_bound_p1),
bitoff / BITS_PER_UNIT));
      }

ok with that change.

Richard.

>
>>
>> I was asking you to remove the 'else return' because w/o subtracting
>> the upper bound is just more conservative.
>
>
> Sure, that sounds good.
>
> Attached is an updated patch.
>
> Thanks
> Martin