[PATCH] include more detail in -Warray-bounds (PR 86650)

Martin Sebor msebor@gmail.com
Tue Jul 24 19:31:00 GMT 2018


On 07/24/2018 11:05 AM, David Malcolm wrote:
> On Mon, 2018-07-23 at 20:56 -0600, Martin Sebor wrote:
>> On 07/23/2018 07:20 PM, David Malcolm wrote:
>>> On Mon, 2018-07-23 at 17:49 -0600, Martin Sebor wrote:
>>>> (David, I'm hoping your your help here.  Please see the end.)
>>>>
>>>> While looking into a recent -Warray-bounds instance in Glibc
>>>> involving inlining of large functions it became apparent that
>>>> GCC could do a better job of pinpointing the source of
>>>> the problem.
>>>>
>>>> The attached patch makes a few adjustments to both
>>>> the pretty printer infrastructure and to VRP to make this
>>>> possible.  The diagnostic pretty printer already has directives
>>>> to print the inlining context for both tree and gcall* arguments,
>>>> so most of the changes just adjust things to be able to pass in
>>>> gimple* argument instead.
>>>>
>>>> The only slightly interesting change is to print the declaration
>>>> to which the out-of-bounds array refers if one is known.
>>>>
>>>> Tested on x86_64-linux with one regression.
>>>>
>>>> The regression is in the gcc.dg/Warray-bounds.c test: the column
>>>> numbers of the warnings are off.  Adding the %G specifier to
>>>> the array bounds warnings in VRP has the unexpected effect of
>>>> expanding the extent of the underling. For instance, for a test
>>>> case like this:
>>>>
>>>>    int a[10];
>>>>
>>>>    void f (void)
>>>>    {
>>>>      a[-1] = 0;
>>>>    }
>>>>
>>>> from the expected:
>>>>
>>>>     a[-1] = 0;
>>>>     ~^~~~
>>>>
>>>> to this:
>>>>
>>>>    a[-1] = 0;
>>>>     ~~~~~~^~~
>>>>
>>>> David, do you have any idea how to avoid this?
>>>
>>> Are you referring to the the various places in your patch (in e.g.
>>>   vrp_prop::check_array_ref
>>>   vrp_prop::check_mem_ref
>>>   vrp_prop::search_for_addr_array
>>> ) where the patch changed things from this form:
>>>
>>>   warning_at (location, OPT_Warray_bounds,
>>>               "[...format string...]", ARGS...);
>>>
>>> to this form:
>>>
>>>   warning_at (location, OPT_Warray_bounds,
>>>               "%G[...format string...]", stmt, ARGS...);
>>
>> Yes.
>>
>>>
>>> If so, there are two location_t values of interest here:
>>> (a) the "location" value, and
>>> (b) gimple_location (stmt)
>>>
>>> My recollection is that %G and %K override the "location" value
>>> passed
>>> in as the first param to the diagnostic call, overwriting it within
>>> the
>>> diagnostic_info's text_info with the location value from the %K/%G
>>> (which also set up the pp_ti_abstract_origin of the text_info from
>>> the
>>> block information stashed in the ad-hoc data part of the location,
>>> so
>>> that the pretty-printer prints the inlining chain).
>>
>> Would having the pretty printer restore the location and
>> the block after it's done printing the context and before
>> processing the rest of the format string fix it?  (I have
>> only a vague idea how this all works so I'm not sure if
>> this even makes sense.)
>
> Structurally, it looks like this:
>
> Temporaries during the emission of   |  Long-lived stuff:
> the diagnostic:                      |
>                                      |    +---------+
> +----------------+                   |    |global_dc|
> |diagnostic_info |                   |    +---------+
> |+------------+  |                   |
> ||text_info:  |  |                   |
> ||  m_richloc-+--+---> rich_location |
> ||  x_data----+--+-------------------+--> block (via pp_ti_abstract_origin)
> |+------------+  |                   |
> +----------------+                   |
>                                      |
>
> The location_t of the diagnostic is stored in the rich_location.
>
> Calling:
>   warning_at (location)
> creates a rich_location wrapping "location" and uses it as above.
>
> During formatting, the %K/%G codes set text_info.x_data via
> pp_ti_abstract_origin and overwrite the location_t in the
> rich_location.
>
> So in theory we could have a format code that sets the block and
> doesn't touch the rich_location.  But that seems like overkill to me.

I wasn't thinking of a new format.  Rather, I thought the %K
would save the current block and location (set by the location
argument to warning_at), then after printing the inlining stack
but before printing the rest of the diagnostic the printer would
restore the saved block and location.  I still don't know enough
to tell if it would work.

In any event, if it's easier to always print the inlining stack
and get rid of %K and %G then that would be preferable.  I don't
think they are used for any other purpose (i.e., they are always
used as the first directive in a format string).

>>> [aside, why don't we always just print the inlining chain?  IIRC,
>>> %K
>>> and %G feel too much like having to jump through hoops to me, given
>>> that gimple_block is looking at gimple_location anyway, why not
>>> just
>>> use the location in the location_t's ad-hoc data; I have a feeling
>>> there's a PR open about this, but I don't have it to hand right
>>> now].
>>
>> That would make sense to me.  I think that's also what we
>> agreed would be the way forward the last time we discussed
>> this.
>
> (nods)

So how do we go about making this happen?  Somewhat selfishly
I was sort of waiting for you to take the lead on it since
you're much more familiar with the code than I am :)  But
that doesn't mean I can't try to tackle it myself.  If it seems
like something you can fit into your schedule sometime soonish
let me know.  Otherwise I'll see if I can figure it out myself.
In the meantime, I'll proceed with the rest of this patch sans
the %G bits.

>>> It looks like the old location was (a), and now you're seeing (b),
>>> since (b) is the location of the statement.
>>
>> Yes, I think that's it.
>>
>>>
>>> Whether or not this is a problem is difficult to tell; what does
>>> the
>>> full diagnostic look like? (you only quoted the
>>> diagnostic_show_locus
>>> part of it).
>>
>> It looks like this:
>>
>> e.c: In function ‘f’:
>> e.c:5:9: warning: array subscript -1 is below array bounds of
>> ‘int[10]’
>> [-Warray-bounds]
>>     a[-1] = 0;
>>     ~~~~~~^~~
>> e.c:1:5: note: while referencing ‘a’
>>   int a[10];
>>       ^
>
> FWIW, either of the locations/underlining above seem fine to me,

The underling in the warning isn't right.  It doesn't matter so
much in the straightforward case of a one-dimensional array but
it becomes a problem when the warning involves a mult-dimensional
array.  Contrast the output for this code:

   const int i0 = 5, k0 = 0, i1 = 0, k1 = 5;

   int a[2][3];

   void f (void)
   {
     a[i0][k0] = 0;
     a[i1][k1] = 0;
   }

With my patch (without the note) we get:

   e.c:7:13: warning: array subscript 5 is above array bounds of 
‘int[2][3]’ [-Warray-bounds]
      a[i0][k0] = 0;
      ~~~~~~~~~~^~~
   e.c:8:13: warning: array subscript 5 is above array bounds of 
‘int[3]’ [-Warray-bounds]
      a[i1][k1] = 0;
      ~~~~~~~~~~^~~

when without the patch we get:

   e.c:7:4: warning: array subscript 5 is above array bounds of 
‘int[2][3]’ [-Warray-bounds]
      a[i0][k0] = 0;
      ~^~~~
   e.c:8:8: warning: array subscript 5 is above array bounds of ‘int[3]’ 
[-Warray-bounds]
      a[i1][k1] = 0;
      ~~~~~^~~~

With the patch the warning doesn't clearly indicate which of
the two indices is out of bounds.  I don't think we should
lose that feature.

> but I
> feel the wording of the diagnostic could be improved.  When I first
> read:
>
>   array subscript -1 is below array bounds of ‘int[10]’
>
> I found myself pausing to think about the message; the "10" in the
> message confused me for a moment, and then I realized that it's
> referring to the *lower* bound of the array.  I don't know how
> "standardese" refers to this, but I think about it as the "start" of
> the array.  So maybe the messages could read something like:
>
>   out of bounds access to array ‘int[10]’: subscript -1 is before the
> start of the array

I agree that the warnings could be tweaked.  With the inclusion
of the value of the out-of-bounds index (or its range) and with
the type of the referenced array in the diagnostic it's no longer
necessary to mention whether the index is above or below bounds.
It's enough to say that it's out of bounds.  It will not only
obviate questions like the one above but also simplify the code
(and also simplify writing tests).

>
> or something.  In a similar vein, is it easy to distinguish reads from
> writes, which might give:
>
>   out of bounds write to array ‘int[10]’: subscript -1 writes to before
> the start of the array
>
> vs e.g.:
>
>   out of bounds read from array 'int[10]': subscript 10 reads from
> after 'a[9]', the final element of the array
>
> and similar (I'm brainstorming here).

Thanks for your comments -- they're always helpful!

Martin



More information about the Gcc-patches mailing list