[PATCH] include more detail in -Warray-bounds (PR 86650)

Tue Jul 24 17:05:00 GMT 2018

On Mon, 2018-07-23 at 20:56 -0600, Martin Sebor wrote:
> On 07/23/2018 07:20 PM, David Malcolm wrote:
> > On Mon, 2018-07-23 at 17:49 -0600, Martin Sebor wrote:
> > > (David, I'm hoping your your help here.  Please see the end.)
> > > 
> > > While looking into a recent -Warray-bounds instance in Glibc
> > > involving inlining of large functions it became apparent that
> > > GCC could do a better job of pinpointing the source of
> > > the problem.
> > > 
> > > The attached patch makes a few adjustments to both
> > > the pretty printer infrastructure and to VRP to make this
> > > possible.  The diagnostic pretty printer already has directives
> > > to print the inlining context for both tree and gcall* arguments,
> > > so most of the changes just adjust things to be able to pass in
> > > gimple* argument instead.
> > > 
> > > The only slightly interesting change is to print the declaration
> > > to which the out-of-bounds array refers if one is known.
> > > 
> > > Tested on x86_64-linux with one regression.
> > > 
> > > The regression is in the gcc.dg/Warray-bounds.c test: the column
> > > numbers of the warnings are off.  Adding the %G specifier to
> > > the array bounds warnings in VRP has the unexpected effect of
> > > expanding the extent of the underling. For instance, for a test
> > > case like this:
> > > 
> > >    int a[10];
> > > 
> > >    void f (void)
> > >    {
> > >      a[-1] = 0;
> > >    }
> > > 
> > > from the expected:
> > > 
> > >     a[-1] = 0;
> > >     ~^~~~
> > > 
> > > to this:
> > > 
> > >    a[-1] = 0;
> > >     ~~~~~~^~~
> > > 
> > > David, do you have any idea how to avoid this?
> > 
> > Are you referring to the the various places in your patch (in e.g.
> >   vrp_prop::check_array_ref
> >   vrp_prop::check_mem_ref
> >   vrp_prop::search_for_addr_array
> > ) where the patch changed things from this form:
> > 
> >   warning_at (location, OPT_Warray_bounds,
> >               "[...format string...]", ARGS...);
> > 
> > to this form:
> > 
> >   warning_at (location, OPT_Warray_bounds,
> >               "%G[...format string...]", stmt, ARGS...);
> 
> Yes.
> 
> > 
> > If so, there are two location_t values of interest here:
> > (a) the "location" value, and
> > (b) gimple_location (stmt)
> > 
> > My recollection is that %G and %K override the "location" value
> > passed
> > in as the first param to the diagnostic call, overwriting it within
> > the
> > diagnostic_info's text_info with the location value from the %K/%G
> > (which also set up the pp_ti_abstract_origin of the text_info from
> > the
> > block information stashed in the ad-hoc data part of the location,
> > so
> > that the pretty-printer prints the inlining chain).
> 
> Would having the pretty printer restore the location and
> the block after it's done printing the context and before
> processing the rest of the format string fix it?  (I have
> only a vague idea how this all works so I'm not sure if
> this even makes sense.)

Structurally, it looks like this:

Temporaries during the emission of   |  Long-lived stuff:
the diagnostic:                      |
                                     |    +---------+
+----------------+                   |    |global_dc|
|diagnostic_info |                   |    +---------+
|+------------+  |                   |
||text_info:  |  |                   |
||  m_richloc-+--+---> rich_location |
||  x_data----+--+-------------------+--> block (via pp_ti_abstract_origin)
|+------------+  |                   |
+----------------+                   |
                                     |

The location_t of the diagnostic is stored in the rich_location.

Calling:
  warning_at (location)
creates a rich_location wrapping "location" and uses it as above.

During formatting, the %K/%G codes set text_info.x_data via
pp_ti_abstract_origin and overwrite the location_t in the
rich_location.

So in theory we could have a format code that sets the block and
doesn't touch the rich_location.  But that seems like overkill to me.

> > 
> > [aside, why don't we always just print the inlining chain?  IIRC,
> > %K
> > and %G feel too much like having to jump through hoops to me, given
> > that gimple_block is looking at gimple_location anyway, why not
> > just
> > use the location in the location_t's ad-hoc data; I have a feeling
> > there's a PR open about this, but I don't have it to hand right
> > now].
> 
> That would make sense to me.  I think that's also what we
> agreed would be the way forward the last time we discussed
> this.

(nods)

> > 
> > It looks like the old location was (a), and now you're seeing (b),
> > since (b) is the location of the statement.
> 
> Yes, I think that's it.
> 
> > 
> > Whether or not this is a problem is difficult to tell; what does
> > the
> > full diagnostic look like? (you only quoted the
> > diagnostic_show_locus
> > part of it).
> 
> It looks like this:
> 
> e.c: In function â€˜fâ€™:
> e.c:5:9: warning: array subscript -1 is below array bounds of
> â€˜int[10]â€™ 
> [-Warray-bounds]
>     a[-1] = 0;
>     ~~~~~~^~~
> e.c:1:5: note: while referencing â€˜aâ€™
>   int a[10];
>       ^

FWIW, either of the locations/underlining above seem fine to me, but I
feel the wording of the diagnostic could be improved.  When I first
read:

  array subscript -1 is below array bounds of â€˜int[10]â€™  

I found myself pausing to think about the message; the "10" in the
message confused me for a moment, and then I realized that it's
referring to the *lower* bound of the array.  I don't know how
"standardese" refers to this, but I think about it as the "start" of
the array.  So maybe the messages could read something like:

  out of bounds access to array â€˜int[10]â€™: subscript -1 is before the
start of the array

or something.  In a similar vein, is it easy to distinguish reads from
writes, which might give:

  out of bounds write to array â€˜int[10]â€™: subscript -1 writes to before
the start of the array

vs e.g.:

  out of bounds read from array 'int[10]': subscript 10 reads from
after 'a[9]', the final element of the array

and similar (I'm brainstorming here).

Dave