[PATCH] include more detail in -Warray-bounds (PR 86650)
David Malcolm
dmalcolm@redhat.com
Tue Jul 24 17:05:00 GMT 2018
On Mon, 2018-07-23 at 20:56 -0600, Martin Sebor wrote:
> On 07/23/2018 07:20 PM, David Malcolm wrote:
> > On Mon, 2018-07-23 at 17:49 -0600, Martin Sebor wrote:
> > > (David, I'm hoping your your help here. Please see the end.)
> > >
> > > While looking into a recent -Warray-bounds instance in Glibc
> > > involving inlining of large functions it became apparent that
> > > GCC could do a better job of pinpointing the source of
> > > the problem.
> > >
> > > The attached patch makes a few adjustments to both
> > > the pretty printer infrastructure and to VRP to make this
> > > possible. The diagnostic pretty printer already has directives
> > > to print the inlining context for both tree and gcall* arguments,
> > > so most of the changes just adjust things to be able to pass in
> > > gimple* argument instead.
> > >
> > > The only slightly interesting change is to print the declaration
> > > to which the out-of-bounds array refers if one is known.
> > >
> > > Tested on x86_64-linux with one regression.
> > >
> > > The regression is in the gcc.dg/Warray-bounds.c test: the column
> > > numbers of the warnings are off. Adding the %G specifier to
> > > the array bounds warnings in VRP has the unexpected effect of
> > > expanding the extent of the underling. For instance, for a test
> > > case like this:
> > >
> > > int a[10];
> > >
> > > void f (void)
> > > {
> > > a[-1] = 0;
> > > }
> > >
> > > from the expected:
> > >
> > > a[-1] = 0;
> > > ~^~~~
> > >
> > > to this:
> > >
> > > a[-1] = 0;
> > > ~~~~~~^~~
> > >
> > > David, do you have any idea how to avoid this?
> >
> > Are you referring to the the various places in your patch (in e.g.
> > vrp_prop::check_array_ref
> > vrp_prop::check_mem_ref
> > vrp_prop::search_for_addr_array
> > ) where the patch changed things from this form:
> >
> > warning_at (location, OPT_Warray_bounds,
> > "[...format string...]", ARGS...);
> >
> > to this form:
> >
> > warning_at (location, OPT_Warray_bounds,
> > "%G[...format string...]", stmt, ARGS...);
>
> Yes.
>
> >
> > If so, there are two location_t values of interest here:
> > (a) the "location" value, and
> > (b) gimple_location (stmt)
> >
> > My recollection is that %G and %K override the "location" value
> > passed
> > in as the first param to the diagnostic call, overwriting it within
> > the
> > diagnostic_info's text_info with the location value from the %K/%G
> > (which also set up the pp_ti_abstract_origin of the text_info from
> > the
> > block information stashed in the ad-hoc data part of the location,
> > so
> > that the pretty-printer prints the inlining chain).
>
> Would having the pretty printer restore the location and
> the block after it's done printing the context and before
> processing the rest of the format string fix it? (I have
> only a vague idea how this all works so I'm not sure if
> this even makes sense.)
Structurally, it looks like this:
Temporaries during the emission of | Long-lived stuff:
the diagnostic: |
| +---------+
+----------------+ | |global_dc|
|diagnostic_info | | +---------+
|+------------+ | |
||text_info: | | |
|| m_richloc-+--+---> rich_location |
|| x_data----+--+-------------------+--> block (via pp_ti_abstract_origin)
|+------------+ | |
+----------------+ |
|
The location_t of the diagnostic is stored in the rich_location.
Calling:
warning_at (location)
creates a rich_location wrapping "location" and uses it as above.
During formatting, the %K/%G codes set text_info.x_data via
pp_ti_abstract_origin and overwrite the location_t in the
rich_location.
So in theory we could have a format code that sets the block and
doesn't touch the rich_location. But that seems like overkill to me.
> >
> > [aside, why don't we always just print the inlining chain? IIRC,
> > %K
> > and %G feel too much like having to jump through hoops to me, given
> > that gimple_block is looking at gimple_location anyway, why not
> > just
> > use the location in the location_t's ad-hoc data; I have a feeling
> > there's a PR open about this, but I don't have it to hand right
> > now].
>
> That would make sense to me. I think that's also what we
> agreed would be the way forward the last time we discussed
> this.
(nods)
> >
> > It looks like the old location was (a), and now you're seeing (b),
> > since (b) is the location of the statement.
>
> Yes, I think that's it.
>
> >
> > Whether or not this is a problem is difficult to tell; what does
> > the
> > full diagnostic look like? (you only quoted the
> > diagnostic_show_locus
> > part of it).
>
> It looks like this:
>
> e.c: In function âfâ:
> e.c:5:9: warning: array subscript -1 is below array bounds of
> âint[10]â
> [-Warray-bounds]
> a[-1] = 0;
> ~~~~~~^~~
> e.c:1:5: note: while referencing âaâ
> int a[10];
> ^
FWIW, either of the locations/underlining above seem fine to me, but I
feel the wording of the diagnostic could be improved. When I first
read:
array subscript -1 is below array bounds of âint[10]â
I found myself pausing to think about the message; the "10" in the
message confused me for a moment, and then I realized that it's
referring to the *lower* bound of the array. I don't know how
"standardese" refers to this, but I think about it as the "start" of
the array. So maybe the messages could read something like:
out of bounds access to array âint[10]â: subscript -1 is before the
start of the array
or something. In a similar vein, is it easy to distinguish reads from
writes, which might give:
out of bounds write to array âint[10]â: subscript -1 writes to before
the start of the array
vs e.g.:
out of bounds read from array 'int[10]': subscript 10 reads from
after 'a[9]', the final element of the array
and similar (I'm brainstorming here).
Dave
More information about the Gcc-patches
mailing list