This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] include size and offset in -Wstringop-overflow
On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <law@redhat.com> wrote:
>
> On 11/6/19 3:34 PM, Martin Sebor wrote:
> > On 11/6/19 2:06 PM, Martin Sebor wrote:
> >> On 11/6/19 1:39 PM, Jeff Law wrote:
> >>> On 11/6/19 1:27 PM, Martin Sebor wrote:
> >>>> On 11/6/19 11:55 AM, Jeff Law wrote:
> >>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
> >>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
> >>>>>> stores mention the amount of data being stored and the amount of
> >>>>>> space remaining in the destination, such as:
> >>>>>>
> >>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
> >>>>>>
> >>>>>> 123 | *p = 0;
> >>>>>> | ~~~^~~
> >>>>>> note: destination object declared here
> >>>>>> 45 | char b[N];
> >>>>>> | ^
> >>>>>>
> >>>>>> A warning like this can take some time to analyze. First, the size
> >>>>>> of the destination isn't mentioned and may not be easy to tell from
> >>>>>> the sources. In the note above, when N's value is the result of
> >>>>>> some non-trivial computation, chasing it down may be a small project
> >>>>>> in and of itself. Second, it's also not clear why the region size
> >>>>>> is zero. It could be because the offset is exactly N, or because
> >>>>>> it's negative, or because it's in some range greater than N.
> >>>>>>
> >>>>>> Mentioning both the size of the destination object and the offset
> >>>>>> makes the existing messages clearer, are will become essential when
> >>>>>> GCC starts diagnosing overflow into allocated buffers (as my
> >>>>>> follow-on patch does).
> >>>>>>
> >>>>>> The attached patch enhances -Wstringop-overflow to do this by
> >>>>>> letting compute_objsize return the offset to its caller, doing
> >>>>>> something similar in get_stridx, and adding a new function to
> >>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
> >>>>>> like the function to replace the -Wstringop-overflow handler in
> >>>>>> builtins.c). With the change, the note above might read something
> >>>>>> like:
> >>>>>>
> >>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
> >>>>>> 45 | char b[N];
> >>>>>> | ^
> >>>>>>
> >>>>>> Tested on x86_64-linux.
> >>>>>>
> >>>>>> Martin
> >>>>>>
> >>>>>> gcc-store-offset.diff
> >>>>>>
> >>>>>> gcc/ChangeLog:
> >>>>>>
> >>>>>> * builtins.c (compute_objsize): Add an argument and set it to
> >>>>>> offset
> >>>>>> into destination.
> >>>>>> * builtins.h (compute_objsize): Add an argument.
> >>>>>> * tree-object-size.c (addr_object_size): Add an argument and
> >>>>>> set it
> >>>>>> to offset into destination.
> >>>>>> (compute_builtin_object_size): Same.
> >>>>>> * tree-object-size.h (compute_builtin_object_size): Add an
> >>>>>> argument.
> >>>>>> * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
> >>>>>> set it
> >>>>>> to offset into destination.
> >>>>>> (maybe_warn_overflow): New function.
> >>>>>> (handle_store): Call maybe_warn_overflow to issue warnings.
> >>>>>>
> >>>>>> gcc/testsuite/ChangeLog:
> >>>>>>
> >>>>>> * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
> >>>>>> messages.
> >>>>>> * g++.dg/warn/Wstringop-overflow-3.C: Same.
> >>>>>> * gcc.dg/Wstringop-overflow-17.c: Same.
> >>>>>>
> >>>>>
> >>>>>> Index: gcc/tree-ssa-strlen.c
> >>>>>> ===================================================================
> >>>>>> --- gcc/tree-ssa-strlen.c (revision 277886)
> >>>>>> +++ gcc/tree-ssa-strlen.c (working copy)
> >>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
> >>>>>> static int get_stridx_plus_constant (strinfo *, unsigned
> >>>>>> HOST_WIDE_INT, tree);
> >>>>>> static void handle_builtin_stxncpy (built_in_function,
> >>>>>> gimple_stmt_iterator *);
> >>>>>> +/* Sets MINMAX to either the constant value or the range VAL
> >>>>>> is in
> >>>>>> + and returns true on success. */
> >>>>>> +
> >>>>>> +static bool
> >>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
> >>>>>> NULL)
> >>>>>> +{
> >>>>>> + if (tree_fits_uhwi_p (val))
> >>>>>> + {
> >>>>>> + minmax[0] = minmax[1] = wi::to_wide (val);
> >>>>>> + return true;
> >>>>>> + }
> >>>>>> +
> >>>>>> + if (TREE_CODE (val) != SSA_NAME)
> >>>>>> + return false;
> >>>>>> +
> >>>>>> + if (rvals)
> >>>>>> + {
> >>>>>> + gimple *def = SSA_NAME_DEF_STMT (val);
> >>>>>> + if (gimple_assign_single_p (def)
> >>>>>> + && gimple_assign_rhs_code (def) == INTEGER_CST)
> >>>>>> + {
> >>>>>> + /* get_value_range returns [0, N] for constant
> >>>>>> assignments. */
> >>>>>> + val = gimple_assign_rhs1 (def);
> >>>>>> + minmax[0] = minmax[1] = wi::to_wide (val);
> >>>>>> + return true;
> >>>>>> + }
> >>>>> Umm, something seems really off with this hunk. If the SSA_NAME is
> >>>>> set
> >>>>> via a simple constant assignment, then the range ought to be a
> >>>>> singleton
> >>>>> ie [CONST,CONST]. Is there are particular test were this is not
> >>>>> true?
> >>>>>
> >>>>> The only way offhand I could see this happening is if originally
> >>>>> the RHS
> >>>>> wasn't a constant, but due to optimizations it either simplified
> >>>>> into a
> >>>>> constant or a constant was propagated into an SSA_NAME appearing on
> >>>>> the
> >>>>> RHS. This would have to happen between the last range analysis and
> >>>>> the
> >>>>> point where you're making this query.
> >>>>
> >>>> Yes, I think that's right. Here's an example where it happens:
> >>>>
> >>>> void f (void)
> >>>> {
> >>>> char s[] = "1234";
> >>>> unsigned n = strlen (s);
> >>>> char vla[n]; // or malloc (n)
> >>>> vla[n] = 0; // n = [4, 4]
> >>>> ...
> >>>> }
> >>>>
> >>>> The strlen call is folded to 4 but that's not propagated to
> >>>> the access until sometime after the strlen pass is done.
> >>> Hmm. Are we calling set_range_info in that case? That goes behind the
> >>> back of pass instance of vr_values. If so, that might argue we want to
> >>> be setting it in vr_values too.
> >>
> >> No, set_range_info is only called for ranges. In this case,
> >> handle_builtin_strlen replaces the strlen() call with 4:
> >>
> >> s = "1234";
> >> _1 = __builtin_strlen (&s);
> >> n_2 = (unsigned int) _1;
> >> a.1_8 = __builtin_alloca_with_align (_1, 8);
> >> (*a.1_8)[n_2] = 0;
> >>
> >> When the access is made, the __builtin_alloca_with_align call
> >> is found as the destination and the _1 SSA_NAME is used to
> >> get its size. We get back the range [4, 4].
> >
> > By the way, I glossed over one detail. The above doesn't work
> > exactly as is because the allocation size is the SSA_NAME _1
> > (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
> > the range [0, 4]; the range is [0, 4] because it was set based
> > on the size of the argument to the strlen() call well before
> > the strlen pass even ran).
> Which would tend to argue that we should forward propagate the constant
> to the uses of _1. That should expose that the RHS of the assignment to
> n_2 is a constant as well.
>
>
> >
> > To make it work across assignments we need to propagate the strlen
> > results down the CFG somehow. I'm hoping the on-demand VRP will
> > do this automagically.
> It would, but it's probably more heavyweight than we need. We just need
> to forward propagate the discovered constant to the use points and pick
> up any secondary opportunities that arise.
Yes. And the usual way of doing this is to keep a constant-and-copy
lattice (and for copies you'd need to track availability) and before optimizing
a stmt substitute its operands with the lattice contents.
forwprop has a scheme that can be followed doing a RPO walk, strlen
does a DOM walk, there you can follow what DOM/PRE elimination do
(for tracking copy availability - if you just track constants you can
elide that).
Richard.
> jeff
>