[PATCH] include size and offset in -Wstringop-overflow

Richard Biener richard.guenther@gmail.com
Wed Nov 13 14:39:00 GMT 2019


On Tue, Nov 12, 2019 at 6:55 PM Jeff Law <law@redhat.com> wrote:
>
> On 11/12/19 1:15 AM, Richard Biener wrote:
> > On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <law@redhat.com> wrote:
> >>
> >> On 11/6/19 3:34 PM, Martin Sebor wrote:
> >>> On 11/6/19 2:06 PM, Martin Sebor wrote:
> >>>> On 11/6/19 1:39 PM, Jeff Law wrote:
> >>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
> >>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
> >>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
> >>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
> >>>>>>>> stores mention the amount of data being stored and the amount of
> >>>>>>>> space remaining in the destination, such as:
> >>>>>>>>
> >>>>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
> >>>>>>>>
> >>>>>>>>     123 |   *p = 0;
> >>>>>>>>         |   ~~~^~~
> >>>>>>>> note: destination object declared here
> >>>>>>>>      45 |   char b[N];
> >>>>>>>>         |        ^
> >>>>>>>>
> >>>>>>>> A warning like this can take some time to analyze.  First, the size
> >>>>>>>> of the destination isn't mentioned and may not be easy to tell from
> >>>>>>>> the sources.  In the note above, when N's value is the result of
> >>>>>>>> some non-trivial computation, chasing it down may be a small project
> >>>>>>>> in and of itself.  Second, it's also not clear why the region size
> >>>>>>>> is zero.  It could be because the offset is exactly N, or because
> >>>>>>>> it's negative, or because it's in some range greater than N.
> >>>>>>>>
> >>>>>>>> Mentioning both the size of the destination object and the offset
> >>>>>>>> makes the existing messages clearer, are will become essential when
> >>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
> >>>>>>>> follow-on patch does).
> >>>>>>>>
> >>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
> >>>>>>>> letting compute_objsize return the offset to its caller, doing
> >>>>>>>> something similar in get_stridx, and adding a new function to
> >>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
> >>>>>>>> like the function to replace the -Wstringop-overflow handler in
> >>>>>>>> builtins.c).  With the change, the note above might read something
> >>>>>>>> like:
> >>>>>>>>
> >>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
> >>>>>>>>      45 |   char b[N];
> >>>>>>>>         |        ^
> >>>>>>>>
> >>>>>>>> Tested on x86_64-linux.
> >>>>>>>>
> >>>>>>>> Martin
> >>>>>>>>
> >>>>>>>> gcc-store-offset.diff
> >>>>>>>>
> >>>>>>>> gcc/ChangeLog:
> >>>>>>>>
> >>>>>>>>      * builtins.c (compute_objsize): Add an argument and set it to
> >>>>>>>> offset
> >>>>>>>>      into destination.
> >>>>>>>>      * builtins.h (compute_objsize): Add an argument.
> >>>>>>>>      * tree-object-size.c (addr_object_size): Add an argument and
> >>>>>>>> set it
> >>>>>>>>      to offset into destination.
> >>>>>>>>      (compute_builtin_object_size): Same.
> >>>>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an
> >>>>>>>> argument.
> >>>>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
> >>>>>>>> set it
> >>>>>>>>      to offset into destination.
> >>>>>>>>      (maybe_warn_overflow): New function.
> >>>>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
> >>>>>>>>
> >>>>>>>> gcc/testsuite/ChangeLog:
> >>>>>>>>
> >>>>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
> >>>>>>>> messages.
> >>>>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
> >>>>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
> >>>>>>>>
> >>>>>>>
> >>>>>>>> Index: gcc/tree-ssa-strlen.c
> >>>>>>>> ===================================================================
> >>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
> >>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
> >>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
> >>>>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
> >>>>>>>> HOST_WIDE_INT, tree);
> >>>>>>>>    static void handle_builtin_stxncpy (built_in_function,
> >>>>>>>> gimple_stmt_iterator *);
> >>>>>>>>    +/* Sets MINMAX to either the constant value or the range VAL
> >>>>>>>> is in
> >>>>>>>> +   and returns true on success.  */
> >>>>>>>> +
> >>>>>>>> +static bool
> >>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
> >>>>>>>> NULL)
> >>>>>>>> +{
> >>>>>>>> +  if (tree_fits_uhwi_p (val))
> >>>>>>>> +    {
> >>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
> >>>>>>>> +      return true;
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
> >>>>>>>> +    return false;
> >>>>>>>> +
> >>>>>>>> +  if (rvals)
> >>>>>>>> +    {
> >>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
> >>>>>>>> +      if (gimple_assign_single_p (def)
> >>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
> >>>>>>>> +    {
> >>>>>>>> +      /* get_value_range returns [0, N] for constant
> >>>>>>>> assignments.  */
> >>>>>>>> +      val = gimple_assign_rhs1 (def);
> >>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
> >>>>>>>> +      return true;
> >>>>>>>> +    }
> >>>>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is
> >>>>>>> set
> >>>>>>> via a simple constant assignment, then the range ought to be a
> >>>>>>> singleton
> >>>>>>> ie [CONST,CONST].   Is there are particular test were this is not
> >>>>>>> true?
> >>>>>>>
> >>>>>>> The only way offhand I could see this happening is if originally
> >>>>>>> the RHS
> >>>>>>> wasn't a constant, but due to optimizations it either simplified
> >>>>>>> into a
> >>>>>>> constant or a constant was propagated into an SSA_NAME appearing on
> >>>>>>> the
> >>>>>>> RHS.  This would have to happen between the last range analysis and
> >>>>>>> the
> >>>>>>> point where you're making this query.
> >>>>>>
> >>>>>> Yes, I think that's right.  Here's an example where it happens:
> >>>>>>
> >>>>>>    void f (void)
> >>>>>>    {
> >>>>>>      char s[] = "1234";
> >>>>>>      unsigned n = strlen (s);
> >>>>>>      char vla[n];   // or malloc (n)
> >>>>>>      vla[n] = 0;    // n = [4, 4]
> >>>>>>      ...
> >>>>>>    }
> >>>>>>
> >>>>>> The strlen call is folded to 4 but that's not propagated to
> >>>>>> the access until sometime after the strlen pass is done.
> >>>>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
> >>>>> back of pass instance of vr_values.  If so, that might argue we want to
> >>>>> be setting it in vr_values too.
> >>>>
> >>>> No, set_range_info is only called for ranges.  In this case,
> >>>> handle_builtin_strlen replaces the strlen() call with 4:
> >>>>
> >>>>    s = "1234";
> >>>>    _1 = __builtin_strlen (&s);
> >>>>    n_2 = (unsigned int) _1;
> >>>>    a.1_8 = __builtin_alloca_with_align (_1, 8);
> >>>>    (*a.1_8)[n_2] = 0;
> >>>>
> >>>> When the access is made, the __builtin_alloca_with_align call
> >>>> is found as the destination and the _1 SSA_NAME is used to
> >>>> get its size.  We get back the range [4, 4].
> >>>
> >>> By the way, I glossed over one detail.  The above doesn't work
> >>> exactly as is because the allocation size is the SSA_NAME _1
> >>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
> >>> the range [0, 4]; the range is [0, 4] because it was set based
> >>> on the size of the argument to the strlen() call well before
> >>> the strlen pass even ran).
> >> Which would tend to argue that we should forward propagate the constant
> >> to the uses of _1.  That should expose that the RHS of the assignment to
> >> n_2 is a constant as well.
> >>
> >>
> >>>
> >>> To make it work across assignments we need to propagate the strlen
> >>> results down the CFG somehow.  I'm hoping the on-demand VRP will
> >>> do this automagically.
> >> It would, but it's probably more heavyweight than we need.  We just need
> >> to forward propagate the discovered constant to the use points and pick
> >> up any secondary opportunities that arise.
> >
> > Yes.  And the usual way of doing this is to keep a constant-and-copy
> > lattice (and for copies you'd need to track availability) and before optimizing
> > a stmt substitute its operands with the lattice contents.
> >
> > forwprop has a scheme that can be followed doing a RPO walk, strlen
> > does a DOM walk, there you can follow what DOM/PRE elimination do
> > (for tracking copy availability - if you just track constants you can
> > elide that).
> I'm also note sure handling copies is all that interesting here and if
> we just handle constants/invariants, then it's pretty easy.
>
> Whenever we replace a strlen call with a const, we put the LHS (assuming
> its an SSA_NAME) of the statement on a worklist.
>
> We pull items off the worklist and propagate the value to the use points
> and try to simplify the resulting statement.  If the RHS of the use
> point simplified to a constant, then put the LHS of the use statement
> onto the worklist.  Iterate until the list is empty.
>
> That would capture everything of interest I suspect and ought to be cheap.

That's the ad-hoc way, yes.

To Martin: no, this shouldn't be a prerequesite

> jeff
>
>
>
> I think whenever we substitute a constant or SSA_NAME for a strlen call,
> we can just replace uses of the LHS of the assignment with the
> const/copy.  Any statements we propagate into are put on a worklist.
>
> We pull statements off the worklist and try to simplify their RHS.  If
> the RHS simplifies to a const/copy, then then we repeat the process of
> propagating to the use points and
>
> x = <whatever>;
>
> If we find <whatever> collapses to a constant or copy we can just record
> it in SSA_NAME_VALUE.  As we walk through statements, we can propagate
> >
> > Richard.
> >
> >> jeff
> >>
> >
>



More information about the Gcc-patches mailing list