This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)


On Tue, Aug 28, 2018 at 6:27 AM Jeff Law <law@redhat.com> wrote:
>
> On 08/27/2018 10:27 AM, Martin Sebor wrote:
> > On 08/27/2018 02:29 AM, Richard Biener wrote:
> >> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
> >>>
> >>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> >>>> The warning suppression for -Wstringop-truncation looks for
> >>>> the next statement after a truncating strncpy to see if it
> >>>> adds a terminating nul.  This only works when the next
> >>>> statement can be reached using the Gimple statement iterator
> >>>> which isn't until after gimplification.  As a result, strncpy
> >>>> calls that truncate their constant argument that are being
> >>>> folded to memcpy this early get diagnosed even if they are
> >>>> followed by the nul assignment:
> >>>>
> >>>>   const char s[] = "12345";
> >>>>   char d[3];
> >>>>
> >>>>   void f (void)
> >>>>   {
> >>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
> >>>>     d[sizeof d - 1] = 0;
> >>>>   }
> >>>>
> >>>> To avoid the warning I propose to defer folding strncpy to
> >>>> memcpy until the pointer to the basic block the strnpy call
> >>>> is in can be used to try to reach the next statement (this
> >>>> happens as early as ccp1).  I'm aware of the preference to
> >>>> fold things early but in the case of strncpy (a relatively
> >>>> rarely used function that is often misused), getting
> >>>> the warning right while folding a bit later but still fairly
> >>>> early on seems like a reasonable compromise.  I fear that
> >>>> otherwise, the false positives will drive users to adopt
> >>>> other unsafe solutions (like memcpy) where these kinds of
> >>>> bugs cannot be as readily detected.
> >>>>
> >>>> Tested on x86_64-linux.
> >>>>
> >>>> Martin
> >>>>
> >>>> PS There still are outstanding cases where the warning can
> >>>> be avoided.  I xfailed them in the test for now but will
> >>>> still try to get them to work for GCC 9.
> >>>>
> >>>> gcc-87028.diff
> >>>>
> >>>>
> >>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
> >>>> strncpy with global variable source string
> >>>> gcc/ChangeLog:
> >>>>
> >>>>       PR tree-optimization/87028
> >>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
> >>>>       statement doesn't belong to a basic block.
> >>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
> >>>>       the left hand side of assignment.
> >>>>
> >>>> gcc/testsuite/ChangeLog:
> >>>>
> >>>>       PR tree-optimization/87028
> >>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
> >>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
> >>>>
> >>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> >>>> index 07341eb..284c2fb 100644
> >>>> --- a/gcc/gimple-fold.c
> >>>> +++ b/gcc/gimple-fold.c
> >>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
> >>>> (gimple_stmt_iterator *gsi,
> >>>>    if (tree_int_cst_lt (ssize, len))
> >>>>      return false;
> >>>>
> >>>> +  /* Defer warning (and folding) until the next statement in the basic
> >>>> +     block is reachable.  */
> >>>> +  if (!gimple_bb (stmt))
> >>>> +    return false;
> >>> I think you want cfun->cfg as the test here.  They should be equivalent
> >>> in practice.
> >>
> >> Please do not add 'cfun' references.  Note that the next stmt is also
> >> accessible
> >> when there is no CFG.  I guess the issue is that we fold this during
> >> gimplification
> >> where the next stmt is not yet "there" (but still in GENERIC)?
> >>
> >> We generally do not want to have unfolded stmts in the IL when we can
> >> avoid that
> >> which is why we fold most stmts during gimplification.  We also do
> >> that because
> >> we now do less folding on GENERIC.
> >>
> >> There may be the possibility to refactor gimplification time folding
> >> to what we
> >> do during inlining - queue stmts we want to fold and perform all
> >> folding delayed.
> >> This of course means bigger compile-time due to cache effects.
> >>
> >>>
> >>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
> >>>> index d0792aa..f1988f6 100644
> >>>> --- a/gcc/tree-ssa-strlen.c
> >>>> +++ b/gcc/tree-ssa-strlen.c
> >>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
> >>>> (gimple_stmt_iterator gsi, tree src, tree cnt)
> >>>>         && known_eq (dstoff, lhsoff)
> >>>>         && operand_equal_p (dstbase, lhsbase, 0))
> >>>>       return false;
> >>>> +
> >>>> +      if (code == MEM_REF
> >>>> +       && TREE_CODE (lhsbase) == SSA_NAME
> >>>> +       && known_eq (dstoff, lhsoff))
> >>>> +     {
> >>>> +       /* Extract the referenced variable from something like
> >>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
> >>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
> >>>> +       if (gimple_nop_p (def))
> >>>> +         {
> >>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
> >>>> +           if (lhsbase
> >>>> +               && dstbase
> >>>> +               && operand_equal_p (dstbase, lhsbase, 0))
> >>>> +             return false;
> >>>> +         }
> >>>> +     }
> >>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
> >>> the wrong tree.  It'd be easier to suggest something here if I could see
> >>> the gimple (with virtual operands).  BUt at some level what you really
> >>> want to do is make sure the base of the MEM_REF is the same as what got
> >>> passed as the destination of the strncpy.  You'd want to be testing
> >>> SSA_NAMEs in that case.
> >>
> >> Yes.  Why not simply compare the SSA names?  Why would it be
> >> not OK to do that when !lhsbase?
> >
> > The added code handles this case:
> >
> >   void f (char *d)
> >   {
> >     __builtin_strncpy (d, "12345", 4);
> >     d[3] = 0;
> >   }
> >
> > where during forwprop we see:
> >
> >   __builtin_strncpy (d_3(D), "12345", 4);
> >   MEM[(char *)d_3(D) + 3B] = 0;
> >
> > The next statement after the strncpy is the assignment whose
> > lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
> > is no other information in the GIMPLE_NOP that I can see to
> > tell that the operand is d_3(D) or that it's the same as
> > the strncpy argument (i.e., the PARAM_DECl d).  Having to
> > do open-code this all the time seems so cumbersome -- is
> > there some API that would do this for me?  (I thought
> > get_addr_base_and_unit_offset was that API but clearly in
> > this case it doesn't do what I expect -- it just returns
> > the argument.)
>
> I think you need to look harder at that MEM_REF.  It references d_3.
> That's what you need to be checking.  The base (d_3) is the first
> operand of the MEM_REF, the offset is the second operand of the MEM_REF.
>
> (gdb) p debug_gimple_stmt ($2)
> # .MEM_5 = VDEF <.MEM_4>
> MEM[(char *)d_3(D) + 3B] = 0;
>
>
> (gdb) p gimple_assign_lhs ($2)
> $5 = (tree_node *) 0x7ffff01a6208
>
> (gdb) p debug_tree ($5)
>  <mem_ref 0x7ffff01a6208
>     type <integer_type 0x7ffff00723f0 char public string-flag QI
>         size <integer_cst 0x7ffff0059d80 constant 8>
>         unit-size <integer_cst 0x7ffff0059d98 constant 1>
>         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max
> <integer_cst 0x7ffff0059df8 127>
>         pointer_to_this <pointer_type 0x7ffff007de70>>
>
>     arg:0 <ssa_name 0x7ffff0063dc8
>         type <pointer_type 0x7ffff007de70 type <integer_type
> 0x7ffff00723f0 char>
>             public unsigned DI
>             size <integer_cst 0x7ffff0059c90 constant 64>
>             unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>             align:64 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7ffff007de70 reference_to_this <reference_type
> 0x7ffff017d738>>
>         visited var <parm_decl 0x7ffff01a5000 d>
>         def_stmt GIMPLE_NOP
>         version:3>
>     arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70>
> constant 3>
>     j.c:4:6 start: j.c:4:5 finish: j.c:4:8>
>
>
> Note arg:0 is the SSA_NAME d_3.  And not surprising that's lhsbase:
>
> (gdb) p debug_tree (lhsbase)
> <ssa_name 0x7ffff0063dc8
>     type <pointer_type 0x7ffff007de70
>         type <integer_type 0x7ffff00723f0 char public string-flag QI
>             size <integer_cst 0x7ffff0059d80 constant 8>
>             unit-size <integer_cst 0x7ffff0059d98 constant 1>
>             align:8 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7ffff00723f0 precision:8 min <integer_cst
> 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
>             pointer_to_this <pointer_type 0x7ffff007de70>>
>         public unsigned DI
>         size <integer_cst 0x7ffff0059c90 constant 64>
>         unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>         align:64 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7ffff007de70 reference_to_this <reference_type
> 0x7ffff017d738>>
>     visited var <parm_decl 0x7ffff01a5000 d>
>     def_stmt GIMPLE_NOP
>     version:3>
>
>
> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
> debug get_addr_base_and_unit_offset to understand what's going on.
> Essentially you're getting different results of
> get_addr_base_and_unit_offset in a case where they arguably should be
> the same.

Probably get_attr_nonstring_decl has the same "mistake" and returns
the PARM_DECL instead of the SSA name pointer.  So we're comparing
apples and oranges here.

Yeah:

/* If EXPR refers to a character array or pointer declared attribute
   nonstring return a decl for that array or pointer and set *REF to
   the referenced enclosing object or pointer.  Otherwise returns
   null.  */

tree
get_attr_nonstring_decl (tree expr, tree *ref)
{
  tree decl = expr;
  if (TREE_CODE (decl) == SSA_NAME)
    {
      gimple *def = SSA_NAME_DEF_STMT (decl);

      if (is_gimple_assign (def))
        {
          tree_code code = gimple_assign_rhs_code (def);
          if (code == ADDR_EXPR
              || code == COMPONENT_REF
              || code == VAR_DECL)
            decl = gimple_assign_rhs1 (def);
        }
      else if (tree var = SSA_NAME_VAR (decl))
        decl = var;
    }

  if (TREE_CODE (decl) == ADDR_EXPR)
    decl = TREE_OPERAND (decl, 0);

  if (ref)
    *ref = decl;

I see a lot of "magic" here again in the attempt to "propagate"
a nonstring attribute.  Note

foo (char *p __attribute__(("nonstring")))
{
  p = "bar";
  strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
}

is perfectly valid and p as passed to strlen is _not_ nonstring(?).

I think in your code comparing bases you want to look at the _original_
argument to the string function rather than what get_attr_nonstring_decl
returned as ref.

Richard.

> Jeff
>
> Jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]