[PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Thu Oct 4 15:51:00 GMT 2018

On 8/27/18 9:42 AM, Richard Biener wrote:
> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
>>
>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>
>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>> the next statement after a truncating strncpy to see if it
>>>>> adds a terminating nul.  This only works when the next
>>>>> statement can be reached using the Gimple statement iterator
>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>> calls that truncate their constant argument that are being
>>>>> folded to memcpy this early get diagnosed even if they are
>>>>> followed by the nul assignment:
>>>>>
>>>>>   const char s[] = "12345";
>>>>>   char d[3];
>>>>>
>>>>>   void f (void)
>>>>>   {
>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>     d[sizeof d - 1] = 0;
>>>>>   }
>>>>>
>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>> memcpy until the pointer to the basic block the strnpy call
>>>>> is in can be used to try to reach the next statement (this
>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>> fold things early but in the case of strncpy (a relatively
>>>>> rarely used function that is often misused), getting
>>>>> the warning right while folding a bit later but still fairly
>>>>> early on seems like a reasonable compromise.  I fear that
>>>>> otherwise, the false positives will drive users to adopt
>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>> bugs cannot be as readily detected.
>>>>>
>>>>> Tested on x86_64-linux.
>>>>>
>>>>> Martin
>>>>>
>>>>> PS There still are outstanding cases where the warning can
>>>>> be avoided.  I xfailed them in the test for now but will
>>>>> still try to get them to work for GCC 9.
>>>>>
>>>>> gcc-87028.diff
>>>>>
>>>>>
>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>>>>> gcc/ChangeLog:
>>>>>
>>>>>       PR tree-optimization/87028
>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>>       statement doesn't belong to a basic block.
>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>>       the left hand side of assignment.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>       PR tree-optimization/87028
>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>
>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>> index 07341eb..284c2fb 100644
>>>>> --- a/gcc/gimple-fold.c
>>>>> +++ b/gcc/gimple-fold.c
>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>      return false;
>>>>>
>>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>>> +     block is reachable.  */
>>>>> +  if (!gimple_bb (stmt))
>>>>> +    return false;
>>>> I think you want cfun->cfg as the test here.  They should be equivalent
>>>> in practice.
>>>
>>> Please do not add 'cfun' references.  Note that the next stmt is also accessible
>>> when there is no CFG.  I guess the issue is that we fold this during
>>> gimplification where the next stmt is not yet "there" (but still in GENERIC)?
>> That was my assumption.  I almost suggested peeking at gsi_next and
>> avoiding in that case.
> 
> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
So I think the concern with adding the guards to maybe_fold_stmt is the
possibility of further fallout.

I guess they could be written to target this case specifically to
minimize fallout, but that feels like we're doing the same thing
(band-aid) just in a different place.

> 
>>>
>>> We generally do not want to have unfolded stmts in the IL when we can avoid that
>>> which is why we fold most stmts during gimplification.  We also do that because
>>> we now do less folding on GENERIC.
>> But an unfolded call in the IL should always be safe and we've got
>> plenty of opportunities to fold it later.
> 
> Well - we do.  The very first one is forwprop though which means we'll miss to
> re-write some memcpy parts into SSA:
> 
>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
>           /* After CCP we rewrite no longer addressed locals into SSA
>              form if possible.  */
>           NEXT_PASS (pass_forwprop);
> 
> likewise early object-size will be confused by memcpy calls that just exist
> to avoid TBAA issues (another of our recommendations besides using unions).
> 
> We do fold mem* early for a reason ;)
> 
> "We can always do warnings earlier" would be a similar true sentence.
I'm not disagreeing at all.  There's a natural tension between the
benefits of folding early to enable more optimizations downstream and
leaving the IL in a state where we can give actionable warnings.

Similarly there's a natural tension between warning early vs warning
late.  Code that triggers the warning may ultimately be proved
unreachable, or we may discover simplifications that either suppress or
expose a warning.

There is no easy answer here.  But I think we can legitimately ask
questions.  ie, does folding strnlen here really improve things
downstream in ways that are measurable?  Does the false positive really
impact the utility of the warning?  etc.

I'd hazard a guess that Martin is particularly sensitive to false
positives based on feedback he's received from our developer community
as well as downstream consumers of his work.

> 
> Both come at a cost.  You know I'm usually declaring GCC to be an
> optimizing compiler
> and not a static analysis engine ;)  So I'm not too much convinced when seeing
> disabling/delaying folding here and there to catch some false
> negatives for -Wxyz.
> 
> We need to work out a plan rather than throwing sticks here and there.
I still lean towards the optimization side in general, but I'm also a
believer that the right place to issue warnings is in the tool that gets
used every day.  Deferring it to some other tool that runs at a later
time which all developers may not have access to and not all projects
use means the static analysis side is orders of magnitude less useful
than it should be.

I think the long term plan in my mind would be to not fold during
gimplification.  Immediately after gimplification we'd do warning
analysis, then we'd commence with folding everything before starting the
gimple optimization pipeline.

With the natural tensions around warning early vs warning late the
warning analysis phase may choose to mark statements in various ways
rather than issuing the warning at that time.  ie, it might choose to
mark the statement with the set of potential issues as well as marking
those which were proven safe.  A later pass could then refine things and
issue the actual warning.  Or something like that.

You'll probably note this mirrors the design I wanted to do for
Wuninitialized to improve its stability while at the same time allowing
us the option of trying to minimize false positives.

But all that seems a bit pie in the sky right now.  I think the question
we should answer is do we tackle 87028 now or defer it to a later date
when we've fleshed this out further?

Jeff