Bug 84859 - [8 Regression] bogus -Warray-bounds on a memcpy in a loop
Summary: [8 Regression] bogus -Warray-bounds on a memcpy in a loop
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 8.0
: P3 normal
Target Milestone: 8.0
Assignee: Richard Biener
URL:
Keywords: diagnostic, missed-optimization
Depends on: 33315
Blocks: Warray-bounds
  Show dependency treegraph
 
Reported: 2018-03-14 01:50 UTC by Martin Sebor
Modified: 2020-05-21 19:26 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work: 7.3.1
Known to fail:
Last reconfirmed: 2018-03-14 00:00:00


Attachments
patch (1.51 KB, patch)
2018-03-16 09:36 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Sebor 2018-03-14 01:50:14 UTC
The following test case was reduced from the report at https://lkml.org/lkml/2018/3/13/383.

In g() GCC correctly figures out the memcpy call is safe.  In h(), though, it thinks it overflows.

$ cat x.c && gcc -S -O2 -Wall x.c
void f (void*);

void __attribute__ ((noinline, noclone))
g (const void *p, unsigned n)
{
  unsigned char a[8];
  if (n > sizeof a)
    return;

  for (; n > 0; n -= *a)
  {
     *a = n > 255 ? 255 : n;

    __builtin_memcpy (a, p, *a);   // no warning (good)
  }
}

void __attribute__ ((noinline, noclone))
h (const void *p, unsigned n)
{
  unsigned char a[8];
  if (n > sizeof a)
    return;

  for (; n > 0; n -= *a)
  {
    if (n > 255)
      *a = 255;
    else
      *a = n;

    __builtin_memcpy (a, p, *a);   // bogus -Warray-bounds
  }
}
x.c: In function ‘h’:
x.c:32:5: warning: ‘__builtin_memcpy’ forming offset [9, 255] is out of the bounds [0, 8] of object ‘a’ with type ‘unsigned char[8]’ [-Warray-bounds]
     __builtin_memcpy (a, p, *a);   // bogus -Warray-bounds
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
x.c:21:17: note: ‘a’ declared here
   unsigned char a[8];
                 ^
Comment 1 Richard Biener 2018-03-14 13:34:25 UTC
The reason is a missed phiopt to MIN_EXPR for h and VRP not propagating through
memory.

we have

  <bb 4> [local count: 955630223]:
  if (n_6 > 255)
    goto <bb 5>; [50.00%]
  else
    goto <bb 6>; [50.00%]

  <bb 5> [local count: 477815112]:
  a[0] = 255;
  goto <bb 7>; [100.00%]

  <bb 6> [local count: 477815112]:
  _1 = (unsigned char) n_6;
  a[0] = _1;

  <bb 7> [local count: 955630223]:
  _2 = a[0];

where we lack a pass replacing that with

   <bb 7>
   # tem = PHI <255(5), _1(6)>
   a[0] = tem;
   _2 = a[0];

and then a MIN_EXPR.

Not sure why thw warn_restrict pass ends up with this kind of value-range
here though.  In the IL we have threaded the n > 255 check though and
ended up with an explicit memcpy (..., 255);  but that's not [9, 255].
Comment 2 Jakub Jelinek 2018-03-14 17:02:28 UTC
That is just something gimple-ssa-warn-restrict.c mades up from the call with 255 constant size.  Figuring out from the warning wording that it complains about a call with constant length 255 is impossible.
Comment 3 Martin Sebor 2018-03-14 17:15:54 UTC
The warning is in response to the call to memcpy(d, s, 255) in h():

  <bb 8> [local count: 477815112]:
  a[0] = 255;
  __builtin_memcpy (&a, p_15(D), 255);
  _26 = a[0];

The wording seems clear:  The function is forming an offset that is out of the bounds of the referenced object.

The range of the out-of-bounds offset is [9, 255], while the bounds of the object are [0, 8].

If you think you have of a better/clearer way to phrase it then please propose it.
Comment 4 Jakub Jelinek 2018-03-14 17:16:21 UTC
Note, even teaching VRP to look through stores and loads from memory wouldn't help here, not even for the first function, because it uses *a as the length, but also overwrites it (potentially or for real) with the memcpy, so the compiler can't really know anything about *a in the second and following iterations, except that it is unsigned char and thus [0, 255].  n can wrap around.
Comment 5 Jakub Jelinek 2018-03-14 17:36:25 UTC
Seems the kernel testcase is actually more like:
void f (void*);

void __attribute__ ((noinline, noclone))
g (const void *p, unsigned n)
{
  unsigned char a[8];
  if (n > sizeof a - 1)
    return;

  for (; n > 0; n -= *a)
  {
     *a = n > 255 ? 255 : n;

    __builtin_memcpy (a + 1, p, *a);   // no warning (good)
    f (a);
  }
}

void __attribute__ ((noinline, noclone))
h (const void *p, unsigned n)
{
  unsigned char a[8];
  if (n > sizeof a - 1)
    return;

  for (; n > 0; n -= *a)
  {
    if (n > 255)
      *a = 255;
    else
      *a = n;

    __builtin_memcpy (a + 1, p, *a);   // bogus -Warray-bounds
    f (a);
  }
}

where the memcpy calls don't clobber *a, but there is another call that makes a escape and could modify the value, so I'm afraid there is absolutely nothing VRP can do here and the only fix is not to warn on statements affected by jump threading.

What the kernel should have used (if it couldn't use get rid of the loop altogether like it did) would be something like:
void f (void*);

void __attribute__ ((noinline, noclone))
g (const void *p, unsigned n)
{
  unsigned char a[8], t;
  if (n > sizeof a - 1)
    return;

  for (; n > 0; n -= t)
  {
     t = n > 255 ? 255 : n;
     *a = t;

    __builtin_memcpy (a + 1, p, t);   // no warning (good)
    f (a);
  }
}

void __attribute__ ((noinline, noclone))
h (const void *p, unsigned n)
{
  unsigned char a[8], t;
  if (n > sizeof a - 1)
    return;

  for (; n > 0; n -= t)
  {
    if (n > 255)
      t = 255;
    else
      t = n;
    *a = t;

    __builtin_memcpy (a + 1, p, t);   // no warning (good)
    f (a);
  }
}

i.e. introduce a temporary for the size in the current iteration and use that across the calls that might have but don't really change that.
Comment 6 Martin Sebor 2018-03-14 18:00:19 UTC
We have seen this sort of thing come up a number of times in the past.  Jeff and I have discussed changing jump threading to either avoid introducing paths involving invalid calls, or inserting __builtin_unreachable/__builtin_trap.  That could not only avoid false positives but also make it possible to emit more efficient code.
Comment 7 Richard Biener 2018-03-15 13:59:07 UTC
The patch for PR33315 manages to transform

  if (n_21 > 255)
    goto <bb 6>; [50.00%]
  else
    goto <bb 7>; [50.00%]

  <bb 6> [local count: 477815112]:
  a[0] = 255;
  goto <bb 8>; [100.00%]

  <bb 7> [local count: 477815112]:
  _1 = (unsigned char) n_21;
  a[0] = _1;

  <bb 8> [local count: 955630225]:
  _2 = a[0];

to

  if (n_21 > 255)
    goto <bb 7>; [50.00%]
  else
    goto <bb 6>; [50.00%]

  <bb 6> [local count: 477815112]:
  _1 = (unsigned char) n_21;

  <bb 7> [local count: 955630225]:
  # _23 = PHI <255(5), _1(6)>
  a[0] = _23;
  _2 = a[0];

but as proposed that would happen too late (we've already threaded the jump
and PRE already would have done sth similar via the load from a[0]).  As
given it looks like sinking should be performed before the first phiopt pass
(which also runs quite late - there was talk of an early phiopt pass).
Possibly cselim could be enhanced to catch the above as well.  In fact its
comments say it would handle the above.  Ah, we do

  /* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or if-conversion
     is disabled.  */
  if ((!opts->x_flag_tree_loop_vectorize && !opts->x_flag_tree_slp_vectorize)
       || !opts->x_flag_tree_loop_if_convert)
    maybe_set_param_value (PARAM_MAX_STORES_TO_SINK, 0,
                           opts->x_param_values, opts_set->x_param_values);

for whatever reason :/  --param max-stores-to-sink=2 fixes the testcase.
Comment 8 Richard Biener 2018-03-15 15:11:00 UTC
Mine.  We can handle the case of a single store in the BBs more generally and bypass the limit.  The limit was added in r171381 where it was imposed on
all transforms when the function was changed to handle the case of more
than one stmt (store) in each BB.  Previously the transform for the single
stores case was always performed.

Thus, for example like the following.  Could be simplified somewhat iff
the stores need to be the last stmt in the BBs (like it was before said
re-org).  last_and_only_stmt isn't enough here as we have a feeding
scalar stmt around in one BB.

Opinions?

Index: gcc/tree-ssa-phiopt.c
===================================================================
--- gcc/tree-ssa-phiopt.c       (revision 258552)
+++ gcc/tree-ssa-phiopt.c       (working copy)
@@ -2061,8 +2061,6 @@ static bool
 cond_if_else_store_replacement (basic_block then_bb, basic_block else_bb,
                                 basic_block join_bb)
 {
-  gimple *then_assign = last_and_only_stmt (then_bb);
-  gimple *else_assign = last_and_only_stmt (else_bb);
   vec<data_reference_p> then_datarefs, else_datarefs;
   vec<ddr_p> then_ddrs, else_ddrs;
   gimple *then_store, *else_store;
@@ -2073,14 +2071,49 @@ cond_if_else_store_replacement (basic_bl
   tree then_lhs, else_lhs;
   basic_block blocks[3];
 
-  if (MAX_STORES_TO_SINK == 0)
+  /* Handle the case with single store in THEN_BB and ELSE_BB.  That is
+     cheap enough to always handle.  */
+  gphi *vphi = NULL;
+  for (gphi_iterator si = gsi_start_phis (join_bb); !gsi_end_p (si);
+       gsi_next (&si))
+    if (virtual_operand_p (gimple_phi_result (si.phi ())))
+      {
+       vphi = si.phi ();
+       break;
+      }
+  if (!vphi)
     return false;
+  gimple *then_assign = SSA_NAME_DEF_STMT (PHI_ARG_DEF_FROM_EDGE
+                                           (vphi, single_succ_edge (then_bb)));
+  gimple *else_assign = SSA_NAME_DEF_STMT (PHI_ARG_DEF_FROM_EDGE
+                                           (vphi, single_succ_edge (else_bb)));
+
+  /* Verify there are no other stores or loads in the BBs.  That allows us
+     to elide dependence checking.  */
+  use_operand_p use_p;
+  imm_use_iterator imm_iter;
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_vuse (then_assign))
+    if (USE_STMT (use_p) != then_assign
+       && gimple_bb (USE_STMT (use_p)) == then_bb)
+      {
+       then_assign = NULL;
+       break;
+      }
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_vuse (else_assign))
+    if (USE_STMT (use_p) != else_assign
+       && gimple_bb (USE_STMT (use_p)) == else_bb)
+      {
+       else_assign = NULL;
+       break;
+      }
 
-  /* Handle the case with single statement in THEN_BB and ELSE_BB.  */
   if (then_assign && else_assign)
     return cond_if_else_store_replacement_1 (then_bb, else_bb, join_bb,
                                              then_assign, else_assign);
 
+  if (MAX_STORES_TO_SINK == 0)
+    return false;
+
   /* Find data references.  */
   then_datarefs.create (1);
   else_datarefs.create (1);
Comment 9 Richard Biener 2018-03-15 15:19:35 UTC
Like moving over a const call after the stores might cause us to spill across the call.  Moving over any stmt could enlarge lifetimes enough to do that.  Register
lifetime could be so that we cannot coalesce the copies in the PHI (but that
probably is offset by removing one of the stores -- but maybe the fast path
is slowed down then).

value_replacement has /* Allow up to 2 cheap preparation statements that prepare argument ... */ for example.  But I guess preparation stmts are never of a concern?
Comment 10 Richard Biener 2018-03-16 09:36:28 UTC
Created attachment 43674 [details]
patch

Final patch I am testing.
Comment 11 Richard Biener 2018-03-19 14:09:05 UTC
Fixed.
Comment 12 Richard Biener 2018-03-19 14:09:30 UTC
Author: rguenth
Date: Mon Mar 19 14:08:58 2018
New Revision: 258645

URL: https://gcc.gnu.org/viewcvs?rev=258645&root=gcc&view=rev
Log:
2018-03-19  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/84859
	* tree-ssa-phiopt.c (single_trailing_store_in_bb): New function.
	(cond_if_else_store_replacement): Perform sinking operation on
	single-store BBs regardless of MAX_STORES_TO_SINK setting.
	Generalize what a BB with a single eligible store is.

	* gcc.dg/tree-ssa/pr84859.c: New testcase.
	* gcc.dg/tree-ssa/pr35286.c: Disable cselim.
	* gcc.dg/tree-ssa/split-path-6.c: Likewise.
	* gcc.dg/tree-ssa/split-path-7.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr84859.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr35286.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/split-path-6.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c
    trunk/gcc/tree-ssa-phiopt.c