This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix PR66313


On Tue, 13 Jun 2017, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Tue, 13 Jun 2017, Richard Sandiford wrote:
> >> Richard Biener <rguenther@suse.de> writes:
> >> > So I've come back to PR66313 and found a solution to the tailrecursion
> >> > missed optimization when fixing the factoring folding to use an unsigned
> >> > type when we're not sure of overflow.
> >> >
> >> > The folding part is identical to my last try from 2015, the tailrecursion
> >> > part makes us handle intermittent stmts that were introduced by foldings
> >> > that "clobber" our quest walking the single-use chain of stmts between
> >> > the call and the return (and failing at all stmts that are not part
> >> > of said chain).  A simple solution is to move the stmts that are not
> >> > part of the chain and that we can move before the call.  That handles
> >> > the leaf conversions that now appear for tree-ssa/tailrecursion-6.c
> >> >
> >> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >> >
> >> > Richard.
> >> >
> >> > 2017-05-31  Richard Biener  <rguenther@suse.de>
> >> >
> >> > 	PR middle-end/66313
> >> > 	* fold-const.c (fold_plusminus_mult_expr): If the factored
> >> > 	factor may be zero use a wrapping type for the inner operation.
> >> > 	* tree-tailcall.c (independent_of_stmt_p): Pass in to_move bitmap
> >> > 	and handle moved defs.
> >> > 	(process_assignment): Properly guard the unary op case.  Return a
> >> > 	tri-state indicating that moving the stmt before the call may allow
> >> > 	to continue.  Pass through to_move.
> >> > 	(find_tail_calls): Handle moving unrelated defs before
> >> > 	the call.
> >> >
> >> > 	* c-c++-common/ubsan/pr66313.c: New testcase.
> >> > 	* gcc.dg/tree-ssa/loop-15.c: Adjust.
> >> >
> >> > Index: gcc/fold-const.c
> >> > ===================================================================
> >> > *** gcc/fold-const.c.orig	2015-10-29 12:32:33.302782318 +0100
> >> > --- gcc/fold-const.c	2015-10-29 14:08:39.936497739 +0100
> >> > *************** fold_plusminus_mult_expr (location_t loc
> >> > *** 6916,6925 ****
> >> >       }
> >> >     same = NULL_TREE;
> >> >   
> >> > !   if (operand_equal_p (arg01, arg11, 0))
> >> > !     same = arg01, alt0 = arg00, alt1 = arg10;
> >> > !   else if (operand_equal_p (arg00, arg10, 0))
> >> >       same = arg00, alt0 = arg01, alt1 = arg11;
> >> >     else if (operand_equal_p (arg00, arg11, 0))
> >> >       same = arg00, alt0 = arg01, alt1 = arg10;
> >> >     else if (operand_equal_p (arg01, arg10, 0))
> >> > --- 6916,6926 ----
> >> >       }
> >> >     same = NULL_TREE;
> >> >   
> >> > !   /* Prefer factoring a common non-constant.  */
> >> > !   if (operand_equal_p (arg00, arg10, 0))
> >> >       same = arg00, alt0 = arg01, alt1 = arg11;
> >> > +   else if (operand_equal_p (arg01, arg11, 0))
> >> > +     same = arg01, alt0 = arg00, alt1 = arg10;
> >> >     else if (operand_equal_p (arg00, arg11, 0))
> >> >       same = arg00, alt0 = arg01, alt1 = arg10;
> >> >     else if (operand_equal_p (arg01, arg10, 0))
> >> > *************** fold_plusminus_mult_expr (location_t loc
> >> > *** 6974,6987 ****
> >> >   	}
> >> >       }
> >> >   
> >> > !   if (same)
> >> >       return fold_build2_loc (loc, MULT_EXPR, type,
> >> >   			fold_build2_loc (loc, code, type,
> >> >   				     fold_convert_loc (loc, type, alt0),
> >> >   				     fold_convert_loc (loc, type, alt1)),
> >> >   			fold_convert_loc (loc, type, same));
> >> >   
> >> > !   return NULL_TREE;
> >> >   }
> >> >   
> >> >   /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
> >> > --- 6975,7010 ----
> >> >   	}
> >> >       }
> >> >   
> >> > !   if (!same)
> >> > !     return NULL_TREE;
> >> > ! 
> >> > !   if (! INTEGRAL_TYPE_P (type)
> >> > !       || TYPE_OVERFLOW_WRAPS (type)
> >> > !       /* We are neither factoring zero nor minus one.  */
> >> > !       || TREE_CODE (same) == INTEGER_CST)
> >> >       return fold_build2_loc (loc, MULT_EXPR, type,
> >> >   			fold_build2_loc (loc, code, type,
> >> >   				     fold_convert_loc (loc, type, alt0),
> >> >   				     fold_convert_loc (loc, type, alt1)),
> >> >   			fold_convert_loc (loc, type, same));
> >> >   
> >> > !   /* Same may be zero and thus the operation 'code' may overflow.  Likewise
> >> > !      same may be minus one and thus the multiplication may overflow.  Perform
> >> > !      the operations in an unsigned type.  */
> >> > !   tree utype = unsigned_type_for (type);
> >> > !   tree tem = fold_build2_loc (loc, code, utype,
> >> > ! 			      fold_convert_loc (loc, utype, alt0),
> >> > ! 			      fold_convert_loc (loc, utype, alt1));
> >> > !   /* If the sum evaluated to a constant that is not -INF the multiplication
> >> > !      cannot overflow.  */
> >> > !   if (TREE_CODE (tem) == INTEGER_CST
> >> > !       && ! wi::eq_p (tem, wi::min_value (TYPE_PRECISION (utype), SIGNED)))
> >> > !     return fold_build2_loc (loc, MULT_EXPR, type,
> >> > ! 			    fold_convert (type, tem), same);
> >> > ! 
> >> > !   return fold_convert_loc (loc, type,
> >> > ! 			   fold_build2_loc (loc, MULT_EXPR, utype, tem,
> >> > ! 					    fold_convert_loc (loc, utype, same)));
> >> >   }
> >> >   
> >> >   /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
> >> 
> >> Sorry if you already know, but this part means that we can no longer
> >> vectorise:
> >> 
> >> int
> >> f (int *x, int b1, int b2, int b3)
> >> {
> >>   int foo = 0;
> >>   for (int i1 = 0; i1 < b1; ++i1)
> >>     for (int i2 = 0; i2 < b2; ++i2)
> >>       for (int i3 = 0; i3 < b3; ++i3)
> >>         foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
> >>   return foo;
> >> }
> >> 
> >> We now convert all the arithmetic in the [...] to unsigned int
> >> and reassociate it so that the "- 1" is applied last.  We then assume
> >> that the overflow in this subtraction is well-defined and that the
> >> &x[...] might not be linear for the inner loop.
> >
> > Can you file a PR?
> 
> OK, filed as PR81082.
> 
> >   # i3_44 = PHI <i3_32(6), 0(4)>
> >   i3.2_7 = (unsigned int) i3_44;
> >   _9 = i3.2_7 + _34;
> >   _10 = (int) _9;
> >   _11 = (long unsigned int) _10;
> >   _12 = _11 * 4;
> >   _13 = x_30(D) + _12;
> >   _14 = *_13;
> > ...
> >    i3_32 = i3_44 + 1;
> >
> > so we have *(x_30(D) + 4 * ((sizetype)(int)((unsigned){ 0, + , 1} + X))
> >
> > It might mean that we need to avoid this re-association.  Not sure how
> > to detect worthwhile vs. not worthwhile cases during folding.  At least
> > I see no easy way to recover from it in SCEV analysis unless the
> > number of iterations is constrained more than in your example.
> 
> Yeah, in the example that this was reduced from, not reassociating
> is far more preferable to changing the types.  But like you say,
> I've no idea how we'd know that at this stage.

In the past I played with not obfuscating things during FE lowering
so early, namely not expose the * 4 but keep the original types of
array / pointer indexes.  On the original mem-ref branch I had
a IDX_EXPR that allowed to chain those (a widen-multiply-accumulate
op).

That said, the context where this association is not interesting is
address context and the multiplication to the byte offset.

Of course in your case there's also b3 factored out which looks
like a generally interesting transform (but eventually harmful in address
context again).

>From the FE we get

*(x + (sizetype) ((long unsigned int) (((i1 * b2) * b3 + i2 * b3) + (i3 + 
-1)) * 4))

and SCEV uses the fact that the mults/adds have undefined behavior
on overflow.  The same missed optimization occurs if you make all those
vars unsigned (with or without the folding in question):

#define U unsigned
int
f (int *x, U int b1, U int b2, U int b3)
{
  int foo = 0;
  for (U int i1 = 0; i1 < b1; ++i1)
    for (U int i2 = 0; i2 < b2; ++i2)
      for (U int i3 = 0; i3 < b3; ++i3)
        foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
  return foo;
}

It works again for unsigned long as there can be no valid object
where the computation could wrap around.

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]