This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Use get_nonzero_bits to improve vectorization


On Tue, Oct 29, 2013 at 01:11:53PM +0100, Richard Biener wrote:
> > +/* Return number of known trailing zero bits in EXPR, or, if the value of
> > +   EXPR is known to be zero, the precision of its type.  */
> > +
> > +int
> 
> unsigned int?

Ok.

> > +    case PLUS_EXPR:
> > +    case MINUS_EXPR:
> > +    case BIT_IOR_EXPR:
> > +    case BIT_XOR_EXPR:
> > +    case MIN_EXPR:
> > +    case MAX_EXPR:
> > +      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +      ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> 
> This first recurses but if either one returns 0 you don't have
> to (that cuts down the recursion from exponential to linear in
> the common case?).  Thus, early out on ret == 0?

Yeah, that is reasonable.  Usually we use this during expansion etc.,
where trees won't be arbitrarily large, and for SSA_NAMEs we don't
recurse on definitions, so I think we are unlikely to ever see very
large expressions there.  I've added a similar early out to the
COND_EXPR case.

> > +      return MIN (ret1, ret2);
> > +    case POINTER_PLUS_EXPR:
> > +      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +      ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> > +      ret2 = MIN (ret2, prec);
> 
> Why do you need that here but not elsewhere when processing
> binary ops?

Because POINTER_PLUS_EXPR (well, also the shifts, but for those we
don't call tree_ctz on the second argument) is the only binop we
handle whose operands have different types.  For the first argument we
know it has the same precision as the result, but what if sizetype has
bigger precision than the TYPE_PRECISION of pointers?  I know it
typically doesn't; I just wanted to make sure we never return an
out-of-range value, since the tree_ctz result should always be in
[0, prec].

> > +      return MIN (ret1, ret2);
> > +    case BIT_AND_EXPR:
> > +      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +      ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> > +      return MAX (ret1, ret2);
> > +    case MULT_EXPR:
> > +      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +      ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> > +      return MIN (ret1 + ret2, prec);
> > +    case LSHIFT_EXPR:
> > +      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +      if (host_integerp (TREE_OPERAND (expr, 1), 1)
> 
> check that first before recursing for op0 - if op1 is negative
> you simply return ret1 which looks wrong, too.

If op1 is negative, then it is undefined behavior, so assuming in that
case the same thing as when op1 is not constant (i.e. that in the
worst case we shift left by 0 and thus don't increase the number of
trailing zeros, or shift left by > 0 and increase it) is IMHO correct.
I don't think we treat left shift by a negative value as a right shift
anywhere, do we?

> > +	  && ((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1)
> > +	      < (unsigned HOST_WIDE_INT) prec))
> 
> This check is to avoid overflowing ret1 + ret2?  If so, why not add
> 
> > +	{
> > +	  ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1);
> 
>   ret2 = MIN (ret2, prec);
> 
> instead?

Because ret{1,2} are int (well, with the change you've suggested,
unsigned int), but tree_low_cst returns a UHWI, so if the shift count
is say UINT_MAX + 1 (yeah, surely undefined behavior), I just didn't
want to lose any of the upper bits.  Sure, alternatively I could
assign tree_low_cst to a UHWI temporary and do the MIN on that and
prec, but I still think it is better to handle out-of-range constants
as a variable shift count rather than as a shift by, say, prec - 1.

> > +    case TRUNC_DIV_EXPR:
> > +    case CEIL_DIV_EXPR:
> > +    case FLOOR_DIV_EXPR:
> > +    case ROUND_DIV_EXPR:
> > +    case EXACT_DIV_EXPR:
> > +      if (TREE_CODE (TREE_OPERAND (expr, 1)) == INTEGER_CST)
> > +	{
> > +	  ret2 = tree_log2 (TREE_OPERAND (expr, 1));
> > +	  if (ret2 >= 0 && tree_int_cst_sgn (TREE_OPERAND (expr, 1)) == 1)
> 
> cheaper to test the sign first.

Ok.
> 
> > +	    {
> > +	      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +	      if (ret1 > ret2)
> > +		return ret1 - ret2;
> > +	    }
> > +	}
> > +      return 0;
> > +    CASE_CONVERT:
> > +      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +      if (ret1 && ret1 == TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (expr, 0))))
> > +	ret1 = prec;
> > +      return MIN (ret1, prec);
> > +    case SAVE_EXPR:
> > +      return tree_ctz (TREE_OPERAND (expr, 0));
> > +    case COND_EXPR:
> > +      ret1 = tree_ctz (TREE_OPERAND (expr, 1));
> > +      ret2 = tree_ctz (TREE_OPERAND (expr, 2));
> > +      return MIN (ret1, ret2);
> > +    case COMPOUND_EXPR:
> > +      return tree_ctz (TREE_OPERAND (expr, 1));
> > +    case ADDR_EXPR:
> > +      ret1 = get_object_alignment (TREE_OPERAND (expr, 0));
> 
> Use get_pointer_alignment, this isn't a memory reference so type
> alignment rules don't apply.

Ok.

	Jakub

