This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix PR46556 (poor address generation)


Hi Richard,

Thanks for the comments -- a few responses below.

On Tue, 2011-10-11 at 13:40 +0200, Richard Guenther wrote:
> On Sat, 8 Oct 2011, William J. Schmidt wrote:
> 

<snip>

> 
> > +  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
> 
> You don't verify that bitpos % BITS_PER_UNIT is zero anywhere.

I'll add a check in the caller.  I was thinking this was unnecessary
since I had excluded bitfield operations, but on reflection that may not
be sufficient.

<snip>

> 
> > +  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
> > +					true, GSI_SAME_STMT);
> > +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> > +  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
> > +				       true, GSI_SAME_STMT);
> > +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
> > +			 build_int_cst (offset_type, double_int_to_shwi (c)));
> 
> double_int_to_tree (offset_type, c)
> 
> Please delay gimplification to the caller, that way this function
> solely operates on the trees returned from get_inner_reference.
> Or are you concerned that fold might undo your association?

I'll try that.  I was just basing this on some suggestions you had made
earlier; I don't believe there is any problem with delaying it.

<snip>

> >  
> >    for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
> >      {
> >        gimple stmt = gsi_stmt (gsi);
> >  
> > -      if (is_gimple_assign (stmt)
> > -	  && !stmt_could_throw_p (stmt))
> > +      /* During late reassociation only, look for restructuring
> > +	 opportunities within an expression that references memory.
> > +	 We only do this for blocks not contained in loops, since the
> > +	 ivopts machinery does a good job on loop expressions, and we
> > +	 don't want to interfere with other loop optimizations.  */
> 
> I'm not sure I buy this.  IVOPTs would have produced [TARGET_]MEM_REFs
> which you don't handle.  Did you do any measurements what happens if
> you enable it generally?

Actually I agree with you -- in an earlier iteration this was still
enabled for reassoc1 ahead of loop optimizations and was causing
degradations.  So long as it doesn't occur early it should be fine to do
everywhere now, and catch non-ivar cases in loops.

<snip>

> You verified the patch has no performance degradations on ppc64
> for SPEC CPU, did you see any improvements?
> 

Yes, a few in the 2-3% range.  Nothing stellar.

> The pattern matching is still very ad-hoc and doesn't consider
> statements that feed the base address.  There is conceptually
> no difference between p->a[n] and *(p + n * 4).

That's true.  Since we abandoned the general address-lowering approach,
this was aimed at the specific pattern that comes up frequently in
practice.  I would expect the *(p + n * 4) cases to be handled by the
general straight-line strength reduction, which is the correct long-term
approach.  (Cases like p->a[n], where the multiplication is not yet
explicit, will be a bit of a wart as part of strength reduction, too,
but that's still the right place for it eventually.)

>   You placed this
> lowering in reassoc to catch CSE opportunities with DOM, right?
> Does RTL CSE not do it's job or is the transform undone by
> fwprop before it gets a chance to do it?

I think with Paolo's suggested patch for RTL CSE, this could be moved
back to expand.  I will have to experiment with it again to make sure.
If so, that would certainly be my preference as well.

(Or having the whole problem just disappear might be my preference on
some days... :)

Thanks,
Bill


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]