[PATCH]: Rewrite reassociation pass

Daniel Berlin dberlin@dberlin.org
Wed Dec 7 03:21:00 GMT 2005


On Tue, 2005-12-06 at 16:24 -0700, Jeffrey A Law wrote:
> On Mon, 2005-12-05 at 11:29 -0500, Daniel Berlin wrote:
> > Just to help things along, i've attached the newer version of the patch
> > with the other things you mentioned (long lines, etc), fixed.
> > 
> > The changelog is still the same.
> > I've again attached tree-ssa-reassoc.c as a seperate file, since the
> > diff isn't really helpful :)
> Thanks.    The only thing left for me to really sit down and understand
> is the linearization code. 

The key is to realize that it only linearizes properly because both
functions (linearize_expr_tree and linearize_expr) know that the other
will take care of one of the sides by the rest of its recursion.

It's also the case that linearize_expr_tree may recurse on something
that ends up having been totally linearized already, but we don't know
that until we hit the bottom and work our way back up (even if we've
called linearize_expr on it).  



>  The rest makes pretty good sense to me.
> 
> A nit I noticed, several loops have the form:
> 
> 
>   for (i = currindex;
>        VEC_iterate (operand_entry_t, *ops, i, oe)
>        && oe->rank >= curr->rank - 1;
>        i++)
>     {
>       if (oe->op == notop && i != currindex)
> 
> Seems to me like you could change the loop initialization ot
> i = currindex + 1 and eliminate the i != currindex test within
> the loop.  I doubt it matters in any significant way.  Your call.

I'll happily do so.

It used to be the case (in a version never seen by humans :P) that the
seperate functions "eliminate_not_pairs", "eliminate_plus_minus_pair",
and "eliminate_using_constants" were all one big function with one big
loop where it actually had to keep the "current index" and iteration
counter seperate for some reason.

When i functionized them, i just made the obvious transformation and
didn't notice it could be improved.

> 
> It looks like you've still got the commented out call to
> rewrite_expr_tree_new in the code.

Crap, you are right.

> We can't use this (as-is) to optimize not-not (and neg-neg)
> expressions that are currently handled by DOM -- unless the
> not-not feeds into some associative operation.    I haven't
> looked into how difficult it might be to do that.

The problem with extending the code to handle multiple types of ops at
once is obviously that then you have to pay attention to ordering, etc.
It's certainly possible, but it's not clear how much of a maintenance
burden it would be.
> 
> I did notice during the evaluation against DOM's reassociation
> that this code would benefit from running after DCE/DSE.  Some
> opportunities to reassociate and simplify are being blocked
> by dead code (and thus dead uses of key SSA_NAMEs).

Yes, it can also generate dead code on it's own.  In my original tree,
it ran after split crit edges, before PRE, and then had another DCE
after it.

> 
> But possibly more importantly, the loop optimizers appear to
> create/expose a number of new simplification opportunities

They do, yes.
I never had a FRE/PRE pass after loop, so I didn't try to put reassoc
there, but thinking about it, it probably does make sense.


> For example, in my test bucket we only get 1 simplification 
> a PLUS/MINUS pair and 2 simplifications of {BIT_XOR,PLUS,MINUS}_EXPR
> with a constant zero with the current placement of reassociation.
> If reassociation is moved after the loop optimizers we get 27
> and 28 simplifications respectively.  What I haven't tried to measure
> is how many new CSE opportunities are exposed vs any that are lost.


> 
> [ Noticed, of course, because DOM's code caught some of those cases
>   during its final run. ]
> 
> I realize reassociation creates dead code.  DOM is relatively immune
> to dead code and we run CDDCE after the final DOM pass which would
> zap the dead code created by reassociation.

> 
> Thoughts?

Reassociation is quite a cheap pass.  It may even be sane to run it
twice, once before PRE before loop, and once after loop.

The most expensive part is computing the post dominators to get the
ordering right to break up subtracts.  If you don't tell it to break up
subtracts, (or just do it in RPO and hope for the best there) we could
probably just run it twice without it even showing up in the time
output.

> jeff
> 
> 
> 
> 
> 
> 
> 



More information about the Gcc-patches mailing list