[PATCH][RFC] Add FRE in pass_vectorize

Thu Jul 2 17:52:00 GMT 2015

On 07/02/2015 05:40 AM, Alan Lawrence wrote:
> Jeff Law wrote:
>> On 06/24/2015 01:59 AM, Richard Biener wrote:
>>> And then there is the possibility of making passes generate less
>>> need for cleanups after them - like in the present case
>>> with the redundant IVs, make them more apparently redundant by
>>> CSEing the initial value and step during vectorizer code generation.
>>> I'm playing with the idea of adding a simple CSE machinery to
>>> the gimple_build () interface (aka match-and-simplify).  It
>>> eventually invokes (well, not currently, but that can be fixed)
>>> maybe_push_res_to_seq which is a good place to maintain a
>>> table of already generated expressions.  That of course only
>>> works if you either always append to the same sequence or at least
>>> insert at the same place.
>> As you know we've gone back and forth on this in the past.  It's
>> always a trade-off.  I still ponder from time to time putting the
>> simple CSE and cprop bits back into the SSA rewriting phase to avoid
>> generating all kinds of garbage that just needs to be cleaned up later
>> -- particularly for incremental SSA updates.
>
> Coming to this rather late, and without the background knowledge about
> having gone back and forth, sorry! But what are the arguments against
> this? Am I right in thinking that the "SSA Rewriting" phase would not
> trigger as often as gimple_build(), or are these the same thing?
It's the into-ssa and incremental update phases.  The basic idea is that
it is very inexpensive to do const/copy propagation and simple CSE at that 
point.

When processing an assignment, after rewriting the inputs from _DECL 
nodes to SSA_NAMEs, you look up the RHS in your hash table.  If you get a 
hit, you replace the expression with the SSA_NAME from the hash table 
and record that the destination has an equivalence.
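
A minimal sketch of that lookup for a single block, in Python rather
than GCC's own code.  Everything here (rewrite_block, the tuple layout,
the naming of SSA versions) is a hypothetical illustration, not the old
into-ssa implementation; PHI nodes, memory operations and side effects
are ignored.

```python
# Hypothetical sketch; none of these names come from GCC.
def rewrite_block(stmts):
    """stmts: list of (dest, op, arg1, arg2) over plain variable names.

    Returns (emitted_statements, current), where current maps each
    variable to the SSA name that holds its value at the end.
    """
    current = {}    # variable -> SSA name currently holding its value
    available = {}  # (op, arg1, arg2) over SSA names -> SSA name of result
    version = {}    # variable -> last SSA version number handed out
    out = []
    for dest, op, a, b in stmts:
        # Rewrite the inputs from variables to their current SSA names.
        # Names never assigned in this block are treated as live-in.
        a = current.get(a, a)
        b = current.get(b, b)
        key = (op, a, b)
        hit = available.get(key)
        if hit is not None:
            # Hit: emit nothing, just record that dest is equivalent
            # to the SSA name that already holds this value.
            current[dest] = hit
            continue
        version[dest] = version.get(dest, 0) + 1
        name = f"{dest}_{version[dest]}"
        current[dest] = name
        available[key] = name
        out.append((name, op, a, b))
    return out, current


emitted, names = rewrite_block([
    ("x", "+", "a", "b"),
    ("y", "+", "a", "b"),   # same RHS as x: no statement is emitted
    ("z", "*", "x", "y"),   # both operands become x's SSA name
])
```

Because the table is keyed on SSA names, redefining an operand
automatically changes the key, so no explicit invalidation is needed.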

Diego took this out several years ago with the idea that the into-ssa & 
updates should be kept separate from optimizations.  With the ongoing 
need for early cleanups to make IPA more effective, I think it's time to 
revisit that decision as we get a lot of the obvious redundancies out of 
the stream by just being smart during into-ssa.  Which in turn means we 
don't have to do as much in the early optimizations before IPA.

>
> Presumably when you say "simple CSE machinery" you'd have to bail out
> quickly from tricky cases like, say:
>
> if (P)
>    {
>      use ...expr...
>    }
> ...
> if (Q)
>    {
>      now building a new ...expr... here
>    }
I'm not sure I see the problem here.  The simple CSE/cprop occurs as we're going 
into SSA form -- because into-ssa is inherently a dominator walk and 
we're rewriting operands as we go, we can trivially determine that we've 
already seen a given expression earlier in the dominator tree and that 
the result of that expression hasn't changed (by the nature of SSA).
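
To make the if (P) / if (Q) case concrete, here is a rough Python
sketch of a dominator walk with a scoped expression table.  The
function and block names are hypothetical, the input is assumed to be
in SSA form already, and unwinding the table on the way back up the
tree is my assumption about how one would keep the lookup limited to
dominating blocks.

```python
# Hypothetical sketch; none of these names come from GCC.
def cse_dom_walk(dom_children, blocks, root):
    """dom_children: block -> blocks it immediately dominates.
    blocks: block -> list of (ssa_name, op, arg1, arg2).

    Returns a dict mapping each redundant SSA name to the earlier,
    dominating SSA name that computes the same expression.
    """
    available = {}  # expressions computed in blocks dominating the current one
    replaced = {}   # redundant SSA name -> equivalent earlier SSA name

    def visit(bb):
        added = []
        for name, op, a, b in blocks[bb]:
            # Use known equivalences so later expressions can match too.
            a = replaced.get(a, a)
            b = replaced.get(b, b)
            key = (op, a, b)
            if key in available:
                replaced[name] = available[key]
            else:
                available[key] = name
                added.append(key)
        for child in dom_children.get(bb, []):
            visit(child)
        # Leaving this subtree: its expressions no longer dominate
        # whatever is visited next, so remove them from the table.
        for key in added:
            del available[key]

    visit(root)
    return replaced


# if (P) { e_1 = x_0 * y_0 }  ...  if (Q) { e_2 = x_0 * y_0 }
dom_tree = {"entry": ["bb_p", "mid"], "mid": ["bb_q", "end"]}
guarded = {
    "entry": [],
    "bb_p": [("e_1", "*", "x_0", "y_0")],
    "mid": [],
    "bb_q": [("e_2", "*", "x_0", "y_0")],
    "end": [],
}
no_reuse = cse_dom_walk(dom_tree, guarded, "entry")

# Same shape, but the expression is also computed in the entry block.
hoisted = dict(guarded, entry=[("e_0", "*", "x_0", "y_0")])
reuse = cse_dom_walk(dom_tree, hoisted, "entry")
```

In the first case the block under if (P) does not dominate the block
under if (Q), so the lookup there simply misses and nothing is
replaced.  In the second case both guarded computations hit the entry
block's entry in the table.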

Jeff