[PATCH][RFC] Add FRE in pass_vectorize

Wed Jun 24 08:16:00 GMT 2015

On Tue, 23 Jun 2015, Jeff Law wrote:

> On 06/10/2015 08:02 AM, Richard Biener wrote:
> > 
> > The following patch adds FRE after vectorization which is needed
> > for IVOPTs to remove redundant PHI nodes (well, I'm testing a
> > patch for FRE that will do it already there).
> Redundant or degenerates which should be propagated?

Redundant: basically two IVs with the same initial value and the same step.
IVOPTs can deal with this if the initial values and the steps are already
the same "enough", but the vectorizer can end up generating huge redundant
expressions for both.

> I believe Alan Lawrence has run into similar issues (unpropagated degenerates)
> with his changes to make loop header copying more aggressive.  Threading will
> also create them.  The phi-only propagator may be the solution.  It ought to
> be cheaper than FRE.

Yes, but that's unrelated (see above).

> > The patch also makes FRE preserve loop-closed SSA form and thus
> > make it suitable for use in the loop pipeline.
> Loop optimizations will tend to create opportunities for redundancy
> elimination, so the ability to use FRE in the loop pipeline seems like a good
> thing.  We ran into this in RTL land, so I'm not surprised to see it occurring
> in the gimple optimizers and thus I'm not opposed to running FRE in the loop
> pipeline.
> 
> 
> 
> > 
> > With the placement in the vectorizer sub-pass FRE will effectively
> > be enabled by -O3 only (well, or if one requests loop vectorization).
> > I've considered placing it after complete_unroll instead but that
> > would enable it at -O1 already.  I have no strong opinion on the
> > exact placement, but it should help all passes between vectorizing
> > and ivopts for vectorized loops.
> For -O3/vectorization it seems like a no-brainer.  -O1 less so.  IIRC we
> conditionalize -frerun-cse-after-loop on -O2 which seems more appropriate than
> doing it with -O1.
> 
> > 
> > Any other suggestions on pass placement?  I can of course key
> > that FRE run on -O3 explicitly.  Not sure if we at this point
> > want to start playing fancy games like setting a property
> > when a pass (likely) generated redundancies that are worth
> > fixing up and then key FRE on that one (it gets harder and
> > less predictable what transforms are run on code).
> RTL CSE is bloody expensive and so many times I wanted the ability to know a
> bit about what the loop optimizer had done (or not done) so that I could
> conditionally skip the second CSE pass.   We never built that, but it's
> something I've wanted for decades.

Hmm, ok.  We can abuse pass properties for this but I don't think
they are a scalable fit.  Not sure if we'd like to go the full way and
add something like PROP_want_ccp, PROP_want_copyprop, PROP_want_cse, etc.
(any others?).  It's also unclear whether FRE would then pick up
PROP_want_copyprop because it can do copy propagation as well.

Eventually we'll just end up setting PROP_want_* from every pass...
(like we schedule a CFG cleanup from nearly every pass that did
anything).

Going a bit further here, esp. in the loop context, would be to
have the basic cleanups be region-based.  Because given a big
function with many loops and just one vectorized it would be
enough to clean up the vectorized loop (yes, and in theory
all downstream effects, but that's probably secondary and not
so important).  It's not too difficult to make FRE run on
a MEME region, the interesting part, engineering-wise, is to
really make it O(size of MEME region) - that is, eliminate
things like O(num_ssa_names) or O(n_basic_blocks) setup cost.
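
To illustrate the setup-cost point, here is a minimal sketch, not GCC
code: Block, RegionState and process_region are hypothetical names made
up for this example.  Per-name state is allocated lazily in a hash map
instead of an array sized by the number of names in the function, so
both setup and work scale with the region that is actually visited.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical model: a function is a vector of blocks, each defining
// a few names identified by integers.  None of this is a GCC type.
struct Block { std::vector<int> defs; };

// Per-name state created on first use, so nothing proportional to the
// total number of names in the function is allocated up front.
class RegionState {
public:
  // Return the value number of NAME, defaulting to NAME itself.
  int &value_number(int name) {
    return vn_.emplace(name, name).first->second;
  }
  std::size_t touched() const { return vn_.size(); }
private:
  std::unordered_map<int, int> vn_;
};

// Walk only the blocks listed in REGION and return how many were visited.
std::size_t process_region(const std::vector<Block> &fn,
                           const std::vector<std::size_t> &region,
                           RegionState &state) {
  std::size_t visited = 0;
  for (std::size_t idx : region) {
    assert(idx < fn.size());
    for (int name : fn[idx].defs)
      state.value_number(name);   // state exists only for names we see
    ++visited;
  }
  return visited;
}
```

With a 1000-block function and a three-block region, only the names
defined in those three blocks ever get an entry in the table.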

And then there is the possibility of making passes generate less
need for cleanups after them - in the present case with the
redundant IVs, by making them more apparently redundant through
CSEing the initial value and step during vectorizer code generation.
I'm playing with the idea of adding a simple CSE machinery to
the gimple_build () interface (aka match-and-simplify).  It
eventually invokes (well, not currently, but that can be fixed)
maybe_push_res_to_seq which is a good place to maintain a
table of already generated expressions.  That of course only
works if you either always append to the same sequence or at least
insert at the same place.
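
A rough sketch of what such a table could look like, under stated
assumptions: this is not the gimple_build () or maybe_push_res_to_seq
interface, and Stmt, CseBuilder, build, leaf and demo_redundant_init
are hypothetical names for illustration only.  Expressions are keyed on
operation and operands, and a repeated request returns the earlier
result instead of appending a new statement.  As noted above, this is
only valid when everything is appended to the same sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a generated statement: result = op (lhs, rhs).
struct Stmt { std::string op; int lhs; int rhs; int result; };

class CseBuilder {
public:
  // Create a fresh leaf value (e.g. a loop-invariant operand).
  int leaf() { return next_id_++; }

  // Return the value holding op (lhs, rhs).  A statement is appended
  // only if the same expression was not built on this sequence before.
  int build(const std::string &op, int lhs, int rhs) {
    auto key = std::make_tuple(op, lhs, rhs);
    auto it = table_.find(key);
    if (it != table_.end())
      return it->second;            // reuse the earlier result
    int result = next_id_++;
    seq_.push_back({op, lhs, rhs, result});
    table_.emplace(key, result);
    return result;
  }

  const std::vector<Stmt> &seq() const { return seq_; }

private:
  std::map<std::tuple<std::string, int, int>, int> table_;
  std::vector<Stmt> seq_;
  int next_id_ = 0;
};

// Usage sketch: two IVs whose initial value is computed from the same
// operands end up sharing a single statement.
inline std::size_t demo_redundant_init() {
  CseBuilder b;
  int base = b.leaf();
  int offset = b.leaf();
  int init1 = b.build("plus", base, offset);
  int init2 = b.build("plus", base, offset);
  assert(init1 == init2);
  return b.seq().size();
}
```

The table is purely syntactic, so it does not catch commutated
operands or anything that needs simplification first.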

I'm now back to match-and-simplify and will pursue that last idea
a bit (also wanting it for SCCVN itself).

Richard.

> Jeff
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)