[Bug tree-optimization/80155] [7/8/9 regression] Performance regression with code hoisting enabled

Tue May 22 10:14:00 GMT 2018

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155

--- Comment #37 from bin cheng <amker at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #36)
> On Tue, 22 May 2018, amker at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155
> > 
> > bin cheng <amker at gcc dot gnu.org> changed:
> > 
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >                  CC|                            |amker at gcc dot gnu.org
> > 
> > --- Comment #35 from bin cheng <amker at gcc dot gnu.org> ---
> > (In reply to prathamesh3492 from comment #33)
> > > Created attachment 42341 [details]
> > > Test-case to reproduce regression with cortex-m7
> > > 
> > > I have attached an artificial test-case that is fairly representative of the
> > > regression we are seeing in a benchmark. The test-case mimics a
> > > deterministic finite automaton. With code-hoisting there's an additional
> > > spill of r5 near beginning of the function.
> > > 
> > ...
> > > 
> > > Without code-hoisting it is reusing r3 to store a + 1, while due to code
> > > hoisting it uses the extra register 'r2' to store the value of hoisted
> > > expression a + 1.
> > > 
> > > Would it be a good idea to somehow "limit" the distance (in terms of number
> > > of basic blocks maybe?) between the definition of hoisted variable and its
> > > furthest use during PRE ? If that exceeds a certain threshold then PRE
> > > should choose not to hoist that expression. The threshold could be a param
> > > that can be set by backends.
> > > Does this analysis look reasonable ?
> > 
> > It might be more accurate to calculate register pressure and use that to guide
> > code hoisting.  I introduced register-pressure-guided hoisting for RTL under the
> > option -fira-hoist-pressure; basically a similar thing needs to be done here.
> > 
> > The proposed Tree-SSA register pressure patch set is still under review, but
> > please note that for now it does only the minimum: it computes register
> > pressure.  To make it useful in this case, it may need to be improved by
> > calculating/recording live ranges for statements (I did that in a previous
> > version of the patch).  We would also need interfaces for updating live range
> > information in line with code motion.
> 
> One important thing on GIMPLE is that stmt order (inside a BB at least)
> is quite arbitrary and thus LIVE should consider the stmts ordered
> in LIVE-optimal way to not introduce too much noise.  It might be that
> we should only consider live-through [loops] for heuristics in some 
> places plus the obvious changes in liveness that transforms induce.

Yes, I actually experimented with counting only the live ranges at BB in/out
when computing max pressure in the current implementation.  It doesn't make
much difference for the only current user, predcom, but it could be important
for cases like this one.
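
To make the difference between the two ways of counting concrete, here is a
toy model in Python.  It is only a sketch and not the code from the patch set:
the (dest, uses) statement representation and the function names are made up
for illustration.  It shows that the per-statement maximum changes with the
statement order inside a BB, while the in/out count does not.

```python
def max_pressure_per_stmt(stmts, live_out):
    """Backward scan over one basic block.

    stmts is a list of (dest, uses) pairs in program order (hypothetical
    representation).  Returns (peak number of live names, live-in set).
    """
    live = set(live_out)
    peak = len(live)
    for dest, uses in reversed(stmts):
        live.discard(dest)   # the definition ends the live range going backward
        live.update(uses)    # operands are live before the statement
        peak = max(peak, len(live))
    return peak, live


def boundary_pressure(stmts, live_out):
    """Count live names only at block entry and exit."""
    _, live_in = max_pressure_per_stmt(stmts, live_out)
    return max(len(live_in), len(live_out))


# The same computation in two statement orders: s = t1 + t2 + t3 + t4,
# where every t_i is computed from a.
interleaved = [
    ("t1", ["a"]), ("t2", ["a"]), ("s2", ["t1", "t2"]),
    ("t3", ["a"]), ("s3", ["s2", "t3"]),
    ("t4", ["a"]), ("s", ["s3", "t4"]),
]
grouped = [
    ("t1", ["a"]), ("t2", ["a"]), ("t3", ["a"]), ("t4", ["a"]),
    ("s2", ["t1", "t2"]), ("s3", ["s2", "t3"]), ("s", ["s3", "t4"]),
]
live_out = {"s"}

peak_interleaved, _ = max_pressure_per_stmt(interleaved, live_out)
peak_grouped, _ = max_pressure_per_stmt(grouped, live_out)
print(peak_interleaved, peak_grouped)           # per-statement peaks differ
print(boundary_pressure(interleaved, live_out),
      boundary_pressure(grouped, live_out))     # in/out counts are identical
```

In this model the per-statement peak depends purely on how the statements
happen to be ordered, which is the noise mentioned above, whereas the in/out
count only moves when a transform actually changes what is live across the BB
boundary.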