[RFC] split pseudos during loop unrolling in RTL unroller

Thu Apr 23 20:16:53 GMT 2020

On Thu, Apr 23, 2020 at 08:40:50AM -0600, Jeff Law wrote:
> On Thu, 2020-04-23 at 15:07 +0200, Richard Biener wrote:
> > On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool
> > <segher@kernel.crashing.org> wrote:
> > > On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > > > > But being stuck with something means no progress...  I know
> > > > > > very well it's 100 times harder to get rid of something than to
> > > > > > add something new on top.
> > > > > 
> > > > > Well, what progress do you expect to make?  After expand that is :-)
> > > > 
> > > > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> > > > no CSE, ...
> > > 
> > > RTL CSE for example is very much required to get any good code.  It
> > > needs to CSE stuff that wasn't there before expand.
> > 
> > Sure, but then we should fix that!
> Exactly.  Its purpose largely becomes dealing with the redundancies exposed by
> expansion, i.e., address arithmetic and the like.  A lot of its path-following
> code should be throttled back.

Hrm, I never thought about it like this.  CSE was always there, I never
stopped to question if we needed it :-)

Well, that's cse1 then.  What about cse2?
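Here is a minimal sketch of the kind of redundancy Jeff means by "exposed by expansion". It uses a made-up three-address form, not real RTL. Expanding `&a[i]` and `&b[i]` computes `i * 4` twice, and local value numbering within the block turns the second computation into a copy. The insn tuple layout, the register names, and the `local_cse` helper are all hypothetical. Real cse1 works on RTL and has a lot more machinery (path following, costs, etc).

```python
from typing import Dict, List, Tuple

# Hypothetical three-address insn: (dest, op, src1, src2).
Insn = Tuple[str, str, str, str]

COMMUTATIVE = {"add", "mul"}


def local_cse(block: List[Insn]) -> List[Insn]:
    """Local CSE over one basic block using value numbering.

    Assumes every dest is written exactly once (pseudo-like registers),
    so a previously computed value stays valid for the rest of the block.
    """
    available: Dict[Tuple[str, str, str], str] = {}  # expression -> reg holding it
    copy_of: Dict[str, str] = {}                     # reg -> equivalent earlier reg
    out: List[Insn] = []
    for dest, op, a, b in block:
        # Propagate earlier copies so equivalent operands compare equal.
        a, b = copy_of.get(a, a), copy_of.get(b, b)
        if op in COMMUTATIVE and b < a:
            a, b = b, a  # canonical order: i*4 and 4*i get the same key
        key = (op, a, b)
        if key in available:
            # Redundant: replace the computation with a copy (DCE can
            # delete the copy later if dest turns out to be dead).
            copy_of[dest] = available[key]
            out.append((dest, "mov", available[key], ""))
        else:
            available[key] = dest
            out.append((dest, op, a, b))
    return out


# Expanding &a[i] and &b[i] computes i*4 twice.
block = [
    ("t1", "mul", "i", "4"),
    ("t2", "add", "a", "t1"),
    ("t3", "mul", "4", "i"),
    ("t4", "add", "b", "t3"),
]
optimized = local_cse(block)
for insn in optimized:
    print(insn)  # the second mul comes out as a copy of t1
```

The copy rather than an outright deletion keeps the sketch correct if t3 is live out of the block.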

> > But valid RTL is instructions that are recognized.  Which means
> > when the target doesn't support an SImode add we may not create
> > one.  That's instruction selection ;)
> That's always a point of tension.  But I think that in general continuing to have
> targets claim to support things they do not (such as double-word arithmetic,
> logicals, moves, etc.) is a mistake.  It made sense at one time, but I think we've
> got better mechanisms in place to deal with this stuff now.

Different targets have *very* different insns for add, mul, div, shifts;
everything really.  Describing this at expand time with two-machine-word
operations works pretty bloody well, for most or all targets -- this is
just part of the power of define_expand (but an important part).  And
define_expand is very very useful; it's the Swiss army escape hatch, it
lets you do everything optabs are too limited to express.
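For reference, a double-word add on a 32-bit target usually comes down to two steps: a low-word add that produces a carry, then a high-word add that consumes it. The sketch below is only an arithmetic model in Python and assumes a 32-bit word. `adddi3_split` is a hypothetical name that borrows the adddi3 pattern-name convention. This is not any real target's expander.

```python
from typing import Tuple

WORD_BITS = 32                      # assumed machine word size
WORD_MASK = (1 << WORD_BITS) - 1


def split_word(x: int) -> Tuple[int, int]:
    """Split a double-word value into (high, low) machine words."""
    return (x >> WORD_BITS) & WORD_MASK, x & WORD_MASK


def adddi3_split(x: int, y: int) -> int:
    """Add two 64-bit values using only word-sized adds plus a carry,
    mirroring an 'add low words, then add-with-carry high words' sequence."""
    xh, xl = split_word(x)
    yh, yl = split_word(y)
    low_sum = xl + yl
    carry = low_sum >> WORD_BITS        # 1 if the low-word add overflowed
    lo = low_sum & WORD_MASK            # first insn: add, sets carry
    hi = (xh + yh + carry) & WORD_MASK  # second insn: add-with-carry
    return (hi << WORD_BITS) | lo       # wraps modulo 2**64 like the hardware


print(hex(adddi3_split(0xFFFFFFFF, 1)))  # carry propagates into the high word
```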

> > > Oh no, I think we should do more earlier, and GIMPLE is a fine IR for
> > > that.  But for low-level, close-to-the-machine stuff, RTL is much
> > > better suited.  And we *do* want to optimise at that level as well, and
> > > much more than just peepholes.
> > 
> > Well, everything that requires costing (unrolling, vectorization,
> > IV selection to name a few) _is_ close-to-the-machine.  We're
> > just saying they are not because GIMPLE is so much easier to
> > work with here (not sure why exactly...).
> The primary motivation behind discouraging target costing and the like from
> gimple was to make it easier to implement and predict the behavior of the gimple
> optimizers.   We've relaxed that somewhat, particularly for vectorization, but I
> think the principle is still solid.

There are two kinds of costing.  The first only says which of A or B is
better; that can perhaps be done on GIMPLE already, using
target-specific costs.  The other gives a number to everything, which is
much harder to get anywhere close to usably correct (what does the
number even *mean*?  For performance, latency of the whole sequence is
the most important number, but that is not easy to work with, nor is it
what we use for, say, insn_cost).
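Here is a small illustration of why a summed per-insn number is not the latency of the sequence. It rests on invented assumptions: every insn takes one cycle, issue width is unlimited, and inputs are ready at cycle 0. Computing a+b+c+d as a chain or as a tree gives the same total when per-insn costs are added up, but the tree finishes a cycle earlier. The insn format and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical insn: (dest, latency_in_cycles, source regs it reads).
Insn = Tuple[str, int, List[str]]


def summed_cost(seq: List[Insn]) -> int:
    """'A number for everything': add up the per-insn costs."""
    return sum(lat for _, lat, _ in seq)


def critical_path_latency(seq: List[Insn]) -> int:
    """Cycle at which the last result is ready, assuming unlimited issue
    width and that sources not defined in seq are ready at cycle 0."""
    ready: Dict[str, int] = {}
    for dest, lat, srcs in seq:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = start + lat
    return max(ready.values(), default=0)


# a+b+c+d two ways; every add is assumed to take 1 cycle.
chain = [("t1", 1, ["a", "b"]), ("t2", 1, ["t1", "c"]), ("r", 1, ["t2", "d"])]
tree = [("t1", 1, ["a", "b"]), ("t2", 1, ["c", "d"]), ("r", 1, ["t1", "t2"])]

for name, seq in (("chain", chain), ("tree", tree)):
    print(name, summed_cost(seq), critical_path_latency(seq))
```

Comparing the two sequences directly (the first kind of costing) picks the tree easily. A per-insn sum (the second kind) cannot tell them apart.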

> 
> But I think there is a place for adding target dependencies -- and that's at the
> end of the current gimple pipeline.

There are a *few* things in GIMPLE that use target costs (ivopts...).
But yeah, most things should not.

Segher