[RFC] split pseudos during loop unrolling in RTL unroller
Segher Boessenkool
segher@kernel.crashing.org
Thu Apr 23 20:16:53 GMT 2020
On Thu, Apr 23, 2020 at 08:40:50AM -0600, Jeff Law wrote:
> On Thu, 2020-04-23 at 15:07 +0200, Richard Biener wrote:
> > On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool
> > <segher@kernel.crashing.org> wrote:
> > > On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > > > > But being stuck with something means no progress... I know
> > > > > > very well it's 100 times harder to get rid of something than to
> > > > > > add something new on top.
> > > > >
> > > > > Well, what progress do you expect to make? After expand that is :-)
> > > >
> > > > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> > > > no CSE, ...
> > >
> > > RTL CSE for example is very much required to get any good code. It
> > > needs to CSE stuff that wasn't there before expand.
> >
> > Sure, but then we should fix that!
> Exactly. Its purpose largely becomes dealing with the redundancies exposed by
> expansion, i.e., address arithmetic and the like. A lot of its path following
> code should be throttled back.
Hrm, I never thought about it like this. CSE was always there, I never
stopped to question if we needed it :-)
Well, that's cse1 then. What about cse2?
> > But valid RTL is instructions that are recognized. Which means
> > when the target doesn't support an SImode add we may not create
> > one. That's instruction selection ;)
> That's always a point of tension. But I think that in general continuing to have
> targets claim to support things they do not (such as double-wordsize arithmetic,
> logicals, moves, etc) is a mistake. It made sense at one time, but I think we've
> got better mechanisms in place to deal with this stuff now.
Different targets have *very* different insns for add, mul, div, shifts;
everything really. Describing this at expand time with two-machine-word
operations works pretty bloody well, for most or all targets -- this is
just part of the power of define_expand (but an important part). And
define_expand is very very useful, it's the Swiss army escape hatch, it
lets you do everything optabs have a too small mind for.
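
As a rough illustration of what "two-machine-word operations" means here:
on a hypothetical 32-bit target, a double-word add comes down to two
word-sized adds plus a carry. This is a plain C sketch with made-up names
(`dword`, `dword_add`), not GCC code and not any target's actual
define_expand:

```c
#include <stdint.h>

/* Hypothetical double-word value on a 32-bit target: two machine words. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} dword;

/* Add two double-word values using only word-sized operations.
   The carry out of the low word is recovered from unsigned wraparound. */
static dword dword_add(dword a, dword b)
{
    dword r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo;   /* 1 if the low add wrapped, else 0 */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

How the carry is actually produced and consumed (a flags register, a carry
bit, a compare-and-set) differs a lot between targets, which is the part a
target-written expander gets to choose.
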
> > > Oh no, I think we should do more earlier, and GIMPLE is a fine IR for
> > > there. But for low-level, close-to-the-machine stuff, RTL is much
> > > better suited. And we *do* want to optimise at that level as well, and
> > > much more than just peepholes.
> >
> > Well, everything that requires costing (unrolling, vectorization,
> > IV selection to name a few) _is_ close-to-the-machine. We're
> > just saying they are not because GIMPLE is so much easier to
> > work with here (not sure why exactly...).
> The primary motivation behind discouraging target costing and the like from
> gimple was to make it easier to implement and predict the behavior of the gimple
> optimizers. We've relaxed that somewhat, particularly for vectorization, but I
> think the principle is still solid.
There are two kinds of costing. The first only says which of A or B is
better; that can perhaps be done on GIMPLE already, using
target-specific costs. The other gives a number to everything, which is
much harder to get anywhere close to usably correct (what does the
number even *mean*? For performance, latency of the whole sequence is
the most important number, but that is not easy to work with, or what we
use for say insn_cost).
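
To make the gap between "a number for everything" and "latency of the
whole sequence" concrete, here is a toy C sketch. The `toy_insn` model
(one latency, at most one dependency, unlimited issue width) and both
function names are assumptions made up for illustration; this is not how
insn_cost or any GCC cost hook works:

```c
#include <stddef.h>

/* Toy insn: a latency in cycles and at most one earlier insn it depends
   on (dep < 0 means no dependency; otherwise dep must be less than the
   insn's own index).  Purely illustrative, not GCC's rtx. */
struct toy_insn {
    int latency;
    int dep;
};

/* "A number for everything": add up the per-insn costs. */
static int sum_cost(const struct toy_insn *seq, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += seq[i].latency;
    return total;
}

/* Latency of the whole sequence: the longest dependency chain, assuming
   unlimited issue width.  Only the first 64 insns are considered. */
static int sequence_latency(const struct toy_insn *seq, size_t n)
{
    int finish[64];
    int longest = 0;
    for (size_t i = 0; i < n && i < 64; i++) {
        int start = (seq[i].dep >= 0) ? finish[seq[i].dep] : 0;
        finish[i] = start + seq[i].latency;
        if (finish[i] > longest)
            longest = finish[i];
    }
    return longest;
}
```

For two independent 3-cycle insns followed by a 1-cycle insn that depends
on the first, the summed cost is 7 but the sequence latency is 4. The two
measures can rank alternatives differently, and only the first one
composes insn by insn.
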
>
> But I think there is a place for adding target dependencies -- and that's at the
> end of the current gimple pipeline.
There are a *few* things in GIMPLE that use target costs (ivopts...).
But yeah, most things should not.
Segher