[patch] Fix PR rtl-optimization/87727

Fri Dec 21 16:03:00 GMT 2018

On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>
>
> On 12/20/2018 06:14 PM, Peter Bergner wrote:
> > On 12/20/18 4:41 PM, Jeff Law wrote:
> >> On 12/20/18 2:30 PM, Peter Bergner wrote:
> >>> For stage1, I'd like to fix that conflict wart if I can.  I have also
> >>> wondered about adding a copy coalesce phase just before we enter RA,
> >>> which would ensure the copies are removed, instead of hoping RA assigns
> >>> the same reg to the source and destination of the copy making it a nop
> >>> that can be removed.
> >> The difficulty with coalescing is that if you get too aggressive then
> >> you end up removing degrees of freedom from the allocator and you can
> >> easily make the final results worse.
> > I agree, but being too aggressive leading to bad decisions/code is
> > true for a lot of optimizations. :-)   I do plan on first attacking
> > the conservative conflict info for pseudos first and seeing what
> > that buys us before attempting any coalescing.
> When I started to work on IRA, I've tried several coalescing techniques
> (i recall only conservative, iterative and optimistic ones).  The
> results were not promising.  But it was very long time ago,  my major
> target was i686 that time and there were no accurate conflict
> calculations for irregular file registers.  So may be it will work in
> current environment and in a different implementation.
>
> Currently IRA has coalescing only for spilled pseudos after coloring
> (because mem<->mem moves are very expensive).  LRA has the same technique.
>
> > As for removing degrees of freedom for the allocator, sometimes that can
> > be a good thing, if it can makes the allocator simpler.  For example, I
> > think we have forced the allocator to do too much by not only being an RA,
> > but being an instruction selector as well.  Doing both RA and instruction
> > selection at the same time makes everything very complicated and I think
> > we probably don't compute allocation costs correctly, since we seem to
> > calculate costs on a per alternative per insn basis and I don't think we
> > ever see what the ramifications of using an alternative in one insn
> > has on the costs of another alternative in another insn.  Sometimes using
> > the cheapest alternative in one insn and the cheapest alternative in
> > another insn can lead us into a situation that requires spilling to
> > resolve the conflicting choices.
>    I am completely agree.  The big remaining part to modernize GCC is
> code selection.  I believe LLVM has a big advantage in this area over
> GCC.  A modern approach could make RA much simpler.  But it is a very
> big job involving changes in machine descriptions (a lot of them).
>
>    I don't mean machine description in IBURG style.  That would be a
> huge, enormous job requiring a lot of expertise part of which is lost
> for some targets (i was thinking about to start this jobs several times
> but gave up when I saw how many efforts it would take, it would be even
> a bigger job that writing IRA/LRA).
>
>    I am just saying that you need at least have cost for each insn
> alternative (may be sub-targets).  Although some approximation can be
> possible (like insns number generated from the alternative or even their
> size).
>
>    There are although some smaller projects in this direction.  For
> example, I tried to use code selection in register cost calculation (the
> code on ira-select branch).  The algorithm is based on choosing
> alternative for each insns first and then calculates costs and register
> classes for pseudos involved in the insn.  The chosen alternatives could
> be propagated later to LRA (this work even did not started yet).  The
> cost of each insn alternative (if we add them in the future in md files)
> could be easily integrated in the algorithm.
>
>    Unfortunately the algorithm did not improve SPEC2006 for x86-64
> (i7-8700k) in overall although one benchmark was improved by about 5% if
> I remember this correctly.  But modern Intel CPUs are very insensitive
> to optimizations (they are complicated black boxes which do own
> optimizations and anekdotically i saw code when adding an additional
> move sped up the code a lot).  May be the algorithm will have better
> results on other targets (power or aarch64).  I never tried other targets.
> > I've wondered if running something like lra_constraints() (but using
> > pseudos for fixups rather than hard regs) early in the rtl passes as
> > a pseudo instruction selection pass wouldn't make things easier for
> > the following passes like RA, etc?
> >
> I think it might. As wrote we could propagate the above algorithm
> decision to LRA.
>
> Peter, also if you are interesting to do RA work, there is another
> problem which is to implement sub-register level conflict calculations
> in LRA.  Currently, IRA has a simple subregister level conflict
> calculation (see allocno objects) and in a case of sub-register presence
> IRA and LRA decisions are different and this results in worse code
> generations (there are some PRs for this).  It would be also a big RA
> project to do.

A further away (in pass distance) but maybe related project is to
replace the current "instruction selection" (I'm talking about RTL
expansion) with a scheme that works on (GIMPLE) SSA.  My
rough idea for prototyping pieces would be to first do this
completely on GIMPLE by replacing a "instruction" by
a GIMPLE asm with an "RTL" body (well, that doesn't have to
be explicit, it just needs to remember the insn chosen). The
available patterns are readily available in the .md files, we
just need some GIMPLE <-> RTL translation of the operations.

In the end this would do away with our named patterns
for expansion purposes.

Richard.

>