This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch] Fix PR rtl-optimization/87727




On 12/20/2018 06:14 PM, Peter Bergner wrote:
On 12/20/18 4:41 PM, Jeff Law wrote:
On 12/20/18 2:30 PM, Peter Bergner wrote:
For stage1, I'd like to fix that conflict wart if I can.  I have also
wondered about adding a copy coalesce phase just before we enter RA,
which would ensure the copies are removed, instead of hoping RA assigns
the same reg to the source and destination of the copy making it a nop
that can be removed.
The difficulty with coalescing is that if you get too aggressive then
you end up removing degrees of freedom from the allocator and you can
easily make the final results worse.
I agree, but being too aggressive leading to bad decisions/code is
true for a lot of optimizations. :-)   I do plan on first attacking
the conservative conflict info for pseudos first and seeing what
that buys us before attempting any coalescing.
When I started to work on IRA, I've tried several coalescing techniques (i recall only conservative, iterative and optimistic ones).  The results were not promising.  But it was very long time ago,  my major target was i686 that time and there were no accurate conflict calculations for irregular file registers.  So may be it will work in current environment and in a different implementation.

Currently IRA has coalescing only for spilled pseudos after coloring (because mem<->mem moves are very expensive).  LRA has the same technique.

As for removing degrees of freedom for the allocator, sometimes that can
be a good thing, if it can makes the allocator simpler.  For example, I
think we have forced the allocator to do too much by not only being an RA,
but being an instruction selector as well.  Doing both RA and instruction
selection at the same time makes everything very complicated and I think
we probably don't compute allocation costs correctly, since we seem to
calculate costs on a per alternative per insn basis and I don't think we
ever see what the ramifications of using an alternative in one insn
has on the costs of another alternative in another insn.  Sometimes using
the cheapest alternative in one insn and the cheapest alternative in
another insn can lead us into a situation that requires spilling to
resolve the conflicting choices.
  I am completely agree.  The big remaining part to modernize GCC is code selection.  I believe LLVM has a big advantage in this area over GCC.  A modern approach could make RA much simpler.  But it is a very big job involving changes in machine descriptions (a lot of them).

  I don't mean machine description in IBURG style.  That would be a huge, enormous job requiring a lot of expertise part of which is lost for some targets (i was thinking about to start this jobs several times but gave up when I saw how many efforts it would take, it would be even a bigger job that writing IRA/LRA).

  I am just saying that you need at least have cost for each insn alternative (may be sub-targets).  Although some approximation can be possible (like insns number generated from the alternative or even their size).

  There are although some smaller projects in this direction.  For example, I tried to use code selection in register cost calculation (the code on ira-select branch).  The algorithm is based on choosing alternative for each insns first and then calculates costs and register classes for pseudos involved in the insn.  The chosen alternatives could be propagated later to LRA (this work even did not started yet).  The cost of each insn alternative (if we add them in the future in md files) could be easily integrated in the algorithm.

  Unfortunately the algorithm did not improve SPEC2006 for x86-64 (i7-8700k) in overall although one benchmark was improved by about 5% if I remember this correctly.  But modern Intel CPUs are very insensitive to optimizations (they are complicated black boxes which do own optimizations and anekdotically i saw code when adding an additional move sped up the code a lot).  May be the algorithm will have better results on other targets (power or aarch64).  I never tried other targets.
I've wondered if running something like lra_constraints() (but using
pseudos for fixups rather than hard regs) early in the rtl passes as
a pseudo instruction selection pass wouldn't make things easier for
the following passes like RA, etc?

I think it might. As wrote we could propagate the above algorithm decision to LRA.

Peter, also if you are interesting to do RA work, there is another problem which is to implement sub-register level conflict calculations in LRA.  Currently, IRA has a simple subregister level conflict calculation (see allocno objects) and in a case of sub-register presence IRA and LRA decisions are different and this results in worse code generations (there are some PRs for this).  It would be also a big RA project to do.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]