[PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation

Richard Earnshaw Richard.Earnshaw@foss.arm.com
Wed Aug 12 08:32:00 GMT 2015


On 12/08/15 02:11, Richard Henderson wrote:
> Something last week had me looking at ppc64 code generation,
> and some of what I saw was fairly bad.  Fixing it wasn't going
> to be easy, due to the fact that the logic for generating
> constants wasn't contained within a single function.
> 
> Better is the way that aarch64 and alpha have done it in the
> past, sharing a single function with all of the logic that
> can be used for both cost calculation and the actual emission
> of the constants.
> 
> However, the way that aarch64 and alpha have done it hasn't
> been ideal, in that there's a fairly costly search that must
> be done every time.  I've thought before about changing this
> so that we would be able to cache results, akin to how we do
> it in expmed.c for multiplication.
> 
> I've implemented such a caching scheme for three targets, as
> a test of how much code could be shared.  The answer appears
> to be about 100 lines of boiler-plate.  Minimal, true, but it
> may still be worth it as a way of encouraging backends to do
> similar things in a similar way.
> 

I've got a short week this week, so I won't have time to look at this in
detail for a while.  So a bunch of questions... but not necessarily
objections :-)
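
To make sure I'm reading the proposal correctly, the sort of cache I
have in mind is roughly the following -- purely a sketch with invented
names, by analogy with the alg_hash table in expmed.c, not the actual
genimm-hash.h interface:

/* Hypothetical sketch only: one slot per hashed constant, remembering
   the cheapest recipe found so that the full search runs at most once
   per value.  All names here are invented.  */

#define GENIMM_SLOTS     1031   /* arbitrary table size */
#define GENIMM_MAX_INSNS 5      /* longest recipe worth caching */

struct genimm_entry
{
  HOST_WIDE_INT value;          /* constant we solved for */
  machine_mode mode;            /* mode it was generated in */
  unsigned char cost;           /* number of insns; 0 means empty slot */
  unsigned char step[GENIMM_MAX_INSNS];  /* opaque per-target steps */
};

static struct genimm_entry genimm_cache[GENIMM_SLOTS];

static struct genimm_entry *
genimm_lookup (HOST_WIDE_INT value, machine_mode mode)
{
  size_t slot = ((unsigned HOST_WIDE_INT) value
                 * 0x9e3779b97f4a7c15ULL) % GENIMM_SLOTS;
  struct genimm_entry *e = &genimm_cache[slot];
  if (e->cost != 0 && e->value == value && e->mode == mode)
    return e;       /* hit: reuse for both costing and emission */
  return NULL;      /* miss: run the full search, then fill the slot */
}

With that picture in mind: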

How do we clear the cache, and when?  For example, on ARM, switching
between ARM and Thumb state means we potentially need to generate
radically different sequences, and we can now make that switch at
function boundaries.

Can we generate different sequences for hot/cold code within a single
function?

Can we cache sequences along with their context (e.g. use with AND, OR,
ADD, etc.)?
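
If the answer to that last one is yes, I'd naively imagine the key
growing a few extra fields -- again only a hypothetical illustration,
not anything in the posted patches:

/* Hypothetical only: a richer key would let per-context and per-state
   recipes coexist rather than requiring the cache to be flushed or
   bypassed.  */

struct genimm_key
{
  HOST_WIDE_INT value;
  machine_mode mode;
  bool thumb_p;              /* ARM vs Thumb state of the current function */
  bool size_p;               /* is this block being optimized for size?  */
  enum rtx_code use_ctx;     /* SET, AND, IOR, PLUS, ...  */
};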


> Some notes about ppc64 in particular:
> 
>   * Constants aren't split until quite late, preventing all hope of
>     CSE'ing portions of the generated code.  My gut feeling is that
>     this is in general a mistake, but...
> 
>     I did attempt to fix it, and got nothing for my troubles except
>     poorer code generation for AND/IOR/XOR with non-trivial constants.
> 
On AArch64 in particular, building complex constants is generally
destructive on the register being built (if you want to preserve
intermediate values you have to make intermediate copies); that's
clearly never going to be a win unless the constant needs at least
three instructions to form in the first place.

There might be some cases where you could form a second constant as a
difference from an earlier one, but that then creates data-flow
dependencies, and on out-of-order machines that might not be worthwhile.
Even for in-order machines it can restrict scheduling and result in
worse code.


>     I'm somewhat surprised that the operands to the logicals aren't
>     visible at rtl generation time, given all the work done in gimple.
>     And failing that, combine has enough REG_EQUAL notes that it ought
>     to be able to put things back together and see the simpler pattern.
> 

We've tried it in the past.  Exposing the individual steps prevents the
higher-level RTL-based optimizations, since they can no longer deal with
the complete sub-expression.

>     Perhaps there's some other predication or costing error that's
>     getting in the way, and it simply wasn't obvious to me.   In any
>     case, nothing in this patch set addresses this at all.
> 
>   * I go on to add 4 new methods of generating a constant, each of
>     which typically saves 2 insns over the current algorithm.  There
>     are a couple more that might be useful but...
> 
>   * Constants are split *really* late.  In particular, after reload.
>     It would be awesome if we could at least have them all split before
>     register allocation so that we arrange to use ADDI and ADDIS when
>     that could save a few instructions.  But that does of course mean
>     avoiding r0 for the input.  Again, nothing here attempts to change
>     when constants are split.
> 

Certainly in the ARM port we try to split immediately before register
allocation; that way we can be sure that we have scratch registers
available, if that helps with generating more efficient sequences.
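
For concreteness, by "immediately before register allocation" I mean
something along these lines -- a much-simplified sketch rather than the
actual ARM splitter code:

/* Simplified sketch only: if the split runs before register
   allocation, can_create_pseudo_p () is still true, so we can grab a
   fresh scratch and build the value without clobbering anything; after
   reload no such scratch is available.  DEST is the destination of the
   original move, and PART1/PART2 are a hypothetical decomposition of
   the original constant.  */

if (can_create_pseudo_p ())
  {
    rtx scratch = gen_reg_rtx (SImode);
    emit_move_insn (scratch, GEN_INT (part1));
    emit_insn (gen_addsi3 (dest, scratch, GEN_INT (part2)));
  }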

R.

>   * This is the only platform for which I bothered collecting any sort
>     of performance data:
> 
>     As best I can tell, there is a 9% improvement in bootstrap speed
>     for ppc64.  That is, 10 minutes off the original 109 minute build.
> 
>     For aarch64 and alpha, I simply assumed there would be no loss,
>     since the basic search algorithm is unchanged for each.
> 
> Comments?  Especially on the shared header?
> 
> 
> r~
> 
> Cc: David Edelsohn <dje.gcc@gmail.com>
> Cc: Marcus Shawcroft <marcus.shawcroft@arm.com>
> Cc: Richard Earnshaw <richard.earnshaw@arm.com>
> 
> Richard Henderson (15):
>   rs6000: Split out rs6000_is_valid_and_mask_wide
>   rs6000: Make num_insns_constant_wide static
>   rs6000: Tidy num_insns_constant vs CONST_DOUBLE
>   rs6000: Implement set_const_data infrastructure
>   rs6000: Move constant via mask into build_set_const_data
>   rs6000: Use rldiwi in constant construction
>   rs6000: Generalize left shift in constant generation
>   rs6000: Generalize masking in constant generation
>   rs6000: Use xoris in constant construction
>   rs6000: Use rotldi in constant generation
>   aarch64: Use hashing infrastructure for generating constants
>   aarch64: Test for duplicated 32-bit halves
>   alpha: Use hashing infrastructure for generating constants
>   alpha: Split out alpha_cost_set_const
>   alpha: Remove alpha_emit_set_long_const
> 
>  gcc/config/aarch64/aarch64.c      | 463 ++++++++++++++++------------
>  gcc/config/alpha/alpha.c          | 583 +++++++++++++++++------------------
>  gcc/config/rs6000/rs6000-protos.h |   1 -
>  gcc/config/rs6000/rs6000.c        | 617 ++++++++++++++++++++++++--------------
>  gcc/config/rs6000/rs6000.md       |  15 -
>  gcc/genimm-hash.h                 | 122 ++++++++
>  6 files changed, 1057 insertions(+), 744 deletions(-)
>  create mode 100644 gcc/genimm-hash.h
> 


