This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RS6000] bswapdi2 pattern, reload and lra


On Tue, Dec 17, 2013 at 6:50 AM, Alan Modra <amodra@gmail.com> wrote:
> This patch is aimed at fixing test failures introduced by my
> 2013-12-07 change to bswapdi2_32bit:
> FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times lwbrx 6
> FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times stwbrx 6
>
> The 2013-12-07 change was necessary to make -m32 -mlra generate good
> code for pr53199.c:reg_reverse.  Too many '?'s on the r->r alternative
> result in lra never choosing that option.  Instead we get Z->r,
> ie. storing the input reg to a stack temp then using lwbrx from there.
> That means we have a load-hit-store flush with a severe slowdown.
> (See http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00002.html for the
> corresponding -m64 result, a 4x slowdown.)
>
> A similar problem occurs with -m64 -mcpu=power7 -mlra due to
> bswapdi2_ldbrx having two '?'s on r->r.
>
> To fix this I ran into a whole lot of pain.  reload and lra are quite
> different in their selection of insn alternatives.  I could not find
> any combination of '?'s that generated the best code for both reload
> an lra on pr53199.c.  To see why, it's necessary to look (from a great
> height) at how reload chooses amongst alternatives.  A particular
> alternative gets a "loser" score, with the lowest "loser" being
> chosen.  "loser" is incremented
> a) when an operand does not match its constraint.
> b) when an alternative has a '?' in the constraint.
> c) when a scratch register is required.
> d) when an early clobber output clashes with one of the inputs.
>
> a) is fairly obvious.  For example, if we have a MEM when the operand
>    alternative needs a reg, then we'll require a reload.
> b) is also quite obvious.  Multiple '?'s accumulate.
> c) is a little more subtle.  It bites you when alternatives require
>    differing numbers of scratch registers.  Take for example
>    bswapdi2_64bit, which before this patch has three alternatives
>    Z->r with 3 scratch regs, (Z is a subset of m)
>    r->Z with 2 scratch regs,
>    r->r with 3 scratch regs.
>    All other things being equal, with reload you could correct this
>    disparity by adding a '?' to the r->Z alternative.  We might want
>    to do that so that Z->r and r->r are the same "distance" apart
>    as r->Z is from r->r.  With lra it seems that scratch regs are
>    weighted differently..
> d) is also tricky, and a trap for anyone optimizing insn selection for
>    functions like some in pr53199.c that have just one rtl insn with
>    early clobbers.  PowerPC generally returns function results in the
>    same register as the first argument, so these hit the early
>    clobber.  Code elsewhere in larger functions probably won't.
>    lra penalizes early clobbers differently to reload (a lot less).
>
> So, putting this all together..  Point (d) test implication is covered
> by the additional functions in pr53199.c.  Avoiding early clobbers
> where possible is good since it reduces differences between reload and
> lra insn alternative costing.  We also generate better code.  I
> managed to do this for all the Z->r bswapdi patterns, but stopped
> short of changing r->r as well since at that point everything looked
> OK!  Avoiding extra scratch registers for one alternative is also good.
> The op4 r->r scratch wasn't even used, and op4 for Z->r fairly easy to
> do without.  Renaming word_high/low to word1/2 was to make it a little
> easier to track lifetime of addr1/2.
>
> Bootstrapped and regression tested powerpc64-linux.  Output of
> pr53199.c inspected for sanity with -mcpu=power{6,7} -m{32,64} and
> {-mlra,}.  OK to apply?
>
> gcc/
>         * config/rs6000/rs6000.md (bswapdi2): Remove one scratch reg.
>         Modify Z->r bswapdi splitter to use dest in place of scratch.
>         In r->Z and Z->r bswapdi splitter rename word_high, word_low
>         to word1, word2 and rearrange logic to suit.
>         (bswapdi2_64bit): Remove early clobber on Z->r alternative.
>         (bswapdi2_ldbrx): Likewise.  Remove '??' on r->r.
>         (bswapdi2_32bit): Remove early clobber on Z->r alternative.
>         Add one '?' on r->r.  Modify Z->r splitter to avoid need for
>         early clobber.
> gcc/testsuite/
>         * gcc.target/powerpc/pr53199.c: Add extra functions.

Okay.

Thanks, David


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]