This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [RS6000] bswapdi2 pattern, reload and lra
- From: David Edelsohn <dje dot gcc at gmail dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>, Alan Modra <amodra at gmail dot com>
- Date: Sun, 15 Mar 2015 22:51:02 -0400
- Subject: Re: [RS6000] bswapdi2 pattern, reload and lra
- Authentication-results: sourceware.org; auth=none
- References: <20131217115034 dot GB28733 at bubble dot grove dot modra dot org>
On Tue, Dec 17, 2013 at 6:50 AM, Alan Modra <amodra@gmail.com> wrote:
> This patch is aimed at fixing test failures introduced by my
> 2013-12-07 change to bswapdi2_32bit:
> FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times lwbrx 6
> FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times stwbrx 6
>
> The 2013-12-07 change was necessary to make -m32 -mlra generate good
> code for pr53199.c:reg_reverse. Too many '?'s on the r->r alternative
> result in lra never choosing that option. Instead we get Z->r,
> ie. storing the input reg to a stack temp then using lwbrx from there.
> That means we have a load-hit-store flush with a severe slowdown.
> (See http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00002.html for the
> corresponding -m64 result, a 4x slowdown.)
>
> A similar problem occurs with -m64 -mcpu=power7 -mlra due to
> bswapdi2_ldbrx having two '?'s on r->r.
>
> To fix this I ran into a whole lot of pain. reload and lra are quite
> different in their selection of insn alternatives. I could not find
> any combination of '?'s that generated the best code for both reload
> an lra on pr53199.c. To see why, it's necessary to look (from a great
> height) at how reload chooses amongst alternatives. A particular
> alternative gets a "loser" score, with the lowest "loser" being
> chosen. "loser" is incremented
> a) when an operand does not match its constraint.
> b) when an alternative has a '?' in the constraint.
> c) when a scratch register is required.
> d) when an early clobber output clashes with one of the inputs.
>
> a) is fairly obvious. For example, if we have a MEM when the operand
> alternative needs a reg, then we'll require a reload.
> b) is also quite obvious. Multiple '?'s accumulate.
> c) is a little more subtle. It bites you when alternatives require
> differing numbers of scratch registers. Take for example
> bswapdi2_64bit, which before this patch has three alternatives
> Z->r with 3 scratch regs, (Z is a subset of m)
> r->Z with 2 scratch regs,
> r->r with 3 scratch regs.
> All other things being equal, with reload you could correct this
> disparity by adding a '?' to the r->Z alternative. We might want
> to do that so that Z->r and r->r are the same "distance" apart
> as r->Z is from r->r. With lra it seems that scratch regs are
> weighted differently..
> d) is also tricky, and a trap for anyone optimizing insn selection for
> functions like some in pr53199.c that have just one rtl insn with
> early clobbers. PowerPC generally returns function results in the
> same register as the first argument, so these hit the early
> clobber. Code elsewhere in larger functions probably won't.
> lra penalizes early clobbers differently to reload (a lot less).
>
> So, putting this all together.. Point (d) test implication is covered
> by the additional functions in pr53199.c. Avoiding early clobbers
> where possible is good since it reduces differences between reload and
> lra insn alternative costing. We also generate better code. I
> managed to do this for all the Z->r bswapdi patterns, but stopped
> short of changing r->r as well since at that point everything looked
> OK! Avoiding extra scratch registers for one alternative is also good.
> The op4 r->r scratch wasn't even used, and op4 for Z->r fairly easy to
> do without. Renaming word_high/low to word1/2 was to make it a little
> easier to track lifetime of addr1/2.
>
> Bootstrapped and regression tested powerpc64-linux. Output of
> pr53199.c inspected for sanity with -mcpu=power{6,7} -m{32,64} and
> {-mlra,}. OK to apply?
>
> gcc/
> * config/rs6000/rs6000.md (bswapdi2): Remove one scratch reg.
> Modify Z->r bswapdi splitter to use dest in place of scratch.
> In r->Z and Z->r bswapdi splitter rename word_high, word_low
> to word1, word2 and rearrange logic to suit.
> (bswapdi2_64bit): Remove early clobber on Z->r alternative.
> (bswapdi2_ldbrx): Likewise. Remove '??' on r->r.
> (bswapdi2_32bit): Remove early clobber on Z->r alternative.
> Add one '?' on r->r. Modify Z->r splitter to avoid need for
> early clobber.
> gcc/testsuite/
> * gcc.target/powerpc/pr53199.c: Add extra functions.
Okay.
Thanks, David