This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: LRA vs reload on powerpc: 2 extra FAILs that are actually improvements?
- From: David Edelsohn <dje dot gcc at gmail dot com>
- To: Steven Bosscher <stevenb dot gcc at gmail dot com>, Michael Meissner <meissner at linux dot vnet dot ibm dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Vladimir Makarov <vmakarov at redhat dot com>
- Date: Mon, 4 Nov 2013 09:14:50 -0500
- Subject: Re: LRA vs reload on powerpc: 2 extra FAILs that are actually improvements?
- References: <CABu31nPyoqz26Cwmzq2PhFL+oezacA0CYoLy8vTOUhj07=CgRg at mail dot gmail dot com>
Hi, Steven
Thanks for investigating this. This presumably was the reason that
Vlad changed the constraint modifier for that pattern in his patch for
LRA. I don't think that using memory is an improvement, but Mike is
the best person to comment.
Thanks, David
On Sat, Nov 2, 2013 at 6:48 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> Hello,
>
> Today's powerpc64-linux gcc has 2 extra failures with -mlra vs. reload
> (i.e. svn unpatched).
>
> (I'm excluding guality failure differences here because there are too
> many of them that seem to fail at random after minimal changes
> anywhere in the compiler...).
>
> Test results are posted here:
> reload: http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00128.html
> lra: http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00129.html
>
> The new failures and total scores are as follows (+=lra, -=reload):
> +FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times stwbrx 6
> +FAIL: gcc.target/powerpc/pr58330.c scan-assembler-not stwbrx
>
> === gcc Summary ===
>
> -# of expected passes 97887
> -# of unexpected failures 536
> +# of expected passes 97903
> +# of unexpected failures 538
> # of unexpected successes 38
> # of expected failures 244
> -# of unsupported tests 1910
> +# of unsupported tests 1892
>
>
> The failure of pr53199.c is because of different instruction selection
> for bswap. Test case is reduced to just one function:
>
> /* { dg-options "-O2 -mcpu=power6 -mavoid-indexed-addresses" } */
> long long
> reg_reverse (long long x)
> {
>   return __builtin_bswap64 (x);
> }
>
> Reload left vs. LRA right:
> reg_reverse:                  reg_reverse:
>     srdi 8,3,32             |     addi 8,1,-16
>     rlwinm 7,3,8,0xffffffff |     srdi 10,3,32
>     rlwinm 9,8,8,0xffffffff |     addi 9,8,4
>     rlwimi 7,3,24,0,7       |     stwbrx 3,0,8
>     rlwimi 7,3,24,16,23     |     stwbrx 10,0,9
>     rlwimi 9,8,24,0,7       |     ld 3,-16(1)
>     rlwimi 9,8,24,16,23     <
>     sldi 7,7,32             <
>     or 7,7,9                <
>     mr 3,7                  <
>     blr                           blr
>
> This same difference is responsible for the failure of pr58330.c which
> also uses __builtin_bswap64().
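
Both sequences compute the same value: each 32-bit half is byte-reversed
and the two halves trade places. A rough C model of that is below. The
function names are made up for illustration, and the masks are a reading
of the rlwinm/rlwimi operands in the listing above (IBM bit numbering,
bit 0 = most significant), not something taken from the machine
description.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by N bits (0 < N < 32).  */
uint32_t
rotl32 (uint32_t w, unsigned n)
{
  return (w << n) | (w >> (32 - n));
}

/* Models the register sequence for one word:
   rlwinm d,s,8,0xffffffff; rlwimi d,s,24,0,7; rlwimi d,s,24,16,23.  */
uint32_t
bswap32_model (uint32_t w)
{
  uint32_t d = rotl32 (w, 8);
  uint32_t r = rotl32 (w, 24);
  d = (d & ~0xff000000u) | (r & 0xff000000u);	/* IBM bits 0..7 */
  d = (d & ~0x0000ff00u) | (r & 0x0000ff00u);	/* IBM bits 16..23 */
  return d;
}

/* Reversed low word becomes the high half and vice versa.  This is
   also what the two stwbrx stores followed by one big-endian ld do.  */
uint64_t
bswap64_model (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  return ((uint64_t) bswap32_model (lo) << 32) | bswap32_model (hi);
}

/* Hypothetical self-check against known byte patterns.  */
void
bswap_model_self_check (void)
{
  assert (bswap32_model (0xAABBCCDDu) == 0xDDCCBBAAu);
  assert (bswap64_model (0x0102030405060708ULL) == 0x0807060504030201ULL);
}
```

The model only shows that the two code paths are equivalent in result;
it says nothing about which one is faster on power6.
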
>
> The difference in RTL for the test case is this (after reload vs. after LRA):
> - 11: {%7:DI=bswap(%3:DI);clobber %8:DI;clobber %9:DI;clobber %10:DI;}
> - 20: %3:DI=%7:DI
> + 20: %8:DI=%1:DI-0x10
> + 21: %8:DI=%8:DI // stupid no-op move
> + 11: {[%8:DI]=bswap(%3:DI);clobber %9:DI;clobber %10:DI;clobber scratch;}
> + 19: %3:DI=[%1:DI-0x10]
>
> So LRA believes going through memory is better than using a register,
> even though obviously there are plenty of registers available.
>
> What LRA does:
> Creating newreg=129
> Removing SCRATCH in insn #11 (nop 2)
> Creating newreg=130
> Removing SCRATCH in insn #11 (nop 3)
> Creating newreg=131
> Removing SCRATCH in insn #11 (nop 4)
> // at this point the insn would be a bswapdi2_64bit:
> // 11: {%3:DI=bswap(%3:DI);clobber r129;clobber r130;clobber r131;}
> // cost calculation for the insn alternatives:
> 0 Early clobber: reject++
> 1 Non-pseudo reload: reject+=2
> 1 Spill pseudo in memory: reject+=3
> 2 Scratch win: reject+=2
> 3 Scratch win: reject+=2
> 4 Scratch win: reject+=2
> alt=0,overall=18,losers=1,rld_nregs=0
> 0 Non-pseudo reload: reject+=2
> 0 Spill pseudo in memory: reject+=3
> 0 Non input pseudo reload: reject++
> 2 Scratch win: reject+=2
> 3 Scratch win: reject+=2
> alt=1,overall=16,losers=1,rld_nregs=0
> Staticly defined alt reject+=12
> 0 Early clobber: reject++
> 2 Scratch win: reject+=2
> 3 Scratch win: reject+=2
> 4 Scratch win: reject+=2
> 0 Conflict early clobber reload: reject--
> alt=2,overall=24,losers=1,rld_nregs=0
> Choosing alt 1 in insn 11: (0) Z (1) r (2) &b (3) &r (4)
> X {*bswapdi2_64bit}
> Change to class BASE_REGS for r129
> Change to class GENERAL_REGS for r130
> Creating newreg=132 from oldreg=3, assigning class NO_REGS to r132
> Change to class NO_REGS for r131
> 11: {r132:DI=bswap(%3:DI);clobber r129:DI;clobber r130:DI;clobber r131:DI;}
> REG_UNUSED r131:DI
> REG_UNUSED r130:DI
> REG_UNUSED r129:DI
>
> LRA selects alternative 1 (Z,r,&b,&r,X) which seems to be the right
> choice, from looking at the constraints. Reload selects alternative 2
> which is slightly^2 discouraged: (??&r,r,&r,&r,&r).
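
The "overall" figures in the log above are consistent with
overall = 6 * losers + (sum of the reject adjustments). The factor 6 is
inferred from the three alternatives in this log rather than looked up
in the LRA sources, so treat it and every name in the sketch below as an
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: weight of one "loser" operand, inferred from the log.  */
#define LOSER_COST_FACTOR 6

/* Reject adjustments copied from the log, in the order printed.  */
const int alt0_rejects[] = { 1, 2, 3, 2, 2, 2 };
const int alt1_rejects[] = { 2, 3, 1, 2, 2 };
const int alt2_rejects[] = { 12, 1, 2, 2, 2, -1 };

/* Sum the adjustments and add the weighted loser count.  */
int
alt_overall (const int *rejects, size_t n, int losers)
{
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += rejects[i];
  return losers * LOSER_COST_FACTOR + sum;
}

/* Index of the smallest overall cost; the first one wins ties.  */
size_t
cheapest_alt (const int *overall, size_t n)
{
  size_t best = 0;
  for (size_t i = 1; i < n; i++)
    if (overall[i] < overall[best])
      best = i;
  return best;
}

/* Recompute the three totals and compare with the log.  */
size_t
check_log_totals (void)
{
  int overall[3];
  overall[0] = alt_overall (alt0_rejects, 6, 1);
  overall[1] = alt_overall (alt1_rejects, 5, 1);
  overall[2] = alt_overall (alt2_rejects, 6, 1);
  assert (overall[0] == 18);
  assert (overall[1] == 16);
  assert (overall[2] == 24);
  return cheapest_alt (overall, 3);
}
```

Under that reading alternative 1 is cheapest at 16, which matches
"Choosing alt 1". The 12 points for the statically defined "??" are by
themselves enough to push alternative 2 behind both of the others.
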
>
> Is this an improvement or a regression? If it's an improvement then
> these two test cases should be adjusted :-)
>
> Ciao!
> Steven