This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
- Date: Wed, 16 Dec 2015 14:27:09 +0000
- Subject: Re: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
- Authentication-results: sourceware.org; auth=none
- References: <AM3PR08MB008850D05209827E0246E28683EE0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <20151216095418 dot GA39374 at arm dot com> <AM3PR08MB008857D4F6905FC03A92D23883EF0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com>
On Wed, Dec 16, 2015 at 01:05:21PM +0000, Wilco Dijkstra wrote:
> James Greenhalgh wrote:
> > On Tue, Dec 15, 2015 at 10:54:49AM +0000, Wilco Dijkstra wrote:
> > > ping
> > >
> > > > -----Original Message-----
> > > > From: Wilco Dijkstra [mailto:Wilco.Dijkstra@arm.com]
> > > > Sent: 06 November 2015 20:06
> > > > To: 'gcc-patches@gcc.gnu.org'
> > > > Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > >
> > > > This patch adds support for the TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > > hook. When the cost of GENERAL_REGS and FP_REGS is identical, the register
> > > > allocator always uses ALL_REGS even when it has a much higher cost. The
> > > > hook changes the class to either FP_REGS or GENERAL_REGS depending on the
> > > > mode of the register. This results in better register allocation overall,
> > > > fewer spills and reduced codesize - particularly in SPEC2006 gamess.
> > > >
> > > > GCC regression passes with several minor fixes.
> > > >
> > > > OK for commit?
> > > >
> > > > ChangeLog:
> > > > 2015-11-06 Wilco Dijkstra <wdijkstr@arm.com>
> > > >
> > > > * gcc/config/aarch64/aarch64.c
> > > > (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): New define.
> > > > (aarch64_ira_change_pseudo_allocno_class): New function.
> > > > * gcc/testsuite/gcc.target/aarch64/cvtf_1.c: Build with -O2.
> > > > * gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > (test_corners_sisd_di): Improve force to SIMD register.
> > > > (test_corners_sisd_si): Likewise.
> > > > * gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c: Build with -O2.
> > > > * gcc/testsuite/gcc.target/aarch64/vect-ld1r-compile-fp.c:
> > > > Remove scan-assembler check for ldr.
> >
> > Drop the gcc/ from the ChangeLog.
> >
> > > > --
> > > > gcc/config/aarch64/aarch64.c | 22 ++++++++++++++++++++++
> > > > gcc/testsuite/gcc.target/aarch64/cvtf_1.c | 2 +-
> > > > gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c | 4 ++--
> > > > gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c | 2 +-
> > > > .../gcc.target/aarch64/vect-ld1r-compile-fp.c | 1 -
> >
> > These testsuite changes concern me a bit, and you don't mention them beyond
> > saying they are minor fixes...
>
> Well any changes to register allocator preferencing would cause fallout in
> tests that are assuming which register is allocated, especially if they use
> nasty inline assembler hacks to do so...
Sure, but the testcases here each operate on data that should live in
FP_REGS given the initial conditions that the nasty hacks try to mimic -
that's what makes the regressions notable.
>
> > > > #define FCVTDEF(ftype,itype) \
> > > > void \
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > index 363f554..8465c89 100644
> > > > --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > @@ -186,9 +186,9 @@ test_corners_sisd_di (Int64x1 b)
> > > > {
> > > > force_simd_di (b);
> > > > b = b >> 63;
> > > > + force_simd_di (b);
> > > > b = b >> 0;
> > > > b += b >> 65; /* { dg-warning "right shift count >= width of type" } */
> > > > - force_simd_di (b);
> >
> > This one I don't understand, but seems to say that we've decided to move
> > b out of FP_REGS after getting it in there for b = b << 63; ? So this is
> > another register allocator regression?
>
> No, basically the register allocator is now making better decisions as to
> where to allocate integer variables. It will only allocate them to FP
> registers if they are primarily used by other FP operations. The
> force_simd_di inline assembler tries to mimic FP uses, and if there are
> enough of them at the right places then everything works as expected. If
> however you do 3 consecutive integer operations then the allocator will now
> correctly prefer to allocate them to the integer registers (while previously
> it wouldn't, which is inefficient).
I'm not sure I understand this argument in the abstract (though I believe
it for some of the supported cores for the AArch64 target). At an abstract
level, given a set of operations which can execute in either FP_REGS or
GENERAL_REGS and initial and post conditions that allocate all input and
output registers from those operations to FP_REGS, I would expect those
operations to take place using FP_REGS? Your patch seems to break this
expectation?
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > > index a49db3e..c5a9c52 100644
> > > > --- a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > > @@ -1,6 +1,6 @@
> > > > /* Test vdup_lane intrinsics work correctly. */
> > > > /* { dg-do run } */
> > > > -/* { dg-options "-O1 --save-temps" } */
> > > > +/* { dg-options "-O2 --save-temps" } */
> >
> > Another -O1 regression ?
>
> No, it's triggering a bug in the -O1 register preferencing that causes incorrect preferences to be
> selected despite the costs being right. The cost calculation with -O1 for eg.
> wrap_vdupb_lane_s8_0() in vdup_lane_2.c:
>
> Pass 0 for finding pseudo/allocno costs
>
> r79: preferred FP_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS
> a1 (r79,l0) best GENERAL_REGS, allocno GENERAL_REGS
> r78: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
> a0 (r78,l0) best GENERAL_REGS, allocno GENERAL_REGS
>
> a0(r78,l0) costs: CALLER_SAVE_REGS:5000,5000 GENERAL_REGS:5000,5000 FP_LO_REGS:5000,5000 FP_REGS:5000,5000 ALL_REGS:10000,10000 MEM:9000,9000
> a1(r79,l0) costs: CALLER_SAVE_REGS:5000,5000 GENERAL_REGS:5000,5000 FP_LO_REGS:0,0 FP_REGS:0,0 ALL_REGS:10000,10000 MEM:9000,9000
>
> So it correctly prefers FP_REGS for r79 as it has the lowest cost, but then
> forces the allocno and best register to GENERAL_REGS... We could work around
> it by not having the "r" variant first in the aarch64_get_lane patterns and
> further discouraging its use via "?r", but that's a different patch.
Well, that patch (moving "r" alternative away from first) does seem to
better fit with what we've done elsewhere in aarch64-simd.md (e.g.
aarch64_combinez below). Does making this change obviate the need to
change these testcases to -O1? If so, I'd rather break them with your patch
and fix it in a follow-up than paper over the cracks.
Thanks,
James