This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RE: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: James Greenhalgh <James dot Greenhalgh at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
- Date: Wed, 16 Dec 2015 13:05:21 +0000
- Subject: RE: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
- Authentication-results: sourceware.org; auth=none
- Nodisclaimer: True
- References: <AM3PR08MB008850D05209827E0246E28683EE0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <20151216095418 dot GA39374 at arm dot com>
- Spamdiagnosticmetadata: NSPM
- Spamdiagnosticoutput: 1:23
James Greenhalgh wrote:
> On Tue, Dec 15, 2015 at 10:54:49AM +0000, Wilco Dijkstra wrote:
> > ping
> >
> > > -----Original Message-----
> > > From: Wilco Dijkstra [mailto:Wilco.Dijkstra@arm.com]
> > > Sent: 06 November 2015 20:06
> > > To: 'gcc-patches@gcc.gnu.org'
> > > Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > >
> > > This patch adds support for the TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > hook. When the cost of GENERAL_REGS and FP_REGS is identical, the register
> > > allocator always uses ALL_REGS even when it has a much higher cost. The
> > > hook changes the class to either FP_REGS or GENERAL_REGS depending on the
> > > mode of the register. This results in better register allocation overall,
> > > fewer spills and reduced codesize - particularly in SPEC2006 gamess.
> > >
> > > GCC regression passes with several minor fixes.
> > >
> > > OK for commit?
> > >
> > > ChangeLog:
> > > 2015-11-06 Wilco Dijkstra <wdijkstr@arm.com>
> > >
> > > * gcc/config/aarch64/aarch64.c
> > > (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): New define.
> > > (aarch64_ira_change_pseudo_allocno_class): New function.
> > > * gcc/testsuite/gcc.target/aarch64/cvtf_1.c: Build with -O2.
> > > * gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > (test_corners_sisd_di): Improve force to SIMD register.
> > > (test_corners_sisd_si): Likewise.
> > > * gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c: Build with -O2.
> > > * gcc/testsuite/gcc.target/aarch64/vect-ld1r-compile-fp.c:
> > > Remove scan-assembler check for ldr.
>
> Drop the gcc/ from the ChangeLog.
>
> > > --
> > > gcc/config/aarch64/aarch64.c | 22 ++++++++++++++++++++++
> > > gcc/testsuite/gcc.target/aarch64/cvtf_1.c | 2 +-
> > > gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c | 4 ++--
> > > gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c | 2 +-
> > > .../gcc.target/aarch64/vect-ld1r-compile-fp.c | 1 -
>
> These testsuite changes concern me a bit, and you don't mention them beyond
> saying they are minor fixes...
Well any changes to register allocator preferencing would cause fallout in tests that
are assuming which register is allocated, especially if they use nasty inline assembler
hacks to do so...
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/cvtf_1.c b/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
> > > index 5f2ff81..96501db 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
> > > @@ -1,5 +1,5 @@
> > > /* { dg-do run } */
> > > -/* { dg-options "-save-temps -fno-inline -O1" } */
> > > +/* { dg-options "-save-temps -fno-inline -O2" } */
>
> This one says we have a code-gen regression at -O1 ?
It avoids a regalloc bug - see below.
> > > #define FCVTDEF(ftype,itype) \
> > > void \
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > index 363f554..8465c89 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > @@ -186,9 +186,9 @@ test_corners_sisd_di (Int64x1 b)
> > > {
> > > force_simd_di (b);
> > > b = b >> 63;
> > > + force_simd_di (b);
> > > b = b >> 0;
> > > b += b >> 65; /* { dg-warning "right shift count >= width of type" } */
> > > - force_simd_di (b);
>
> This one I don't understand, but seems to say that we've decided to move
> b out of FP_REGS after getting it in there for b = b << 63; ? So this is
> another register allocator regression?
No, basically the register allocator is now making better decisions as to where to
allocate integer variables. It will only allocate them to FP registers if they are primarily
used by other FP operations. The force_simd_di inline assembler tries to mimic FP uses,
and if there are enough of them at the right places then everything works as expected.
If however you do 3 consecutive integer operations then the allocator will now correctly
prefer to allocate them to the integer registers (while previously it wouldn't, which is
inefficient).
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > index a49db3e..c5a9c52 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > @@ -1,6 +1,6 @@
> > > /* Test vdup_lane intrinsics work correctly. */
> > > /* { dg-do run } */
> > > -/* { dg-options "-O1 --save-temps" } */
> > > +/* { dg-options "-O2 --save-temps" } */
>
> Another -O1 regression ?
No, it's triggering a bug in the -O1 register preferencing that causes incorrect preferences to be
selected despite the costs being right. The cost calculation with -O1 for eg.
wrap_vdupb_lane_s8_0() in vdup_lane_2.c:
Pass 0 for finding pseudo/allocno costs
r79: preferred FP_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS
a1 (r79,l0) best GENERAL_REGS, allocno GENERAL_REGS
r78: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
a0 (r78,l0) best GENERAL_REGS, allocno GENERAL_REGS
a0(r78,l0) costs: CALLER_SAVE_REGS:5000,5000 GENERAL_REGS:5000,5000 FP_LO_REGS:5000,5000 FP_REGS:5000,5000 ALL_REGS:10000,10000 MEM:9000,9000
a1(r79,l0) costs: CALLER_SAVE_REGS:5000,5000 GENERAL_REGS:5000,5000 FP_LO_REGS:0,0 FP_REGS:0,0 ALL_REGS:10000,10000 MEM:9000,9000
So it correctly prefers FP_REGS for r79 as it has the lowest cost, but then forces the allocno and
best register to GENERAL_REGS... We could work around it by not having the "r" variant first in
the aarch64_get_lane patterns and further discouraging its use via "?r", but that's a different patch.
Wilco