This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: "pinskia at gmail dot com" <pinskia at gmail dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>
- Date: Fri, 16 May 2014 11:30:38 +0100
- Subject: Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.
On Fri, Mar 28, 2014 at 03:39:53PM +0000, James Greenhalgh wrote:
> On Fri, Mar 28, 2014 at 03:09:22PM +0000, pinskia@gmail.com wrote:
> > > On Mar 28, 2014, at 7:48 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> > > On Fri, Mar 28, 2014 at 11:11:58AM +0000, pinskia@gmail.com wrote:
> > >>> On Mar 28, 2014, at 2:12 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> > >>> There is no way to perform scalar addition in the vector register file,
> > >>> but with the RTX costs in place we start rewriting (x << 1) to (x + x)
> > >>> on almost all cores. The code that makes this decision has no idea that
> > >>> the value may end up in vector registers (it runs well before reload),
> > >>> so we get very ugly code generation when addition was selected but we
> > >>> are operating in vector registers.
> > >>>
> > >>> This patch relies on the same gimmick we are already using to allow
> > >>> shifts on 32-bit scalars in the vector register file - Use a vector 32x2
> > >>> operation instead, knowing that we can safely ignore the top bits.
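As a concrete illustration, here is a minimal sketch -- not taken from the
patch or the testsuite, with a hypothetical function name and hypothetical
pre-patch assembly in the comment:

    /* With the new RTX costs, (x << 1) is rewritten as (x + x) early in
       compilation.  If x is later allocated to a SIMD register, the
       scalar add previously forced moves across the register files,
       along the lines of:

           fmov  w0, s0
           add   w0, w0, w0
           fmov  s0, w0

       whereas the new alternative keeps the value in place with a single
       "add v0.2s, v0.2s, v0.2s", ignoring the top bits.  */
    int
    double_it (int x)
    {
      return x << 1;   /* Likely rewritten as x + x on most cores.  */
    }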
> > >>>
> > >>> This restores some normality to scalar_shift_1.c; however, the test
> > >>> that we generate a left shift by one is clearly bogus, so remove it.
> > >>>
> > >>> This patch is pretty ugly, but it does generate superficially better
> > >>> looking code for this testcase.
> > >>>
> > >>> Tested on aarch64-none-elf with no issues.
> > >>>
> > >>> OK for stage 1?
> > >>
> > >> It seems we should also discourage the neon alternatives, as there might
> > >> be extra movement between the two register sets, which we don't want.
> > >
> > > I see your point, but we've tried to avoid doing that elsewhere in the
> > > AArch64 backend. Our argument has been that, strictly speaking, it isn't
> > > the alternative that is expensive; it is the movement between the register
> > > sets. We do model that elsewhere, and the register allocator should
> > > already be trying to avoid unnecessary moves between register classes.
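For reference, a simplified, hypothetical sketch of the kind of modelling
meant here -- not the actual aarch64 register-move-cost hook:

    /* Cross-file moves (general <-> FP/SIMD) are charged more than
       same-file moves, so the allocator already steers away from
       unnecessary transfers between the two register sets.  */
    static int
    example_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
                                reg_class_t from, reg_class_t to)
    {
      bool from_gp = reg_class_subset_p (from, GENERAL_REGS);
      bool to_gp = reg_class_subset_p (to, GENERAL_REGS);

      /* Crossing between the files needs an FMOV; charge extra.  */
      return (from_gp != to_gp) ? 5 : 2;
    }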
> > >
> >
> > What about a specific core where that alternative is expensive, that is,
> > where the vector instructions are worse than the scalar ones? How are we
> > going to handle that case?
>
> Certainly not by discouraging the alternative for all cores. We would need
> a more nuanced approach which could be tuned on a per-core basis. Otherwise
> we are bluntly and inaccurately pessimizing those cases where we can cheaply
> perform the operation in the vector register file (e.g. we are cleaning up
> loose ends after a vector loop, we have spilled to the vector register
> file, etc.). The register preference mechanism feels like the wrong place
> to catch this, as it does not allow for that degree of per-core
> flexibility; an alternative can only be "disparaged slightly" ("?", "*" in
> LRA) or "disparaged severely" ("!").
>
> I would think that we don't want to start polluting the machine description
> trying to hack around this, as was done with the ARM backend's
> neon_for_64_bits/avoid_neon_for_64_bits.
>
> How have other targets solved this issue?
Did you have any further thoughts on this? I've pushed the costs patches, so
we will start to see gcc.target/aarch64/scalar_shift_1.c failing without
this or an equivalent patch.
Otherwise, *ping*
Thanks,
James
---

gcc/

2014-05-16  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
	vector registers.

gcc/testsuite/

2014-05-16  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
> > > If those mechanisms are broken, we should fix them - in that case fixing
> > > this by discouraging valid alternatives would seem to be gaffer-taping over the
> > > real problem.
> > >
> > > Thanks,
> > > James
> > >
> > >>
> > >> Thanks,
> > >> Andrew
> > >>
> > >>>
> > >>> Thanks,
> > >>> James
> > >>>
> > >>> ---
> > >>> gcc/
> > >>>
> > >>> 2014-03-27 James Greenhalgh <james.greenhalgh@arm.com>
> > >>>
> > >>> * config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
> > >>> vector registers.
> > >>>
> > >>> gcc/testsuite/
> > >>> 2014-03-27 James Greenhalgh <james.greenhalgh@arm.com>
> > >>>
> > >>> * gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
> > >>> <0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch>
> > >>
> >
>
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 266d7873a5a1b8dbb7f955c3f13cf370920a9c4a..7c5b5a566ebfd907b83b38eed2e214738e7e9bd4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1068,16 +1068,17 @@ (define_expand "add<mode>3"
(define_insn "*addsi3_aarch64"
[(set
- (match_operand:SI 0 "register_operand" "=rk,rk,rk")
+ (match_operand:SI 0 "register_operand" "=rk,rk,w,rk")
(plus:SI
- (match_operand:SI 1 "register_operand" "%rk,rk,rk")
- (match_operand:SI 2 "aarch64_plus_operand" "I,r,J")))]
+ (match_operand:SI 1 "register_operand" "%rk,rk,w,rk")
+ (match_operand:SI 2 "aarch64_plus_operand" "I,r,w,J")))]
""
"@
add\\t%w0, %w1, %2
add\\t%w0, %w1, %w2
+ add\\t%0.2s, %1.2s, %2.2s
sub\\t%w0, %w1, #%n2"
- [(set_attr "type" "alu_imm,alu_reg,alu_imm")]
+ [(set_attr "type" "alu_imm,alu_reg,neon_add,alu_imm")]
)
;; zero_extend version of above
diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
index 7cb17f8..826bafc 100644
--- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
@@ -193,7 +193,6 @@ test_corners_sisd_di (Int64x1 b)
return b;
}
/* { dg-final { scan-assembler "sshr\td\[0-9\]+,\ d\[0-9\]+,\ 63" } } */
-/* { dg-final { scan-assembler "shl\td\[0-9\]+,\ d\[0-9\]+,\ 1" } } */
Int32x1
test_corners_sisd_si (Int32x1 b)
@@ -207,7 +206,6 @@ test_corners_sisd_si (Int32x1 b)
return b;
}
/* { dg-final { scan-assembler "sshr\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 31" } } */
-/* { dg-final { scan-assembler "shl\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 1" } } */