This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[Patch AArch64_be] Fix some vec_concat big-endian confusions
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: marcus dot shawcroft at arm dot com
- Date: Tue, 24 Jun 2014 09:45:28 +0100
- Subject: [Patch AArch64_be] Fix some vec_concat big-endian confusions
- Authentication-results: sourceware.org; auth=none
Hi,
vec_concat ( { a, b }, { c, d }) should give a new vector { a, b, c, d }.
On big-endian aarch64 targets, we have to think carefully about what this
means as we map GCC's view of endian-ness on to ours. GCC (for reasons I have
yet to understand) likes to describe lane-extracts from a vector as
endian-ness dependant bit-field extracts. This cause major headaches, and
means we have to pretend throughout the backend that lane zero is at the
high bits of a vector register.
When we have a machine instruction which zeroes the high bits of a vector
register, and we want to describe it in RTL, the natural little-endian view is
vec_concat ( operand, zeroes ). The reality described above implies that the
correct description on big-endian systems is vec_concat ( zeroes, operand ).
This also affects arm_neon.h intrinsics. When we say vcombine (a, b) we mean
that a should occupy the low 64-bits and b the high 64 bits. We therefore
need to take care to swap the operands to vec_concat when we are targeting
big-endian.
This patch is messy, but it gives an notable improvement in the PASS
rates for an internal testsuite for Neon intrinsics.
Tested on aarch64-none-elf and aarch64_be-none-elf with no issues, but no
improvements either.
OK for trunk?
Thanks,
James
---
gcc/
2014-06-20 James Greenhalgh <james.greenhalgh@arm.com>
* config/aarch64/aarch64-simd.md (move_lo_quad_internal_<mode>): New.
(move_lo_quad_internal_be_<mode>): Likewise.
(move_lo_quad_<mode>): Convert to define_expand.
(aarch64_simd_move_hi_quad_<mode>): Gate on BYTES_BIG_ENDIAN.
(aarch64_simd_move_hi_quad_be_<mode>): New.
(move_hi_quad_<mode>): Use appropriate insn for BYTES_BIG_ENDIAN.
(aarch64_combinez<mode>): Gate on BYTES_BIG_ENDIAN.
(aarch64_combinez_be<mode>): New.
(aarch64_combine<mode>): Convert to define_expand.
(aarch64_combine_internal<mode>): New.
(aarch64_simd_combine<mode>): Remove bogus RTL description.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6b81d811b70bd157207f7753027309442ec9e8b5..00e2206b200fd32c6df5987d7317687488e8dadd 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -942,14 +942,38 @@ (define_insn "<su><maxmin><mode>3"
[(set_attr "type" "neon_minmax<q>")]
)
-;; Move into low-half clearing high half to 0.
+;; vec_concat gives a new vector with the low elements from operand 1, and
+;; the high elements from operand 2. That is to say, given op1 = { a, b }
+;; op2 = { c, d }, vec_concat (op1, op2) = { a, b, c, d }.
+;; What that means, is that the RTL descriptions of the below patterns
+;; need to change depending on endianness.
+
+;; Move to the low architectural bits of the register.
+;; On little-endian this is { operand, zeroes }
+;; On big-endian this is { zeroes, operand }
-(define_insn "move_lo_quad_<mode>"
+(define_insn "move_lo_quad_internal_<mode>"
[(set (match_operand:VQ 0 "register_operand" "=w,w,w")
(vec_concat:VQ
(match_operand:<VHALF> 1 "register_operand" "w,r,r")
(vec_duplicate:<VHALF> (const_int 0))))]
- "TARGET_SIMD"
+ "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+ "@
+ dup\\t%d0, %1.d[0]
+ fmov\\t%d0, %1
+ dup\\t%d0, %1"
+ [(set_attr "type" "neon_dup<q>,f_mcr,neon_dup<q>")
+ (set_attr "simd" "yes,*,yes")
+ (set_attr "fp" "*,yes,*")
+ (set_attr "length" "4")]
+)
+
+(define_insn "move_lo_quad_internal_be_<mode>"
+ [(set (match_operand:VQ 0 "register_operand" "=w,w,w")
+ (vec_concat:VQ
+ (vec_duplicate:<VHALF> (const_int 0))
+ (match_operand:<VHALF> 1 "register_operand" "w,r,r")))]
+ "TARGET_SIMD && BYTES_BIG_ENDIAN"
"@
dup\\t%d0, %1.d[0]
fmov\\t%d0, %1
@@ -960,7 +984,23 @@ (define_insn "move_lo_quad_<mode>"
(set_attr "length" "4")]
)
-;; Move into high-half.
+(define_expand "move_lo_quad_<mode>"
+ [(match_operand:VQ 0 "register_operand")
+ (match_operand:VQ 1 "register_operand")]
+ "TARGET_SIMD"
+{
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_move_lo_quad_internal_be_<mode> (operands[0], operands[1]));
+ else
+ emit_insn (gen_move_lo_quad_internal_<mode> (operands[0], operands[1]));
+ DONE;
+}
+)
+
+;; Move operand1 to the high architectural bits of the register, keeping
+;; the low architectural bits of operand2.
+;; For little-endian this is { operand2, operand1 }
+;; For big-endian this is { operand1, operand2 }
(define_insn "aarch64_simd_move_hi_quad_<mode>"
[(set (match_operand:VQ 0 "register_operand" "+w,w")
@@ -969,12 +1009,25 @@ (define_insn "aarch64_simd_move_hi_quad_
(match_dup 0)
(match_operand:VQ 2 "vect_par_cnst_lo_half" ""))
(match_operand:<VHALF> 1 "register_operand" "w,r")))]
- "TARGET_SIMD"
+ "TARGET_SIMD && !BYTES_BIG_ENDIAN"
"@
ins\\t%0.d[1], %1.d[0]
ins\\t%0.d[1], %1"
- [(set_attr "type" "neon_ins")
- (set_attr "length" "4")]
+ [(set_attr "type" "neon_ins")]
+)
+
+(define_insn "aarch64_simd_move_hi_quad_be_<mode>"
+ [(set (match_operand:VQ 0 "register_operand" "+w,w")
+ (vec_concat:VQ
+ (match_operand:<VHALF> 1 "register_operand" "w,r")
+ (vec_select:<VHALF>
+ (match_dup 0)
+ (match_operand:VQ 2 "vect_par_cnst_hi_half" ""))))]
+ "TARGET_SIMD && BYTES_BIG_ENDIAN"
+ "@
+ ins\\t%0.d[1], %1.d[0]
+ ins\\t%0.d[1], %1"
+ [(set_attr "type" "neon_ins")]
)
(define_expand "move_hi_quad_<mode>"
@@ -982,9 +1035,13 @@ (define_expand "move_hi_quad_<mode>"
(match_operand:<VHALF> 1 "register_operand" "")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false);
- emit_insn (gen_aarch64_simd_move_hi_quad_<mode> (operands[0],
- operands[1], p));
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, BYTES_BIG_ENDIAN);
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_aarch64_simd_move_hi_quad_be_<mode> (operands[0],
+ operands[1], p));
+ else
+ emit_insn (gen_aarch64_simd_move_hi_quad_<mode> (operands[0],
+ operands[1], p));
DONE;
})
@@ -2338,12 +2395,44 @@ (define_insn "*aarch64_combinez<mode>"
(vec_concat:<VDBL>
(match_operand:VDIC 1 "register_operand" "w")
(match_operand:VDIC 2 "aarch64_simd_imm_zero" "Dz")))]
- "TARGET_SIMD"
+ "TARGET_SIMD && !BYTES_BIG_ENDIAN"
"mov\\t%0.8b, %1.8b"
[(set_attr "type" "neon_move<q>")]
)
-(define_insn_and_split "aarch64_combine<mode>"
+(define_insn "*aarch64_combinez_be<mode>"
+ [(set (match_operand:<VDBL> 0 "register_operand" "=&w")
+ (vec_concat:<VDBL>
+ (match_operand:VDIC 2 "aarch64_simd_imm_zero" "Dz")
+ (match_operand:VDIC 1 "register_operand" "w")))]
+ "TARGET_SIMD && BYTES_BIG_ENDIAN"
+ "mov\\t%0.8b, %1.8b"
+ [(set_attr "type" "neon_move<q>")]
+)
+
+(define_expand "aarch64_combine<mode>"
+ [(match_operand:<VDBL> 0 "register_operand")
+ (match_operand:VDC 1 "register_operand")
+ (match_operand:VDC 2 "register_operand")]
+ "TARGET_SIMD"
+{
+ rtx op1, op2;
+ if (BYTES_BIG_ENDIAN)
+ {
+ op1 = operands[2];
+ op2 = operands[1];
+ }
+ else
+ {
+ op1 = operands[1];
+ op2 = operands[2];
+ }
+ emit_insn (gen_aarch64_combine_internal<mode> (operands[0], op1, op2));
+ DONE;
+}
+)
+
+(define_insn_and_split "aarch64_combine_internal<mode>"
[(set (match_operand:<VDBL> 0 "register_operand" "=&w")
(vec_concat:<VDBL> (match_operand:VDC 1 "register_operand" "w")
(match_operand:VDC 2 "register_operand" "w")))]
@@ -2352,16 +2441,19 @@ (define_insn_and_split "aarch64_combine<
"&& reload_completed"
[(const_int 0)]
{
- aarch64_split_simd_combine (operands[0], operands[1], operands[2]);
+ if (BYTES_BIG_ENDIAN)
+ aarch64_split_simd_combine (operands[0], operands[2], operands[1]);
+ else
+ aarch64_split_simd_combine (operands[0], operands[1], operands[2]);
DONE;
}
[(set_attr "type" "multiple")]
)
(define_expand "aarch64_simd_combine<mode>"
- [(set (match_operand:<VDBL> 0 "register_operand" "=&w")
- (vec_concat:<VDBL> (match_operand:VDC 1 "register_operand" "w")
- (match_operand:VDC 2 "register_operand" "w")))]
+ [(match_operand:<VDBL> 0 "register_operand")
+ (match_operand:VDC 1 "register_operand")
+ (match_operand:VDC 2 "register_operand")]
"TARGET_SIMD"
{
emit_insn (gen_move_lo_quad_<Vdbl> (operands[0], operands[1]));