[PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
Kewen.Lin
linkw@linux.ibm.com
Tue Aug 9 03:01:05 GMT 2022
Hi Xionghu,
Thanks for the fix.
on 2022/8/8 11:42, Xionghu Luo wrote:
> The native RTL expression for vec_mrghw should be the same for BE and LE as
> they are register and endian-independent. So both BE and LE need to
> generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
>
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
> (subreg:V4SI (reg:V16QI 139) 0)
> (subreg:V4SI (reg:V16QI 140) 0))
> [const_int 0 4 1 5]))
>
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>
> =>
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>
> The endianness check is then needed only once, at final ASM generation.
> The ASM would be better because the nested vec_select is simplified to a
> simple scalar load.
>
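To make the combine step quoted above concrete, here is a minimal C++ model of it. This is only a sketch: select_concat and resolve_lane are hypothetical helper names for illustration, not GCC functions, and plain arrays stand in for V4SI registers.

```cpp
#include <array>
#include <cassert>
#include <utility>

using V4 = std::array<unsigned, 4>;

// Model of (vec_select (vec_concat a b) (parallel [sel...])):
// indices 0..3 pick lanes of a, indices 4..7 pick lanes of b.
// Hypothetical helper, not part of GCC.
V4 select_concat(const V4 &a, const V4 &b, const std::array<int, 4> &sel) {
  V4 r{};
  for (int i = 0; i < 4; ++i) {
    assert(sel[i] >= 0 && sel[i] < 8);
    r[i] = sel[i] < 4 ? a[sel[i]] : b[sel[i] - 4];
  }
  return r;
}

// Model of the nested vec_select simplification: instead of building the
// merged vector and extracting one lane, look the lane up in the selector
// and return {operand number, lane within that operand}.
std::pair<int, int> resolve_lane(const std::array<int, 4> &sel, int lane) {
  assert(lane >= 0 && lane < 4);
  int idx = sel[lane];
  return {idx / 4, idx % 4};
}
```

With the selector [0 4 1 5], resolve_lane for lane 3 yields operand 1 (the second vec_concat input, r146 in the dump) and lane 1, which matches insn 24 after combine. The lookup never consults the target's endianness, so it behaves the same on BE and LE.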
> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Sorry, there was no -m32 testing for LE. I noticed the attachment in that PR
didn't include the test case (though the changelog has it), so I re-tested
with it included, nothing changed. :)
> Linux(Thanks to Kewen), OK for master? Or should we revert r12-4496 to
> restore to the UNSPEC implementation?
>
I have some concerns about the changed "altivec_*_direct" patterns. IMHO the
suffix "_direct" normally indicates that the define_insn maps directly to the
corresponding hw insn. With this change, for example,
altivec_vmrghb_direct can map to either vmrghb or vmrglb, which looks
misleading. Maybe we can add corresponding _direct_le and _direct_be
versions; both would map to the same insn but have different RTL
patterns. Looking forward to Segher's and David's suggestions.
> gcc/ChangeLog:
> PR target/106069
> * config/rs6000/altivec.md (altivec_vmrghb): Emit same native
> RTL for BE and LE.
> (altivec_vmrghh): Likewise.
> (altivec_vmrghw): Likewise.
> (*altivec_vmrghsf): Adjust.
> (altivec_vmrglb): Likewise.
> (altivec_vmrglh): Likewise.
> (altivec_vmrglw): Likewise.
> (*altivec_vmrglsf): Adjust.
> (altivec_vmrghb_direct): Emit different ASM for BE and LE.
> (altivec_vmrghh_direct): Likewise.
> (altivec_vmrghw_direct_<mode>): Likewise.
> (altivec_vmrglb_direct): Likewise.
> (altivec_vmrglh_direct): Likewise.
> (altivec_vmrglw_direct_<mode>): Likewise.
> (vec_widen_smult_hi_v16qi): Adjust.
> (vec_widen_smult_lo_v16qi): Adjust.
> (vec_widen_umult_hi_v16qi): Adjust.
> (vec_widen_umult_lo_v16qi): Adjust.
> (vec_widen_smult_hi_v8hi): Adjust.
> (vec_widen_smult_lo_v8hi): Adjust.
> (vec_widen_umult_hi_v8hi): Adjust.
> (vec_widen_umult_lo_v8hi): Adjust.
> * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Emit same
> native RTL for BE and LE.
> * config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Likewise.
> (vsx_xxmrglw_<mode>): Likewise.
>
> gcc/testsuite/ChangeLog:
> PR target/106069
> * gcc.target/powerpc/pr106069.C: New test.
>
> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
> ---
> gcc/config/rs6000/altivec.md | 122 ++++++++++++--------
> gcc/config/rs6000/rs6000.cc | 36 +++---
> gcc/config/rs6000/vsx.md | 16 +--
> gcc/testsuite/gcc.target/powerpc/pr106069.C | 118 +++++++++++++++++++
> 4 files changed, 209 insertions(+), 83 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069.C
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..8d9c0109559 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,11 +1144,7 @@ (define_expand "altivec_vmrghb"
> (use (match_operand:V16QI 2 "register_operand"))]
> "TARGET_ALTIVEC"
> {
> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> - : gen_altivec_vmrglb_direct;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + emit_insn (gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]));
> DONE;
> })
>
> @@ -1167,7 +1163,12 @@ (define_insn "altivec_vmrghb_direct"
> (const_int 6) (const_int 22)
> (const_int 7) (const_int 23)])))]
> "TARGET_ALTIVEC"
> - "vmrghb %0,%1,%2"
> + {
> + if (BYTES_BIG_ENDIAN)
> + return "vmrghb %0,%1,%2";
> + else
> + return "vmrglb %0,%2,%1";
> + }
> [(set_attr "type" "vecperm")])
>
> (define_expand "altivec_vmrghh"
> @@ -1176,11 +1177,7 @@ (define_expand "altivec_vmrghh"
> (use (match_operand:V8HI 2 "register_operand"))]
> "TARGET_ALTIVEC"
> {
> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
> - : gen_altivec_vmrglh_direct;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + emit_insn (gen_altivec_vmrghh_direct (operands[0], operands[1], operands[2]));
> DONE;
> })
>
> @@ -1195,7 +1192,12 @@ (define_insn "altivec_vmrghh_direct"
> (const_int 2) (const_int 10)
> (const_int 3) (const_int 11)])))]
> "TARGET_ALTIVEC"
> - "vmrghh %0,%1,%2"
> + {
> + if (BYTES_BIG_ENDIAN)
> + return "vmrghh %0,%1,%2";
> + else
> + return "vmrglh %0,%2,%1";
> + }
> [(set_attr "type" "vecperm")])
>
> (define_expand "altivec_vmrghw"
> @@ -1204,12 +1206,8 @@ (define_expand "altivec_vmrghw"
> (use (match_operand:V4SI 2 "register_operand"))]
> "VECTOR_MEM_ALTIVEC_P (V4SImode)"
> {
> - rtx (*fun) (rtx, rtx, rtx);
> - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
> - : gen_altivec_vmrglw_direct_v4si;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + emit_insn (
> + gen_altivec_vmrghw_direct_v4si (operands[0], operands[1], operands[2]));
> DONE;
> })
>
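For reference, the reason the LE branch can emit the "low" merge with swapped operands can be checked with a small model. This is a sketch under stated assumptions: it uses the word-sized case, assumed to be analogous to the vmrghb/vmrghh hunks above; it assumes that on LE an RTL lane i lives in register lane 3 - i; and all function names are hypothetical.

```cpp
#include <array>

using V4 = std::array<unsigned, 4>;

// Hardware merges, with lanes listed in register (big-endian) order.
V4 hw_vmrghw(const V4 &a, const V4 &b) { return {a[0], b[0], a[1], b[1]}; }
V4 hw_vmrglw(const V4 &a, const V4 &b) { return {a[2], b[2], a[3], b[3]}; }

// Assumed LE mapping between RTL lane numbering and register order.
V4 reverse_lanes(const V4 &v) { return {v[3], v[2], v[1], v[0]}; }

// The endian-independent RTL: vec_select (vec_concat a b) [0 4 1 5].
V4 rtl_merge_0415(const V4 &a, const V4 &b) {
  return {a[0], b[0], a[1], b[1]};
}

// BE branch: "vmrghw %0,%1,%2"; RTL lanes already match register order.
V4 be_asm_result(const V4 &a, const V4 &b) { return hw_vmrghw(a, b); }

// LE branch: "vmrglw %0,%2,%1", viewed back in RTL lane numbering.
V4 le_asm_result(const V4 &a, const V4 &b) {
  return reverse_lanes(hw_vmrglw(reverse_lanes(b), reverse_lanes(a)));
}
```

Under these assumptions both be_asm_result and le_asm_result equal rtl_merge_0415 for any inputs, which is the property the changed output templates rely on.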
[snip]
> [(set_attr "type" "vecperm")])
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106069.C b/gcc/testsuite/gcc.target/powerpc/pr106069.C
> new file mode 100644
> index 00000000000..56219a74692
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106069.C
Since this is a C++ test case, it should be placed in gcc/testsuite/g++.target/powerpc/.
> @@ -0,0 +1,118 @@
> +/* { dg-do run } */
This case requires altivec, so it needs something like:
/* { dg-require-effective-target vmx_hw } */
/* { dg-options "-maltivec" } */
BR,
Kewen
> +
> +extern "C" void *
> +memcpy (void *, const void *, unsigned long);
> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
> +
> +union
> +{
> + native_simd_type V;
> + int R[4];
> +} store_le_vec;
> +
> +struct S
> +{
> + S () = default;
> + S (unsigned B0)
> + {
> + native_simd_type val{B0};
> + m_simd = val;
> + }
> + void store_le (unsigned int out[])
> + {
> + store_le_vec.V = m_simd;
> + unsigned int x0 = store_le_vec.R[0];
> + memcpy (out, &x0, 1);
> + }
> + S rotl (unsigned int r)
> + {
> + native_simd_type rot{r};
> + return __builtin_vec_rl (m_simd, rot);
> + }
> + void operator+= (S other)
> + {
> + m_simd = __builtin_vec_add (m_simd, other.m_simd);
> + }
> + void operator^= (S other)
> + {
> + m_simd = __builtin_vec_xor (m_simd, other.m_simd);
> + }
> + static void transpose (S &B0, S B1, S B2, S B3)
> + {
> + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
> + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
> + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
> + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
> + B0 = __builtin_vec_mergeh (T0, T1);
> + B3 = __builtin_vec_mergel (T2, T3);
> + }
> + S (native_simd_type x) : m_simd (x) {}
> + native_simd_type m_simd;
> +};
> +
> +void
> +foo (unsigned int output[], unsigned state[])
> +{
> + S R00 = state[0];
> + S R01 = state[0];
> + S R02 = state[2];
> + S R03 = state[0];
> + S R05 = state[5];
> + S R06 = state[6];
> + S R07 = state[7];
> + S R08 = state[8];
> + S R09 = state[9];
> + S R10 = state[10];
> + S R11 = state[11];
> + S R12 = state[12];
> + S R13 = state[13];
> + S R14 = state[4];
> + S R15 = state[15];
> + for (int r = 0; r != 10; ++r)
> + {
> + R09 += R13;
> + R11 += R15;
> + R05 ^= R09;
> + R06 ^= R10;
> + R07 ^= R11;
> + R07 = R07.rotl (7);
> + R00 += R05;
> + R01 += R06;
> + R02 += R07;
> + R15 ^= R00;
> + R12 ^= R01;
> + R13 ^= R02;
> + R00 += R05;
> + R01 += R06;
> + R02 += R07;
> + R15 ^= R00;
> + R12 = R12.rotl (8);
> + R13 = R13.rotl (8);
> + R10 += R15;
> + R11 += R12;
> + R08 += R13;
> + R09 += R14;
> + R05 ^= R10;
> + R06 ^= R11;
> + R07 ^= R08;
> + R05 = R05.rotl (7);
> + R06 = R06.rotl (7);
> + R07 = R07.rotl (7);
> + }
> + R00 += state[0];
> + S::transpose (R00, R01, R02, R03);
> + R00.store_le (output);
> +}
> +
> +unsigned int res[1];
> +unsigned main_state[]{1634760805, 60878, 2036477234, 6,
> + 0, 825562964, 1471091955, 1346092787,
> + 506976774, 4197066702, 518848283, 118491664,
> + 0, 0, 0, 0};
> +int
> +main ()
> +{
> + foo (res, main_state);
> + if (res[0] != 0x41fcef98)
> + __builtin_abort ();
> +}