[PATCH] Fix PR 78243 on PowerPC
Michael Meissner
meissner@linux.vnet.ibm.com
Fri Nov 11 00:21:00 GMT 2016
Aaron Sawdey has been running the GCC testsuite on the power9 simulator and he
noticed that:
gcc.c-torture/execute/pr68532.c
does not run, and opened bug 78243 for this failure.
Now, if you compile pr68532 with normal options (-O2/-O3 and -mcpu=power8 or
-mcpu=power9) it works because with the normal powerpc vectorization costs, the
vectorize only generates scalar code. However, the options in the PR
explicitly turns on -fno-vect-cost-model, which forces the loop to be be
vectorized. In doing so, it generates pretty bad code.
The vectorizer generates a vector add loop, and at the end it does a vector
reduction to get the total added. When -mcpu=power9 is used it generates a
VEXTRACTUH instruction to extract the HImode from the V8HImode vector.
Unfortunately, on little endian (with little endian element ordering), it gets
the wrong element.
This patch fixes that problem. I did a bootstrap and regression test, and
there were no regressions. I ran the test on the power9 simulator for both big
endian and little endian options and it passed. I also ran the following
executable tests from the testsuite which exercise vector init, set, and
extract for each of the basic types:
vec-init-1.c element type: int
vec-init-2.c element type: long
vec-init-4.c element type: short
vec-init-5.c element type: signed char
vec-init-8.c element type: float
vec-init-9.c element type: double
All tests passed on both little endian and big endian simulator runs. Can I
check this patch into the trunk?
2016-11-10 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/78243
* config/rs6000/vsx.md (vsx_extract_<mode>_p9): Correct the
element order for little endian ordering.
* config/rs6000/altivec.md (reduc_plus_scal_<mode>): Use
VECTOR_ELT_ORDER_BIG and not BYTES_BIG_ENDIAN to adjust element
number.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
-------------- next part --------------
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md (revision 242048)
+++ gcc/config/rs6000/vsx.md (revision 242049)
@@ -2542,10 +2542,13 @@ (define_insn "vsx_extract_<mode>_p9"
"VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_VEXTRACTUB
&& TARGET_VSX_SMALL_INTEGER"
{
- /* Note, the element number has already been adjusted for endianness, so we
- don't have to adjust it here. */
- int unit_size = GET_MODE_UNIT_SIZE (<MODE>mode);
- HOST_WIDE_INT offset = unit_size * INTVAL (operands[2]);
+ HOST_WIDE_INT elt = INTVAL (operands[2]);
+ HOST_WIDE_INT elt_adj = ((!VECTOR_ELT_ORDER_BIG)
+ ? (GET_MODE_NUNITS (<MODE>mode) - 1 - elt)
+ : elt);
+
+ HOST_WIDE_INT unit_size = GET_MODE_UNIT_SIZE (<MODE>mode);
+ HOST_WIDE_INT offset = unit_size * elt_adj;
operands[2] = GEN_INT (offset);
if (unit_size == 4)
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md (revision 242048)
+++ gcc/config/rs6000/altivec.md (revision 242049)
@@ -2785,7 +2785,7 @@ (define_expand "reduc_plus_scal_<mode>"
rtx vtmp1 = gen_reg_rtx (V4SImode);
rtx vtmp2 = gen_reg_rtx (<MODE>mode);
rtx dest = gen_lowpart (V4SImode, vtmp2);
- int elt = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (<MODE>mode) - 1 : 0;
+ int elt = VECTOR_ELT_ORDER_BIG ? GET_MODE_NUNITS (<MODE>mode) - 1 : 0;
emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
emit_insn (gen_altivec_vsum4s<VI_char>s (vtmp1, operands[1], vzero));
More information about the Gcc-patches
mailing list