[PATCH, rs6000] Generate lvx and stvx without swaps for aligned vector loads and stores
Segher Boessenkool
segher@kernel.crashing.org
Fri Jan 12 21:22:00 GMT 2018
On Fri, Jan 12, 2018 at 01:55:26PM -0600, Kelvin Nilsen wrote:
> On Power 7 and Power 8 little endian, the code generator has been
> emitting two instructions for each vector load and each vector store.
> One instruction does a swapping load or store, and the second
> instruction does an in-register swap.
>
> This patch replaces the two-instruction sequences with a single lvx (for
> loads) or stvx (for stores) instruction in the very common case that the
> vector is known to reside at a quad-word aligned address in memory.
> This patch is most relevant to Power 7 and Power 8 targets because
> Power 9 code generation uses new single-instruction encodings for both
> aligned and unaligned vector loads and stores.
>
> This patch has been boostrapped and tested without regressions on
> powerpc64le-unknown-linux (P8). It has also been boostrapped and tested
> on powerpc-linux (P7 and P8, big-endian, with both -m32 and -m64 target
> options).
>
> One regression was identified during big-endian regression testing:
>
> > FAIL: gcc.target/powerpc/pr83399.c (internal compiler error)
> > FAIL: gcc.target/powerpc/pr83399.c (test for excess errors)
>
> The pr83399.c test and the ICE are related to a recently committed patch
> that addresses a problem originally found and reported as part of the work
> on this lvx/stvx optimization patch. It appears that the PR83399 patch
> may not have fully addressed the big-endian aspects of the original
> problem report.
So your patch is just exposing a pre-existing problem, in a single testcase.
I am fine with that. (But let's follow up, etc.)
> 2018-01-10 Kelvin Nilsen <kelvin@gcc.gnu.org>
>
> * config/rs6000/rs6000-p8swap.c (rs6000_sum_of_two_registers_p):
> New function.
> (rs6000_quadword_masked_address_p): Likewise.
> (quad_aligned_load_p): Likewise.
> (quad_aligned_store_p): Likewise.
> (const_load_sequence_p): Add comment to describe the outer-most loop.
> (mimic_memory_attributes_and_flags): New function.
> (rs6000_gen_stvx): Likewise.
> (replace_swapped_aligned_store): Likewise.
> (rs6000_gen_lvx): Likewise.
> (replace_swapped_aligned_load): Likewise.
> (replace_swapped_load_constant): Capitalize argument name in
> comment describing this function.
> (rs6000_analyze_swaps): Add a third pass to search for vector loads
> and stores that access quad-word aligned addresses and replace
> with stvx or lvx instructions when appropriate.
> * config/rs6000/rs6000-protos.h (rs6000_sum_of_two_registers_p):
> New function prototype.
> (rs6000_quadword_masked_address_p): Likewise.
> (rs6000_gen_lvx): Likewise.
> (rs6000_gen_stvx): Likewise.
> * config/rs6000/vsx.md (*vsx_le_perm_load_<mode>): For modes
> VSX_D (V2DF, V2DI), modify this split to select lvx instruction
> when memory address is aligned.
> (*vsx_le_perm_load_<mode>): For modes VSX_W (V4SF, V4SI), modify
> this split to select lvx instruction when memory address is aligned.
> (*vsx_le_perm_load_v8hi): Modify this split to select lvx
> instruction when memory address is aligned.
> (*vsx_le_perm_load_v16qi): Likewise.
> (four unnamed splitters): Modify to select the stvx instruction
> when memory is aligned.
>
> gcc/testsuite/ChangeLog:
>
> 2018-01-10 Kelvin Nilsen <kelvin@gcc.gnu.org>
>
> * gcc.target/powerpc/pr48857.c: Modify dejagnu directives to look
> for lvx and stvx instead of lxvd2x and stxvd2x and require
> little-endian target. Add comments.
> * gcc.target/powerpc/swaps-p8-28.c: Add functions for more
> comprehensive testing.
> * gcc.target/powerpc/swaps-p8-29.c: Likewise.
> * gcc.target/powerpc/swaps-p8-30.c: Likewise.
> * gcc.target/powerpc/swaps-p8-31.c: Likewise.
> * gcc.target/powerpc/swaps-p8-32.c: Likewise.
> * gcc.target/powerpc/swaps-p8-33.c: Likewise.
> * gcc.target/powerpc/swaps-p8-34.c: Likewise.
> * gcc.target/powerpc/swaps-p8-35.c: Likewise.
> * gcc.target/powerpc/swaps-p8-36.c: Likewise.
> * gcc.target/powerpc/swaps-p8-37.c: Likewise.
> * gcc.target/powerpc/swaps-p8-38.c: Likewise.
> * gcc.target/powerpc/swaps-p8-39.c: Likewise.
> * gcc.target/powerpc/swaps-p8-40.c: Likewise.
> * gcc.target/powerpc/swaps-p8-41.c: Likewise.
> * gcc.target/powerpc/swaps-p8-42.c: Likewise.
> * gcc.target/powerpc/swaps-p8-43.c: Likewise.
> * gcc.target/powerpc/swaps-p8-44.c: Likewise.
> * gcc.target/powerpc/swaps-p8-45.c: Likewise.
> * gcc.target/powerpc/vec-extract-2.c: Add comment and remove
> scan-assembler-not directives that forbid lvx and xxpermdi.
> * gcc.target/powerpc/vec-extract-3.c: Likewise.
> * gcc.target/powerpc/vec-extract-5.c: Likewise.
> * gcc.target/powerpc/vec-extract-6.c: Likewise.
> * gcc.target/powerpc/vec-extract-7.c: Likewise.
> * gcc.target/powerpc/vec-extract-8.c: Likewise.
> * gcc.target/powerpc/vec-extract-9.c: Likewise.
> * gcc.target/powerpc/vsx-vector-6-le.c: Change
> scan-assembler-times directives to reflect different numbers of
> expected xxlnor, xxlor, xvcmpgtdp, and xxland instructions.
>
> libcpp/ChangeLog:
>
> 2018-01-10 Kelvin Nilsen <kelvin@gcc.gnu.org>
>
> * lex.c (search_line_fast): Remove illegal coercion of an
> unaligned pointer value to vector pointer type and replace with
> use of __builtin_vec_vsx_ld () built-in function, which operates
> on unaligned pointer values.
> +/* Return true iff EXPR represents an address expression that masks off
> + the low-order 4 bits in the style of an lvx or stvx rtl pattern. */
> +bool
> +rs6000_quadword_masked_address_p (const_rtx expr)
> +{
> + if (GET_CODE (expr) == AND)
> + {
> + const_rtx operand1 = XEXP (expr, 0);
> + const_rtx operand2 = XEXP (expr, 1);
> + if ((REG_P (operand2) || rs6000_sum_of_two_registers_p (operand2))
> + && CONST_SCALAR_INT_P (operand1) && INTVAL (operand1) == -16)
> + return true;
> + }
> + return false;
> +}
Is that correct? The -16 will be the second arg of the AND, the regs the
first, in canonical RTL.
> + && MEM_ALIGN (mem) >= 128)? true: false;
We write spaces before ? and :, too.
> + /* if this is not the definition of the candidate swap register,
> + then skip it. I am interested in a different definition. */
Capital on "If".
> +rtx
> +rs6000_gen_stvx (enum machine_mode mode, rtx dest_exp, rtx src_exp) {
{ goes on a new line.
> + /* KFmode, TFmode, other modes not expected in this context. */
> + gcc_unreachable ();
(Whitespace at end of line).
Looks fine, except the nits and the rs6000_quadword_masked_address_p thing,
which looks pretty serious. Could you look at that, retest with it fixed
(if you agree it is broken as-is)?
Segher
More information about the Gcc-patches
mailing list