This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, rs6000] Generate lvx and stvx without swaps for aligned vector loads and stores


On Fri, Jan 12, 2018 at 01:55:26PM -0600, Kelvin Nilsen wrote:
> On Power 7 and Power 8 little endian, the code generator has been
> emitting two instructions for each vector load and each vector store. 
> One instruction does a swapping load or store, and the second
> instruction does an in-register swap.
> 
> This patch replaces the two-instruction sequences with a single lvx (for
> loads) or stvx (for stores) instruction in the very common case that the
> vector is known to reside at a quad-word aligned address in memory. 
> This patch is most relevant to Power 7 and Power 8 targets because
> Power 9 code generation uses new single-instruction encodings for both
> aligned and unaligned vector loads and stores.
> 
> This patch has been boostrapped and tested without regressions on
> powerpc64le-unknown-linux (P8).  It has also been boostrapped and tested
> on powerpc-linux (P7 and P8, big-endian, with both -m32 and -m64 target
> options).
> 
> One regression was identified during big-endian regression testing:
> 
> > FAIL: gcc.target/powerpc/pr83399.c (internal compiler error)
> > FAIL: gcc.target/powerpc/pr83399.c (test for excess errors)
> 
> The pr83399.c test and the ICE are related to a recently committed patch
> that addresses a problem originally found and reported as part of the work
> on this lvx/stvx optimization patch.  It appears that the PR83399 patch
> may not have fully addressed the big-endian aspects of the original
> problem report.

So your patch is just exposing a pre-existing problem, in a single testcase.
I am fine with that.  (But let's follow up, etc.)

> 2018-01-10  Kelvin Nilsen  <kelvin@gcc.gnu.org>
> 
> 	* config/rs6000/rs6000-p8swap.c (rs6000_sum_of_two_registers_p):
> 	New function.
> 	(rs6000_quadword_masked_address_p): Likewise.
> 	(quad_aligned_load_p): Likewise.
> 	(quad_aligned_store_p): Likewise.
> 	(const_load_sequence_p): Add comment to describe the outer-most loop.
> 	(mimic_memory_attributes_and_flags): New function.
> 	(rs6000_gen_stvx): Likewise.
> 	(replace_swapped_aligned_store): Likewise.
> 	(rs6000_gen_lvx): Likewise.
> 	(replace_swapped_aligned_load): Likewise.
> 	(replace_swapped_load_constant): Capitalize argument name in
> 	comment describing this function.
> 	(rs6000_analyze_swaps): Add a third pass to search for vector loads
> 	and stores that access quad-word aligned addresses and replace
> 	with stvx or lvx instructions when appropriate.
> 	* config/rs6000/rs6000-protos.h (rs6000_sum_of_two_registers_p):
> 	New function prototype.
> 	(rs6000_quadword_masked_address_p): Likewise.
> 	(rs6000_gen_lvx): Likewise.
> 	(rs6000_gen_stvx): Likewise.
> 	* config/rs6000/vsx.md (*vsx_le_perm_load_<mode>): For modes
> 	VSX_D (V2DF, V2DI), modify this split to select lvx instruction
> 	when memory address is aligned.
> 	(*vsx_le_perm_load_<mode>): For modes VSX_W (V4SF, V4SI), modify
> 	this split to select lvx instruction when memory address is aligned.
> 	(*vsx_le_perm_load_v8hi): Modify this split to select lvx
> 	instruction when memory address is aligned.
> 	(*vsx_le_perm_load_v16qi): Likewise.
> 	(four unnamed splitters): Modify to select the stvx instruction
> 	when memory is aligned.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-01-10  Kelvin Nilsen  <kelvin@gcc.gnu.org>
> 
> 	* gcc.target/powerpc/pr48857.c: Modify dejagnu directives to look
> 	for lvx and stvx instead of lxvd2x and stxvd2x and require
> 	little-endian target.  Add comments.
> 	* gcc.target/powerpc/swaps-p8-28.c: Add functions for more
> 	comprehensive testing.
> 	* gcc.target/powerpc/swaps-p8-29.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-30.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-31.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-32.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-33.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-34.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-35.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-36.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-37.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-38.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-39.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-40.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-41.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-42.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-43.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-44.c: Likewise.
> 	* gcc.target/powerpc/swaps-p8-45.c: Likewise.
> 	* gcc.target/powerpc/vec-extract-2.c: Add comment and remove
> 	scan-assembler-not directives that forbid lvx and xxpermdi.
> 	* gcc.target/powerpc/vec-extract-3.c: Likewise.
> 	* gcc.target/powerpc/vec-extract-5.c: Likewise.
> 	* gcc.target/powerpc/vec-extract-6.c: Likewise.
> 	* gcc.target/powerpc/vec-extract-7.c: Likewise.
> 	* gcc.target/powerpc/vec-extract-8.c: Likewise.
> 	* gcc.target/powerpc/vec-extract-9.c: Likewise.
> 	* gcc.target/powerpc/vsx-vector-6-le.c: Change
> 	scan-assembler-times directives to reflect different numbers of
> 	expected xxlnor, xxlor, xvcmpgtdp, and xxland instructions.
> 
> libcpp/ChangeLog:
> 
> 2018-01-10  Kelvin Nilsen  <kelvin@gcc.gnu.org>
> 
> 	* lex.c (search_line_fast): Remove illegal coercion of an
> 	unaligned pointer value to vector pointer type and replace with
> 	use of __builtin_vec_vsx_ld () built-in function, which operates
> 	on unaligned pointer values.


> +/* Return true iff EXPR represents an address expression that masks off
> +   the low-order 4 bits in the style of an lvx or stvx rtl pattern.  */
> +bool
> +rs6000_quadword_masked_address_p (const_rtx expr)
> +{
> +  if (GET_CODE (expr) == AND)
> +    {
> +      const_rtx operand1 = XEXP (expr, 0);
> +      const_rtx operand2 = XEXP (expr, 1);
> +      if ((REG_P (operand2) || rs6000_sum_of_two_registers_p (operand2))
> +	  && CONST_SCALAR_INT_P (operand1) && INTVAL (operand1) == -16)
> +	return true;
> +    }
> +  return false;
> +}

Is that correct?  The -16 will be the second arg of the AND, the regs the
first, in canonical RTL.

> +	  && MEM_ALIGN (mem) >= 128)? true: false;

We write spaces before ? and :, too.

> +      /* if this is not the definition of the candidate swap register,
> +	 then skip it.  I am interested in a different definition.  */

Capital on "If".

> +rtx
> +rs6000_gen_stvx (enum machine_mode mode, rtx dest_exp, rtx src_exp) {

{ goes on a new line.

> +	/* KFmode, TFmode, other modes not expected in this context.  */
> +	gcc_unreachable ();    

(Whitespace at end of line).

Looks fine, except the nits and the rs6000_quadword_masked_address_p thing,
which looks pretty serious.  Could you look at that, retest with it fixed
(if you agree it is broken as-is)?


Segher


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]