This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: Update SSE5 vector multiplication, shift, rotate, take 3

From: Uros Bizjak <ubizjak at gmail dot com>
To: Michael Meissner <michael dot meissner at amd dot com>, gcc-patches at gcc dot gnu dot org, dwarak dot rajagopal at amd dot com, christophe dot harle at amd dot com, hongjiu dot lu at intel dot com
Date: Tue, 13 May 2008 19:59:57 +0200
Subject: Re: Update SSE5 vector multiplication, shift, rotate, take 3
References: <20080417185036.GA15776@mmeissner-gold.amd.com> <20080508212421.GA4882@mmeissner-gold.amd.com> <20080512235131.GA32104@mmeissner-gold.amd.com>

Michael Meissner wrote:

On Thu, May 08, 2008 at 05:24:21PM -0400, Michael Meissner wrote:

This patch is a successor to the patches in this thread: http://gcc.gnu.org/ml/gcc-patches/2008-04/msg01387.html


I reworked the patches, so that the rs6000 and spu generate the same vector code
now as before, using Paolo Bonzini's patches as a starting point.

I reworked all callers to optab_for_tree_code to pass an additional argument.
Presumably in the future, this could be used for other similar extensions
without having to grow the tree codes.
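
The distinction that the extra argument carries (a vector shifted by a single scalar count versus a vector shifted by a per-element vector of counts, as described in the ChangeLog below) can be sketched in plain C. This is only an illustration of the two semantics, not GCC internals: the function names, the `LANES` constant, and the use of arrays in place of vector registers are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4  /* hypothetical: four 32-bit lanes, as in a 128-bit vector */

/* Vector shifted by a scalar: one count applies to every lane
   (the behaviour kept for the ashl/ashr/lshr style patterns).  */
static void
shl_by_scalar (uint32_t dst[LANES], const uint32_t src[LANES], unsigned count)
{
  assert (count < 32);  /* larger counts are undefined in C */
  for (int i = 0; i < LANES; i++)
    dst[i] = src[i] << count;
}

/* Vector shifted by a vector: each lane has its own count
   (the behaviour of the new vashl/vashr/vlshr style patterns).  */
static void
shl_by_vector (uint32_t dst[LANES], const uint32_t src[LANES],
               const uint32_t counts[LANES])
{
  for (int i = 0; i < LANES; i++)
    {
      assert (counts[i] < 32);
      dst[i] = src[i] << counts[i];
    }
}
```

With `src = {1, 1, 1, 1}`, a scalar count of 3 yields 8 in every lane, while the count vector `{0, 1, 2, 3}` yields `{1, 2, 4, 8}`.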

In doing my final testing on the SSE5 simulator, I noticed that the 4.3
compiler was generating the wrong code for signed widening 32x32->64 bit
integer multiplies.  It was using an instruction that does unsigned multiplies
instead of signed multiplies (SSE5 has such an instruction, but SSE2 does
not).  I will add a test for this shortly to the testsuite.
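
To see why substituting an unsigned widening multiply is wrong, here is a small scalar sketch in C. It is not the code from the patch or the PR; the helper names are hypothetical, and it only shows that the two products differ once an input is negative.

```c
#include <assert.h>
#include <stdint.h>

/* Signed widening multiply: both 32-bit inputs are sign-extended
   to 64 bits before multiplying.  */
static int64_t
widen_smult (int32_t a, int32_t b)
{
  return (int64_t) a * (int64_t) b;
}

/* Unsigned widening multiply applied to the same bit patterns:
   the inputs are zero-extended instead.  */
static uint64_t
widen_umult (int32_t a, int32_t b)
{
  return (uint64_t) (uint32_t) a * (uint64_t) (uint32_t) b;
}

/* Returns 1 when both multiplies produce the same 64-bit pattern.  */
static int
same_bits (int32_t a, int32_t b)
{
  return (uint64_t) widen_smult (a, b) == widen_umult (a, b);
}

/* A few spot checks documenting the behaviour.  */
static void
check_widening_examples (void)
{
  assert (same_bits (3, 4));     /* non-negative inputs agree */
  assert (!same_bits (-1, 2));   /* a negative input does not */
}
```

For `a = -1, b = 2` the signed product is -2, whereas the unsigned multiply sees 4294967295 * 2 = 8589934590, so the upper 32 bits of the result differ.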

I bootstrapped the compiler on x86_64 and noticed no regressions.  In addition,
I built the SPU and RS6000 ports as cross compilers, and tested the vector code
on both platforms (using -maltivec in the case of the rs6000).

I have included the patches as 5 attachments:
Attachment #1 is the machine independent code
Attachment #2 is the 386 specific code
Attachment #3 is the rs6000 specific code
Attachment #4 is the spu specific code
Attachment #5 is the new SSE5 tests.

Is this ok to install?

[gcc changes]
2008-05-12  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>

	* optabs.h (optab_index): Add OTI_vashl, OTI_vlshr, OTI_vashr,
	OTI_vrotl, OTI_vrotr to support vector/vector shifts.
	(vashl_optab): New optab for vector/vector shifts.
	(vashr_optab): Ditto.
	(vlshr_optab): Ditto.
	(vrotl_optab): Ditto.
	(vrotr_optab): Ditto.
	(optab_subtype): New enum for optab_for_tree_code call.
	(optab_for_tree_code): Add enum optab_subtype argument.

	* optabs.c (optab_for_tree_code): Take an additional argument to
	distinguish between a vector shift by a scalar and vector shift by
	a vector.  Make lshr/ashr/ashl/rotl/rotr optabs just vector
	shifted by a scalar.  Use vlshr/vashr/vashl/vrotl/vrotr for the
	vector shift by a vector.
	(expand_widen_pattern_expr): Pass additional argument to
	optab_for_tree_code.

	* genopinit.c (optabs): Add vashr_optab, vashl_optab, vlshr_optab,
	vrotl_optab, vrotr_optab.

	* expr.c (expand_expr_real_1): Update calls to
	optab_for_tree_code to distinguish between vector shifted by a
	scalar and vector shifted by a vector.
	* tree-vectorizer.c (supportable_widening_operation): Ditto.
	(supportable_narrowing_operation): Ditto.
	* tree-vect-analyze.c (vect_build_slp_tree): Ditto.
	* tree-vect-patterns.c (vect_pattern_recog_1): Ditto.
	* tree-vect-transform.c (vect_model_reduction_cost): Ditto.
	(vect_create_epilog_for_reduction): Ditto.
	(vectorizable_reduction): Ditto.
	(vectorizable_operation): Ditto.
	(vect_strided_store_supported): Ditto.
	(vect_strided_load_supported): Ditto.
	* tree-vect-generic.c (expand_vector_operations_1): Ditto.
	* expmed.c (expand_shift): Ditto.

	* doc/md.texi (ashl@var{m}3): Document that operand 2 is always a
	scalar type.
	(ashr@var{m}3): Ditto.
	(vashl@var{m}3): Document new vector/vector shift standard name.
	(vashr@var{m}3): Ditto.
	(vlshr@var{m}3): Ditto.
	(vrotl@var{m}3): Ditto.
	(vrotr@var{m}3): Ditto.

	* config/i386/i386.md (PPERM_SRC): Move PPERM masks here from
	i386.c.
	(PPERM_INVERT): Ditto.
	(PPERM_REVERSE): Ditto.
	(PPERM_REV_INV): Ditto.
	(PPERM_ZERO): Ditto.
	(PPERM_ONES): Ditto.
	(PPERM_SIGN): Ditto.
	(PPERM_INV_SIGN): Ditto.
	(PPERM_SRC1): Ditto.
	(PPERM_SRC2): Ditto.

	* config/i386/sse.md (mulv2di3): Add SSE5 support.
	(sse5_pmacsdql_mem): New SSE5 define_and_split that temporarily
	allows a memory operand to be the value being added, and split it
	to improve vectorization.
	(sse5_pmacsdqh_mem): Ditto.
	(sse5_mulv2div2di3_low): SSE5 32-bit multiply and extend function.
	(sse5_mulv2div2di3_high): Ditto.
	(vec_pack_trunc_v8hi): Add SSE5 pperm support.
	(vec_pack_trunc_v4si): Ditto.
	(vec_pack_trunc_v2di): Ditto.
	(sse5_pcmov_<mode>): Remove code that tried to use
	andps/andnps instead of pcmov.

	* config/i386/i386.c (PPERM_SRC): Move PPERM masks to i386.md.
	(PPERM_INVERT): Ditto.
	(PPERM_REVERSE): Ditto.
	(PPERM_REV_INV): Ditto.
	(PPERM_ZERO): Ditto.
	(PPERM_ONES): Ditto.
	(PPERM_SIGN): Ditto.
	(PPERM_INV_SIGN): Ditto.
	(PPERM_SRC1): Ditto.
	(PPERM_SRC2): Ditto.
	(ix86_expand_sse_movcc): Move the SSE5 test after the if
	true/false tests.
	(ix86_expand_int_vcond): If SSE5 generate all possible integer
	comparisons.
	(ix86_sse5_valid_op_p): Allow num_memory to be negative, which
	says ignore whether the last reference is a memory operand.

	PR target/36224
	* config/i386/sse.md (vec_widen_smult_hi_v4si): Disable this code
	unless we have SSE5.  If we have SSE5, use the pmacsdql and
	pmacsdqh instructions.
	(vec_widen_smult_lo_v4si): Ditto.

Please commit the PR target/36224 fix as a separate commit, and please also add the testcase from the PR.

Other x86 parts are also OK, but you should wait for middle-end approval.

--- gcc/testsuite/gcc.target/i386/sse5-rotate1-vector.c.~0~	2008-05-12 18:58:04.298921000 -0400
+++ gcc/testsuite/gcc.target/i386/sse5-rotate1-vector.c	2008-05-06 14:48:09.953898000 -0400
@@ -0,0 +1,35 @@
+/* Test that the compiler properly optimizes vector rotate instructions vector
+   into prot on SSE5 systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -msse5 -ftree-vectorize" } */
+
+extern void exit (int);
+
+typedef long __m128i  __attribute__ ((__vector_size__ (16), __may_alias__));
+
+#define SIZE 10240
+
+union {
+  __m128i i_align;
+  unsigned u32[SIZE];
+} a, b, c;
+
+void
+left_rotate32 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.u32[i] = (b.u32[i] << ((sizeof (int) * 8) - 4)) | (b.u32[i] >> 4);
+}
+
+int
+main ()

Since this is a compile test, we probably don't need main().
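
For reference, the loop body in the quoted test is the usual shift-and-or rotate idiom, which is all the compile test needs the vectorizer to see. A standalone scalar sketch follows; the function name is hypothetical and a 32-bit int is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as the loop body in the quoted test, applied to one
   element: a left rotate by 28 bits, which is equivalent to a right
   rotate by 4 bits (assuming int is 32 bits wide).  */
static uint32_t
rotate_like_test (uint32_t x)
{
  return (x << ((sizeof (int) * 8) - 4)) | (x >> 4);
}

/* Spot check: the low nibble moves to the top of the word.  */
static void
check_rotate_example (void)
{
  assert (rotate_like_test (0x12345678u) == 0x81234567u);
}
```

Because the test only compiles to assembly, a function like left_rotate32 is sufficient on its own; nothing has to call it at run time.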

Thanks,
Uros.

References:
- Re: Update SSE5 vector multiplication, shift, rotate, take 2
  - From: Michael Meissner
- Re: Update SSE5 vector multiplication, shift, rotate, take 3
  - From: Michael Meissner