On Thu, May 08, 2008 at 05:24:21PM -0400, Michael Meissner wrote:
This patch is a successor to the patches in this thread:
http://gcc.gnu.org/ml/gcc-patches/2008-04/msg01387.html
I reworked the patches so that the rs6000 and spu ports generate the same
vector code as before, using Paolo Bonzini's patches as a starting point.
I reworked all callers to optab_for_tree_code to pass an additional argument.
Presumably in the future, this could be used for other similar extensions
without having to grow the tree codes.
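
To illustrate the distinction the new argument draws (the ChangeLog below describes it as a vector shifted by a scalar versus a vector shifted by a vector), here is a minimal standalone sketch. It is not GCC code: the 4-lane arrays, the function names, and the shift_kind enum are hypothetical stand-ins. Only the pattern names "ashl" and "vashl" come from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4   /* hypothetical vector width, for illustration only */

/* Hypothetical stand-in for the extra argument: which kind of shift
   the caller is asking about.  */
enum shift_kind { SHIFT_BY_SCALAR, SHIFT_BY_VECTOR };

/* Vector shifted by a scalar: every lane uses the same count.  */
void shl_by_scalar(uint32_t v[LANES], unsigned count)
{
    assert(count < 32);            /* larger counts are undefined in C */
    for (int i = 0; i < LANES; i++)
        v[i] <<= count;
}

/* Vector shifted by a vector: lane i uses its own count, counts[i].  */
void shl_by_vector(uint32_t v[LANES], const uint32_t counts[LANES])
{
    for (int i = 0; i < LANES; i++) {
        assert(counts[i] < 32);
        v[i] <<= counts[i];
    }
}

/* Pick the standard pattern name for a left shift, mirroring the
   ashl (scalar count) / vashl (vector count) split in this patch.  */
const char *shl_pattern_name(enum shift_kind kind)
{
    return kind == SHIFT_BY_VECTOR ? "vashl" : "ashl";
}
```

With v = {1, 2, 3, 4}, shl_by_scalar(v, 1) doubles every lane, while shl_by_vector with counts {0, 1, 2, 3} shifts each lane by a different amount. A single tree code cannot tell these two cases apart without extra information from the caller.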
In doing my final testing on the SSE5 simulator, I noticed that the 4.3
compiler was generating the wrong code for signed widening 32x32->64 bit
integer multiplies. It was using an instruction that does an unsigned multiply
instead of a signed multiply (SSE5 has a signed widening multiply instruction,
but SSE2 does not). I will add a test for this to the testsuite shortly.
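
As a rough scalar illustration of why substituting an unsigned widening multiply is wrong, the sketch below models one lane of each operation. The function names are hypothetical and this is not the testsuite test mentioned above. It only shows that the two operations disagree as soon as an input is negative.

```c
#include <assert.h>
#include <stdint.h>

/* Signed widening multiply: both 32-bit inputs are sign-extended to
   64 bits before multiplying.  */
int64_t widen_smult(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* What an unsigned widening multiply computes on the same bit
   patterns: both inputs are zero-extended instead.  */
uint64_t widen_umult(int32_t a, int32_t b)
{
    return (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;
}

/* Returns nonzero when the two multiplies produce the same 64 bits.  */
int widen_mults_agree(int32_t a, int32_t b)
{
    return (uint64_t)widen_smult(a, b) == widen_umult(a, b);
}
```

For non-negative inputs such as 3 and 4 the results match, but for -1 and 2 the signed product is -2 while the unsigned one is 0x1FFFFFFFE, so the upper 32 bits of the result differ.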
I bootstrapped the compiler on x86_64 and noticed no regressions. In addition,
I built the SPU and RS6000 ports as cross compilers, and tested the vector code
on both platforms (using -maltivec in the case of the rs6000).
I have included the patches as 5 attachments:
Attachment #1 is the machine independent code
Attachment #2 is the 386 specific code
Attachment #3 is the rs6000 specific code
Attachment #4 is the spu specific code
Attachment #5 is the new SSE5 tests.
Is this ok to install?
[gcc changes]
2008-05-12 Michael Meissner <michael.meissner@amd.com>
Dwarakanath Rajagopal <dwarak.rajagopal@amd.com>
* optabs.h (optab_index): Add OTI_vashl, OTI_vlshr, OTI_vashr,
OTI_vrotl, OTI_vrotr to support vector/vector shifts.
(vashl_optab): New optab for vector/vector shifts.
(vashr_optab): Ditto.
(vlshr_optab): Ditto.
(vrotl_optab): Ditto.
(vrotr_optab): Ditto.
(optab_subtype): New enum for optab_for_tree_code call.
(optab_for_tree_code): Add enum optab_subtype argument.
* optabs.c (optab_for_tree_code): Take an additional argument to
distinguish between a vector shift by a scalar and vector shift by
a vector. Make lshr/ashr/ashl/rotl/rotr optabs just vector
shifted by a scalar. Use vlshr/vashr/vashl/vrotl/vrotr for the
vector shift by a vector.
(expand_widen_pattern_expr): Pass additional argument to
optab_for_tree_code.
* genopinit.c (optabs): Add vashr_optab, vashl_optab, vlshr_optab,
vrotl_optab, vrotr_optab.
* expr.c (expand_expr_real_1): Update calls to
optab_for_tree_code to distinguish between vector shifted by a
scalar and vector shifted by a vector.
* tree-vectorizer.c (supportable_widening_operation): Ditto.
(supportable_narrowing_operation): Ditto.
* tree-vect-analyze.c (vect_build_slp_tree): Ditto.
* tree-vect-patterns.c (vect_pattern_recog_1): Ditto.
* tree-vect-transform.c (vect_model_reduction_cost): Ditto.
(vect_create_epilog_for_reduction): Ditto.
(vectorizable_reduction): Ditto.
(vectorizable_operation): Ditto.
(vect_strided_store_supported): Ditto.
(vect_strided_load_supported): Ditto.
* tree-vect-generic.c (expand_vector_operations_1): Ditto.
* expmed.c (expand_shift): Ditto.
* doc/md.texi (ashl@var{m}3): Document that operand 2 is always a
scalar type.
(ashr@var{m}3): Ditto.
(vashl@var{m}3): Document new vector/vector shift standard name.
(vashr@var{m}3): Ditto.
(vlshr@var{m}3): Ditto.
(vrotl@var{m}3): Ditto.
(vrotr@var{m}3): Ditto.
* config/i386/i386.md (PPERM_SRC): Move PPERM masks here from
i386.c.
(PPERM_INVERT): Ditto.
(PPERM_REVERSE): Ditto.
(PPERM_REV_INV): Ditto.
(PPERM_ZERO): Ditto.
(PPERM_ONES): Ditto.
(PPERM_SIGN): Ditto.
(PPERM_INV_SIGN): Ditto.
(PPERM_SRC1): Ditto.
(PPERM_SRC2): Ditto.
* config/i386/sse.md (mulv2di3): Add SSE5 support.
(sse5_pmacsdql_mem): New SSE5 define_insn_and_split that temporarily
allows a memory operand to be the value being added, and splits it
to improve vectorization.
(sse5_pmacsdqh_mem): Ditto.
(sse5_mulv2div2di3_low): SSE5 32-bit multiply and extend function.
(sse5_mulv2div2di3_high): Ditto.
(vec_pack_trunc_v8hi): Add SSE5 pperm support.
(vec_pack_trunc_v4si): Ditto.
(vec_pack_trunc_v2di): Ditto.
(sse5_pcmov_<mode>): Remove code that tried to use
andps/andnps instead of pcmov.
* config/i386/i386.c (PPERM_SRC): Move PPERM masks to i386.md.
(PPERM_INVERT): Ditto.
(PPERM_REVERSE): Ditto.
(PPERM_REV_INV): Ditto.
(PPERM_ZERO): Ditto.
(PPERM_ONES): Ditto.
(PPERM_SIGN): Ditto.
(PPERM_INV_SIGN): Ditto.
(PPERM_SRC1): Ditto.
(PPERM_SRC2): Ditto.
(ix86_expand_sse_movcc): Move the SSE5 test after the if
true/false tests.
(ix86_expand_int_vcond): If SSE5 generate all possible integer
comparisons.
(ix86_sse5_valid_op_p): Allow num_memory to be negative, which
says ignore whether the last reference is a memory operand.
PR target/36224
* config/i386/sse.md (vec_widen_smult_hi_v4si): Disable this code
unless we have SSE5. If we have SSE5, use the pmacsdql and
pmacsdqh instructions.
(vec_widen_smult_lo_v4si): Ditto.