This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][RFC] Fix complex multiplication vectorization on x86_64


Richard Guenther wrote:

This implements some missing vec_interleave and vec_extract_{odd,even}
expanders for x86_64 to make vectorizing complex multiplication
possible.
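To make the motivation concrete, here is a minimal scalar sketch of the kind of loop in question, assuming complex values are stored interleaved as re, im pairs. The function name `cmul_interleaved` is hypothetical and is not taken from the patch or the testsuite. Vectorizing such a loop requires splitting each input vector into its even (real) and odd (imaginary) elements and interleaving the results again on store.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply n complex numbers stored interleaved as re0, im0, re1, im1, ...
   Real parts sit at even indices and imaginary parts at odd indices, so a
   vectorized version needs even/odd extraction on the loads and an
   interleave on the stores.  */
void
cmul_interleaved (float *out, const float *a, const float *b, size_t n)
{
  assert (out != NULL && a != NULL && b != NULL);
  for (size_t i = 0; i < n; i++)
    {
      float ar = a[2 * i], ai = a[2 * i + 1];   /* even / odd elements */
      float br = b[2 * i], bi = b[2 * i + 1];
      out[2 * i]     = ar * br - ai * bi;       /* real part */
      out[2 * i + 1] = ar * bi + ai * br;       /* imaginary part */
    }
}
```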

The problem starts with the testsuite, which can be tuned either to
have vect_extract_even_odd or not, but cannot, as x86_64 requires,
turn on support for vect_extract_even_odd only for SImode or larger
element sizes. Any suggestions on how to deal with this?
Wouldn't defining a keyword (something like vect_extract_large_types) for
x86_64 (and all the targets that have support for all the types) help?
The tests will have to be changed as well, of course, for example

Uros suggested something similar. So I added vect_extract_even_odd_wide and vect_strided_wide, which only cover vector elements of 4 bytes or larger.

The following is what I am now re-testing on x86_64/i686.

Ok for trunk?

Thanks,
Richard.

2008-07-31 Richard Guenther <rguenther@suse.de>

PR target/35252
* config/i386/sse.md (SSEMODE4S, SSEMODE2D): New mode iterators.
(ssedoublesizemode): New mode attribute.
(sse_shufps): Call gen_sse_shufps_v4sf.
(sse_shufps_1): Macroize.
(sse2_shufpd): Call gen_sse2_shufpd_v2df.
(sse2_shufpd_1): Macroize.
(vec_extract_odd, vec_extract_even): New expanders.
(vec_interleave_highv4sf, vec_interleave_lowv4sf,
vec_interleave_highv2df, vec_interleave_lowv2df): Likewise.
* i386.c (ix86_expand_vector_init_one_nonzero): Call
gen_sse_shufps_v4sf instead of gen_sse_shufps_1.
(ix86_expand_vector_set): Likewise.
(ix86_expand_reduc_v4sf): Likewise.

Please see my comments on the x86 part, inline in the .md patterns below.


	* lib/target-supports.exp (vect_extract_even_odd_wide): Add.
	(vect_strided_wide): Likewise.
	* gcc.dg/vect/fast-math-pr35982.c: Enable for
	vect_extract_even_odd_wide.
	* gcc.dg/vect/fast-math-vect-complex-3.c: Likewise.
	* gcc.dg/vect/vect-1.c: Likewise.
	* gcc.dg/vect/vect-107.c: Likewise.
	* gcc.dg/vect/vect-98.c: Likewise.
	* gcc.dg/vect/vect-strided-float.c: Likewise.
	* gcc.dg/vect/slp-11.c: Enable for vect_strided_wide.
	* gcc.dg/vect/slp-12a.c: Likewise.
	* gcc.dg/vect/slp-12b.c: Likewise.
	* gcc.dg/vect/slp-19.c: Likewise.
	* gcc.dg/vect/slp-23.c: Likewise.
	* gcc.dg/vect/slp-5.c: Likewise.

Index: gcc/config/i386/sse.md
===================================================================
*** gcc/config/i386/sse.md.orig 2008-07-30 17:08:05.000000000 +0200
--- gcc/config/i386/sse.md 2008-07-31 12:25:36.000000000 +0200
***************
*** 36,41 ****
--- 36,45 ----
(define_mode_iterator SSEMODEF4 [SF DF V4SF V2DF])
(define_mode_iterator SSEMODEF2P [V4SF V2DF])
+ ;; Int-float size matches
+ (define_mode_iterator SSEMODE4S [V4SF V4SI])
+ (define_mode_iterator SSEMODE2D [V2DF V2DI])
+ ;; Mapping from float mode to required SSE level
(define_mode_attr sse [(SF "sse") (DF "sse2") (V4SF "sse") (V2DF "sse2")])
***************
*** 57,62 ****
--- 61,70 ----
(V16QI "QI") (V8HI "HI")
(V4SI "SI") (V2DI "DI")])
+ ;; Mapping of vector modes to a vector mode of double size
+ (define_mode_attr ssedoublesizemode [(V2DF "V4DF") (V2DI "V4DI")
+ (V4SF "V8SF") (V4SI "V8SI")])
+ ;; Number of scalar elements in each vector type
(define_mode_attr ssescalarnum [(V4SF "4") (V2DF "2")
(V16QI "16") (V8HI "8")
***************
*** 2129,2135 ****
"TARGET_SSE"
{
int mask = INTVAL (operands[3]);
! emit_insn (gen_sse_shufps_1 (operands[0], operands[1], operands[2],
GEN_INT ((mask >> 0) & 3),
GEN_INT ((mask >> 2) & 3),
GEN_INT (((mask >> 4) & 3) + 4),
--- 2137,2143 ----
"TARGET_SSE"
{
int mask = INTVAL (operands[3]);
! emit_insn (gen_sse_shufps_v4sf (operands[0], operands[1], operands[2],
GEN_INT ((mask >> 0) & 3),
GEN_INT ((mask >> 2) & 3),
GEN_INT (((mask >> 4) & 3) + 4),
***************
*** 2137,2148 ****
DONE;
})
! (define_insn "sse_shufps_1"
! [(set (match_operand:V4SF 0 "register_operand" "=x")
! (vec_select:V4SF
! (vec_concat:V8SF
! (match_operand:V4SF 1 "register_operand" "0")
! (match_operand:V4SF 2 "nonimmediate_operand" "xm"))
(parallel [(match_operand 3 "const_0_to_3_operand" "")
(match_operand 4 "const_0_to_3_operand" "")
(match_operand 5 "const_4_to_7_operand" "")
--- 2145,2156 ----
DONE;
})
! (define_insn "sse_shufps_<mode>"
! [(set (match_operand:SSEMODE4S 0 "register_operand" "=x")
! (vec_select:SSEMODE4S
! (vec_concat:<ssedoublesizemode>
! (match_operand:SSEMODE4S 1 "register_operand" "0")
! (match_operand:SSEMODE4S 2 "nonimmediate_operand" "xm"))
(parallel [(match_operand 3 "const_0_to_3_operand" "")
(match_operand 4 "const_0_to_3_operand" "")
(match_operand 5 "const_4_to_7_operand" "")
***************
*** 2540,2557 ****
"TARGET_SSE2"
{
int mask = INTVAL (operands[3]);
! emit_insn (gen_sse2_shufpd_1 (operands[0], operands[1], operands[2],
GEN_INT (mask & 1),
GEN_INT (mask & 2 ? 3 : 2)));
DONE;
})
! (define_insn "sse2_shufpd_1"
! [(set (match_operand:V2DF 0 "register_operand" "=x")
! (vec_select:V2DF
! (vec_concat:V4DF
! (match_operand:V2DF 1 "register_operand" "0")
! (match_operand:V2DF 2 "nonimmediate_operand" "xm"))
(parallel [(match_operand 3 "const_0_to_1_operand" "")
(match_operand 4 "const_2_to_3_operand" "")])))]
"TARGET_SSE2"
--- 2548,2611 ----
"TARGET_SSE2"
{
int mask = INTVAL (operands[3]);
! emit_insn (gen_sse2_shufpd_v2df (operands[0], operands[1], operands[2],
GEN_INT (mask & 1),
GEN_INT (mask & 2 ? 3 : 2)));
DONE;
})
! (define_expand "vec_extract_even<mode>"
! [(match_operand:SSEMODE4S 0 "register_operand" "")
! (match_operand:SSEMODE4S 1 "register_operand" "")
! (match_operand:SSEMODE4S 2 "nonimmediate_operand" "")]
! "TARGET_SSE"
! {
! emit_insn (gen_sse_shufps_<mode> (operands[0], operands[1], operands[2],
! GEN_INT (0), GEN_INT (2),
! GEN_INT (4), GEN_INT (6)));
! DONE;
! })

Please write the expander above without preparation statements, as a pattern similar to sse_shufps_<mode> itself. Such patterns are much more informative, and since you simply copy operands and const ints around without any special processing, IMO this doesn't warrant calling emit_insn directly from a preparation statement.
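As an illustration of what the selectors in these expanders mean, the following is a small C model of a four-element select from the concatenation of two vectors, plus a helper that packs four selectors back into a shufps immediate. Both function names are hypothetical. The packing of the fourth field is an assumption: the quoted hunk shows only the first three decoded fields, and the sketch assumes the fourth follows the same scheme.

```c
#include <assert.h>

/* Model of (vec_select (vec_concat a b) (parallel [s0 s1 s2 s3])) for
   four-element vectors: element k of the result is element sel[k] of the
   eight-element concatenation of a and b.  */
void
select4_from_concat (float dst[4], const float a[4], const float b[4],
                     const int sel[4])
{
  float cat[8];
  for (int i = 0; i < 4; i++)
    {
      cat[i] = a[i];
      cat[i + 4] = b[i];
    }
  for (int k = 0; k < 4; k++)
    {
      assert (sel[k] >= 0 && sel[k] < 8);
      dst[k] = cat[sel[k]];
    }
}

/* Pack four selectors into a shufps immediate, inverting the decoding in
   the sse_shufps expander: selectors 0 and 1 index the first operand
   (0..3), selectors 2 and 3 index the second operand (4..7).  The layout
   of the fourth field is assumed, see the note above.  */
int
shufps_imm8 (const int sel[4])
{
  assert (sel[0] >= 0 && sel[0] <= 3 && sel[1] >= 0 && sel[1] <= 3);
  assert (sel[2] >= 4 && sel[2] <= 7 && sel[3] >= 4 && sel[3] <= 7);
  return sel[0] | (sel[1] << 2) | ((sel[2] - 4) << 4) | ((sel[3] - 4) << 6);
}
```

Under this model, the selectors 0, 2, 4, 6 pick the even elements of both inputs and 1, 3, 5, 7 pick the odd ones, which is why a plain vec_select pattern with constant selectors can describe these expanders without any preparation statement.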


!
! (define_expand "vec_extract_odd<mode>"
! [(match_operand:SSEMODE4S 0 "register_operand" "")
! (match_operand:SSEMODE4S 1 "register_operand" "")
! (match_operand:SSEMODE4S 2 "nonimmediate_operand" "")]
! "TARGET_SSE"
! {
! emit_insn (gen_sse_shufps_<mode> (operands[0], operands[1], operands[2],
! GEN_INT (1), GEN_INT (3),
! GEN_INT (5), GEN_INT (7)));
! DONE;
! })
!
! (define_expand "vec_extract_even<mode>"
! [(match_operand:SSEMODE2D 0 "register_operand" "")
! (match_operand:SSEMODE2D 1 "register_operand" "")
! (match_operand:SSEMODE2D 2 "nonimmediate_operand" "")]
! "TARGET_SSE2"
! {
! emit_insn (gen_sse2_shufpd_<mode> (operands[0], operands[1], operands[2],
! GEN_INT (0), GEN_INT (2)));
! DONE;
! })
!
! (define_expand "vec_extract_odd<mode>"
! [(match_operand:SSEMODE2D 0 "register_operand" "")
! (match_operand:SSEMODE2D 1 "register_operand" "")
! (match_operand:SSEMODE2D 2 "nonimmediate_operand" "")]
! "TARGET_SSE2"
! {
! emit_insn (gen_sse2_shufpd_<mode> (operands[0], operands[1], operands[2],
! GEN_INT (1), GEN_INT (3)));
! DONE;
! })
!
! (define_insn "sse2_shufpd_<mode>"
! [(set (match_operand:SSEMODE2D 0 "register_operand" "=x")
! (vec_select:SSEMODE2D
! (vec_concat:<ssedoublesizemode>
! (match_operand:SSEMODE2D 1 "register_operand" "0")
! (match_operand:SSEMODE2D 2 "nonimmediate_operand" "xm"))
(parallel [(match_operand 3 "const_0_to_1_operand" "")
(match_operand 4 "const_2_to_3_operand" "")])))]
"TARGET_SSE2"
***************
*** 4195,4200 ****
--- 4249,4310 ----
DONE;
})
+ (define_expand "vec_interleave_highv4sf"
+ [(set (match_operand:V4SF 0 "register_operand" "")
+ (vec_select:V4SF
+ (vec_concat:V8SF
+ (match_operand:V4SF 1 "register_operand" "")
+ (match_operand:V4SF 2 "nonimmediate_operand" ""))
+ (parallel [(const_int 1)
+ (const_int 3)])))]
+ "TARGET_SSE"
+ {
+ emit_insn (gen_sse_unpckhps (operands[0], operands[1], operands[2]));
+ DONE;
+ })

The pattern above has the wrong selector. Also, please don't use preparation statements to generate the insn.


+
+ (define_expand "vec_interleave_lowv4sf"
+ [(set (match_operand:V4SF 0 "register_operand" "")
+ (vec_select:V4SF
+ (vec_concat:V8SF
+ (match_operand:V4SF 1 "register_operand" "")
+ (match_operand:V4SF 2 "nonimmediate_operand" ""))
+ (parallel [(const_int 1)
+ (const_int 3)])))]
+ "TARGET_SSE"
+ {
+ emit_insn (gen_sse_unpcklps (operands[0], operands[1], operands[2]));
+ DONE;
+ })

Please note the wrong selector in the pattern above, too. Just copy the sse_unpcklps pattern here...
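To show why a two-element selector cannot describe a V4SF interleave, here is a sketch that computes the selector an interleave needs for a vector of `nelt` elements. The function name `interleave_selector` is hypothetical. The index convention follows the documented behaviour of the UNPCKHPS and UNPCKLPS instructions; the existing sse_unpckhps and sse_unpcklps patterns are not quoted in this message, so their exact form is not reproduced here.

```c
#include <assert.h>

/* Fill sel[0..nelt-1] with the indices, into the 2*nelt-element
   concatenation of two vectors, that an interleave selects.  For the
   high variant the upper halves of both inputs are interleaved, for the
   low variant the lower halves.  */
void
interleave_selector (int sel[], int nelt, int high)
{
  assert (nelt > 0 && nelt % 2 == 0);
  int base = high ? nelt / 2 : 0;
  for (int i = 0; i < nelt / 2; i++)
    {
      sel[2 * i] = base + i;             /* element from the first operand */
      sel[2 * i + 1] = nelt + base + i;  /* element from the second operand */
    }
}
```

For `nelt` = 4 this yields 2, 6, 3, 7 for the high variant and 0, 4, 1, 5 for the low variant, so a V4SF result always needs four selector entries. For `nelt` = 2 the same rule yields 1, 3 and 0, 2, which matches the selectors used in the V2DF expanders of the patch.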


+
+ (define_expand "vec_interleave_highv2df"
+ [(set (match_operand:V2DF 0 "register_operand" "")
+ (vec_select:V2DF
+ (vec_concat:V4DF
+ (match_operand:V2DF 1 "register_operand" "")
+ (match_operand:V2DF 2 "nonimmediate_operand" ""))
+ (parallel [(const_int 1)
+ (const_int 3)])))]
+ "TARGET_SSE2"
+ {
+ emit_insn (gen_sse2_unpckhpd (operands[0], operands[1], operands[2]));
+ DONE;
+ })
+
+ (define_expand "vec_interleave_lowv2df"
+ [(set (match_operand:V2DF 0 "register_operand" "")
+ (vec_select:V2DF
+ (vec_concat:V4DF
+ (match_operand:V2DF 1 "register_operand" "")
+ (match_operand:V2DF 2 "nonimmediate_operand" ""))
+ (parallel [(const_int 0)
+ (const_int 2)])))]
+ "TARGET_SSE2"
+ {
+ emit_insn (gen_sse2_unpcklpd (operands[0], operands[1], operands[2]));
+ DONE;
+ })

Please also remove preparation statements from patterns above.


IMO, the testsuite part is OK, but I can't approve that part.

Thanks,
Uros.

