This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [PATCH][RFC] Fix complex multiplication vectorization on x86_64

From: Uros Bizjak <ubizjak at gmail dot com>
To: Richard Guenther <rguenther at suse dot de>
Cc: Ira Rosen <IRAR at il dot ibm dot com>, gcc-patches at gcc dot gnu dot org
Date: Fri, 01 Aug 2008 00:30:12 +0200
Subject: Re: [PATCH][RFC] Fix complex multiplication vectorization on x86_64
References: <OFB7C38163.27A29077-ONC2257497.001EA5E0-C2257497.00213961@il.ibm.com> <alpine.LNX.1.10.0807311315200.4328@zhemvz.fhfr.qr>

Richard Guenther wrote:

This implements some missing vec_interleave and vec_extract_{odd,even}
expanders for x86_64 to make vectorizing complex multiplication
possible.
The problem starts with the testsuite, which can be tuned either to have vect_extract_even_odd or not, but cannot, as x86_64 requires, turn on support for vect_extract_even_odd only for SImode or larger element sizes. Any suggestions on how to deal with this?
Wouldn't defining a keyword (something like vect_extract_large_types) for x86_64 (and all the targets that support all the types) help? The tests will have to be changed as well, of course, for example
Uros suggested something similar.  So I added vect_extract_even_odd_wide
and vect_strided_wide that only cover vector elements of 4 byte size
or larger.
The following is what I am now re-testing on x86_64/i686.

Ok for trunk?
Thanks,
Richard.
2008-07-31 Richard Guenther <rguenther@suse.de>

	PR target/35252
	* config/i386/sse.md (SSEMODE4S, SSEMODE2D): New mode iterators.
	(ssedoublesizemode): New mode attribute.
	(sse_shufps): Call gen_sse_shufps_v4sf.
	(sse_shufps_1): Macroize.
	(sse2_shufpd): Call gen_sse2_shufpd_v2df.
	(sse2_shufpd_1): Macroize.
	(vec_extract_odd, vec_extract_even): New expanders.
	(vec_interleave_highv4sf, vec_interleave_lowv4sf,
	vec_interleave_highv2df, vec_interleave_lowv2df): Likewise.
	* i386.c (ix86_expand_vector_init_one_nonzero): Call
	gen_sse_shufps_v4sf instead of gen_sse_shufps_1.
	(ix86_expand_vector_set): Likewise.
	(ix86_expand_reduc_v4sf): Likewise.

Please see my comments on the x86 part inline in the .md patterns:

	* lib/target-supports.exp (vect_extract_even_odd_wide): Add.
	(vect_strided_wide): Likewise.
	* gcc.dg/vect/fast-math-pr35982.c: Enable for
	vect_extract_even_odd_wide.
	* gcc.dg/vect/fast-math-vect-complex-3.c: Likewise.
	* gcc.dg/vect/vect-1.c: Likewise.
	* gcc.dg/vect/vect-107.c: Likewise.
	* gcc.dg/vect/vect-98.c: Likewise.
	* gcc.dg/vect/vect-strided-float.c: Likewise.
	* gcc.dg/vect/slp-11.c: Enable for vect_strided_wide.
	* gcc.dg/vect/slp-12a.c: Likewise.
	* gcc.dg/vect/slp-12b.c: Likewise.
	* gcc.dg/vect/slp-19.c: Likewise.
	* gcc.dg/vect/slp-23.c: Likewise.
	* gcc.dg/vect/slp-5.c: Likewise.
Index: gcc/config/i386/sse.md
===================================================================
*** gcc/config/i386/sse.md.orig	2008-07-30 17:08:05.000000000 +0200
--- gcc/config/i386/sse.md	2008-07-31 12:25:36.000000000 +0200
***************
*** 36,41 ****
--- 36,45 ----
  (define_mode_iterator SSEMODEF4 [SF DF V4SF V2DF])
  (define_mode_iterator SSEMODEF2P [V4SF V2DF])

+ ;; Int-float size matches
+ (define_mode_iterator SSEMODE4S [V4SF V4SI])
+ (define_mode_iterator SSEMODE2D [V2DF V2DI])
+
  ;; Mapping from float mode to required SSE level
  (define_mode_attr sse [(SF "sse") (DF "sse2") (V4SF "sse") (V2DF "sse2")])

***************
*** 57,62 ****
--- 61,70 ----
                             (V16QI "QI") (V8HI "HI")
                             (V4SI "SI") (V2DI "DI")])

+ ;; Mapping of vector modes to a vector mode of double size
+ (define_mode_attr ssedoublesizemode [(V2DF "V4DF") (V2DI "V4DI")
+                                      (V4SF "V8SF") (V4SI "V8SI")])
+
  ;; Number of scalar elements in each vector type
  (define_mode_attr ssescalarnum [(V4SF "4") (V2DF "2")
                                  (V16QI "16") (V8HI "8")
***************
*** 2129,2135 ****
    "TARGET_SSE"
  {
    int mask = INTVAL (operands[3]);
!   emit_insn (gen_sse_shufps_1 (operands[0], operands[1], operands[2],
                                 GEN_INT ((mask >> 0) & 3),
                                 GEN_INT ((mask >> 2) & 3),
                                 GEN_INT (((mask >> 4) & 3) + 4),
--- 2137,2143 ----
    "TARGET_SSE"
  {
    int mask = INTVAL (operands[3]);
!   emit_insn (gen_sse_shufps_v4sf (operands[0], operands[1], operands[2],
                                 GEN_INT ((mask >> 0) & 3),
                                 GEN_INT ((mask >> 2) & 3),
                                 GEN_INT (((mask >> 4) & 3) + 4),
***************
*** 2137,2148 ****
    DONE;
  })

! (define_insn "sse_shufps_1"
!   [(set (match_operand:V4SF 0 "register_operand" "=x")
!         (vec_select:V4SF
!           (vec_concat:V8SF
!             (match_operand:V4SF 1 "register_operand" "0")
!             (match_operand:V4SF 2 "nonimmediate_operand" "xm"))
            (parallel [(match_operand 3 "const_0_to_3_operand" "")
                       (match_operand 4 "const_0_to_3_operand" "")
                       (match_operand 5 "const_4_to_7_operand" "")
--- 2145,2156 ----
    DONE;
  })

! (define_insn "sse_shufps_<mode>"
!   [(set (match_operand:SSEMODE4S 0 "register_operand" "=x")
!         (vec_select:SSEMODE4S
!           (vec_concat:<ssedoublesizemode>
!             (match_operand:SSEMODE4S 1 "register_operand" "0")
!             (match_operand:SSEMODE4S 2 "nonimmediate_operand" "xm"))
            (parallel [(match_operand 3 "const_0_to_3_operand" "")
                       (match_operand 4 "const_0_to_3_operand" "")
                       (match_operand 5 "const_4_to_7_operand" "")
***************
*** 2540,2557 ****
    "TARGET_SSE2"
  {
    int mask = INTVAL (operands[3]);
!   emit_insn (gen_sse2_shufpd_1 (operands[0], operands[1], operands[2],
                                  GEN_INT (mask & 1),
                                  GEN_INT (mask & 2 ? 3 : 2)));
    DONE;
  })

! (define_insn "sse2_shufpd_1"
!   [(set (match_operand:V2DF 0 "register_operand" "=x")
!         (vec_select:V2DF
!           (vec_concat:V4DF
!             (match_operand:V2DF 1 "register_operand" "0")
!             (match_operand:V2DF 2 "nonimmediate_operand" "xm"))
            (parallel [(match_operand 3 "const_0_to_1_operand" "")
                       (match_operand 4 "const_2_to_3_operand" "")])))]
    "TARGET_SSE2"
--- 2548,2611 ----
    "TARGET_SSE2"
  {
    int mask = INTVAL (operands[3]);
!   emit_insn (gen_sse2_shufpd_v2df (operands[0], operands[1], operands[2],
                                  GEN_INT (mask & 1),
                                  GEN_INT (mask & 2 ? 3 : 2)));
    DONE;
  })

! (define_expand "vec_extract_even<mode>"
!   [(match_operand:SSEMODE4S 0 "register_operand" "")
!    (match_operand:SSEMODE4S 1 "register_operand" "")
!    (match_operand:SSEMODE4S 2 "nonimmediate_operand" "")]
!   "TARGET_SSE"
! {
!   emit_insn (gen_sse_shufps_<mode> (operands[0], operands[1], operands[2],
!                                     GEN_INT (0), GEN_INT (2),
!                                     GEN_INT (4), GEN_INT (6)));
!   DONE;
! })

Please write the expander above without preparation statements, as a pattern similar to sse_shufps_<mode> itself. Such patterns are much more informative, and since you simply copy operands and const ints around without any special processing, IMO it doesn't warrant calling emit_insn directly from a preparation statement.
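
To make the selector constants concrete, here is a rough C model (not part of the patch; model_vec_select and shufps_mask are hypothetical helper names) of what a vec_select over a vec_concat computes, and of how four indices map onto a shufps immediate. The decoding of the first three indices follows the sse_shufps expander quoted in the hunk above; the fourth, ((mask >> 6) & 3) + 4, is cut off by the hunk context and is assumed here from the shufps immediate encoding.

```c
#include <assert.h>

/* Hypothetical model of (vec_select (vec_concat a b) (parallel sel)):
   element i of the result is element sel[i] of the 2*n-element
   concatenation of a and b.  */
void
model_vec_select (const float *a, const float *b, int n,
                  const int *sel, float *out)
{
  for (int i = 0; i < n; i++)
    {
      assert (sel[i] >= 0 && sel[i] < 2 * n);
      out[i] = sel[i] < n ? a[sel[i]] : b[sel[i] - n];
    }
}

/* Hypothetical inverse of the expander's mask decoding: pack four
   selector indices into an 8-bit shufps immediate.  The first two
   indices must address operand 1 (0..3), the last two operand 2 (4..7).  */
int
shufps_mask (const int sel[4])
{
  assert (sel[0] >= 0 && sel[0] <= 3 && sel[1] >= 0 && sel[1] <= 3);
  assert (sel[2] >= 4 && sel[2] <= 7 && sel[3] >= 4 && sel[3] <= 7);
  return sel[0] | (sel[1] << 2) | ((sel[2] - 4) << 4) | ((sel[3] - 4) << 6);
}
```

Under this model the (0, 2, 4, 6) selector picks the even elements of both inputs and corresponds to immediate 0x88, while (1, 3, 5, 7) picks the odd elements and corresponds to 0xdd. Since the constants are fixed, they can sit directly in the expander's RTL template.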

!
! (define_expand "vec_extract_odd<mode>"
!   [(match_operand:SSEMODE4S 0 "register_operand" "")
!    (match_operand:SSEMODE4S 1 "register_operand" "")
!    (match_operand:SSEMODE4S 2 "nonimmediate_operand" "")]
!   "TARGET_SSE"
! {
!   emit_insn (gen_sse_shufps_<mode> (operands[0], operands[1], operands[2],
!                                     GEN_INT (1), GEN_INT (3),
!                                     GEN_INT (5), GEN_INT (7)));
!   DONE;
! })
!
! (define_expand "vec_extract_even<mode>"
!   [(match_operand:SSEMODE2D 0 "register_operand" "")
!    (match_operand:SSEMODE2D 1 "register_operand" "")
!    (match_operand:SSEMODE2D 2 "nonimmediate_operand" "")]
!   "TARGET_SSE2"
! {
!   emit_insn (gen_sse2_shufpd_<mode> (operands[0], operands[1], operands[2],
!                                      GEN_INT (0), GEN_INT (2)));
!   DONE;
! })
!
! (define_expand "vec_extract_odd<mode>"
!   [(match_operand:SSEMODE2D 0 "register_operand" "")
!    (match_operand:SSEMODE2D 1 "register_operand" "")
!    (match_operand:SSEMODE2D 2 "nonimmediate_operand" "")]
!   "TARGET_SSE2"
! {
!   emit_insn (gen_sse2_shufpd_<mode> (operands[0], operands[1], operands[2],
!                                      GEN_INT (1), GEN_INT (3)));
!   DONE;
! })
!
! (define_insn "sse2_shufpd_<mode>"
!   [(set (match_operand:SSEMODE2D 0 "register_operand" "=x")
!         (vec_select:SSEMODE2D
!           (vec_concat:<ssedoublesizemode>
!             (match_operand:SSEMODE2D 1 "register_operand" "0")
!             (match_operand:SSEMODE2D 2 "nonimmediate_operand" "xm"))
            (parallel [(match_operand 3 "const_0_to_1_operand" "")
                       (match_operand 4 "const_2_to_3_operand" "")])))]
    "TARGET_SSE2"
***************
*** 4195,4200 ****
--- 4249,4310 ----
    DONE;
  })

+ (define_expand "vec_interleave_highv4sf"
+   [(set (match_operand:V4SF 0 "register_operand" "")
+         (vec_select:V4SF
+           (vec_concat:V8SF
+             (match_operand:V4SF 1 "register_operand" "")
+             (match_operand:V4SF 2 "nonimmediate_operand" ""))
+           (parallel [(const_int 1)
+                      (const_int 3)])))]
+   "TARGET_SSE"
+ {
+   emit_insn (gen_sse_unpckhps (operands[0], operands[1], operands[2]));
+   DONE;
+ })

The pattern above has the wrong selector. Also, please don't use preparation statements to generate the insn.

+
+ (define_expand "vec_interleave_lowv4sf"
+   [(set (match_operand:V4SF 0 "register_operand" "")
+         (vec_select:V4SF
+           (vec_concat:V8SF
+             (match_operand:V4SF 1 "register_operand" "")
+             (match_operand:V4SF 2 "nonimmediate_operand" ""))
+           (parallel [(const_int 1)
+                      (const_int 3)])))]
+   "TARGET_SSE"
+ {
+   emit_insn (gen_sse_unpcklps (operands[0], operands[1], operands[2]));
+   DONE;
+ })

Please note the wrong selector in the above pattern too. Just copy the sse_unpcklps pattern here...
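
For reference, a small C sketch (interleave_selector is a hypothetical name, not a GCC function) of the indices an interleave selects from the concatenation of two n-element vectors, assuming the usual unpck semantics of alternating elements from the two operands. Under this model a V4SF interleave needs four indices, so a two-entry (1, 3) parallel cannot describe it.

```c
#include <assert.h>

/* Fill sel[0..n-1] with the vec_select indices for interleaving two
   n-element vectors.  Indices 0..n-1 address operand 1 and n..2n-1
   address operand 2.  A nonzero HIGH interleaves the upper halves,
   zero interleaves the lower halves.  */
void
interleave_selector (int n, int high, int *sel)
{
  assert (n > 0 && n % 2 == 0);
  int base = high ? n / 2 : 0;
  for (int i = 0; i < n / 2; i++)
    {
      sel[2 * i] = base + i;          /* element taken from operand 1 */
      sel[2 * i + 1] = n + base + i;  /* element taken from operand 2 */
    }
}
```

For n = 4 this gives (2, 6, 3, 7) for the high interleave and (0, 4, 1, 5) for the low one, which is what I would expect the existing unpckhps and unpcklps patterns to use. For n = 2 it gives (1, 3) and (0, 2), so those selectors fit only the two-element V2DF case.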

+
+ (define_expand "vec_interleave_highv2df"
+   [(set (match_operand:V2DF 0 "register_operand" "")
+         (vec_select:V2DF
+           (vec_concat:V4DF
+             (match_operand:V2DF 1 "register_operand" "")
+             (match_operand:V2DF 2 "nonimmediate_operand" ""))
+           (parallel [(const_int 1)
+                      (const_int 3)])))]
+   "TARGET_SSE2"
+ {
+   emit_insn (gen_sse2_unpckhpd (operands[0], operands[1], operands[2]));
+   DONE;
+ })
+
+ (define_expand "vec_interleave_lowv2df"
+   [(set (match_operand:V2DF 0 "register_operand" "")
+         (vec_select:V2DF
+           (vec_concat:V4DF
+             (match_operand:V2DF 1 "register_operand" "")
+             (match_operand:V2DF 2 "nonimmediate_operand" ""))
+           (parallel [(const_int 0)
+                      (const_int 2)])))]
+   "TARGET_SSE2"
+ {
+   emit_insn (gen_sse2_unpcklpd (operands[0], operands[1], operands[2]));
+   DONE;
+ })

Please also remove preparation statements from patterns above.
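
The same applies to the V2DF/V2DI extract expanders earlier in the patch, which only pass the constants (0, 2) and (1, 3) to sse2_shufpd_<mode>. As a rough illustration, the following C sketch mirrors the mask arithmetic of the sse2_shufpd expander quoted in the patch (mask & 1 and mask & 2 ? 3 : 2); shufpd_decode and shufpd_encode are hypothetical names used only here.

```c
#include <assert.h>

/* Decode a shufpd immediate into the two vec_select indices, using the
   same arithmetic as the sse2_shufpd expander.  */
void
shufpd_decode (int mask, int sel[2])
{
  sel[0] = mask & 1;          /* 0 or 1: element of operand 1 */
  sel[1] = mask & 2 ? 3 : 2;  /* 2 or 3: element of operand 2 */
}

/* Inverse mapping: the immediate that yields a given selector pair.  */
int
shufpd_encode (const int sel[2])
{
  assert (sel[0] >= 0 && sel[0] <= 1);
  assert (sel[1] >= 2 && sel[1] <= 3);
  return sel[0] | ((sel[1] - 2) << 1);
}
```

In this model the even selector (0, 2) corresponds to immediate 0 and the odd selector (1, 3) to immediate 3. Both are compile-time constants, so nothing has to be computed in a preparation statement.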

IMO, the testsuite part is OK, but I can't approve that part.

Thanks,
Uros.

References:
- Re: [PATCH][RFC] Fix complex multiplication vectorization on x86_64
  - From: Ira Rosen
- Re: [PATCH][RFC] Fix complex multiplication vectorization on x86_64
  - From: Richard Guenther
