This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: SSE5 patches part 2


On Fri, Sep 07, 2007 at 08:43:12AM +0200, Uros Bizjak wrote:
> Hello!
> 
> +    (UNSPEC_SSE5_TRUEFALSE	153)
> +    (UNSPEC_PPERM		154)
> +    (UNSPEC_PERMPS		155)
> +    (UNSPEC_PERMPD		156)
> +    (UNSPEC_PMACSSWW		157)
> +    (UNSPEC_PMACSWW		158)
> +    (UNSPEC_PMACSSWD		159)
> +    (UNSPEC_PMACSWD		160)
> +    (UNSPEC_PMACSSDD		161)
> +    (UNSPEC_PMACSDD		162)
> +    (UNSPEC_PMACSSDQL		163)
> 
> Please do not use unspecs, unless you really can't describe
> instruction with existing RTL codes. The problem with uspecs is, that
> they hide all of the insn details, so various optimization passes
> (combine!) can't do nothing with it.

I plan to do so in the future.  Note however, that the pperm, permps, permpd
instructions can't really be defined by RTL.  The special case of using pperm
to sign/zero extend or pack can be expressable, but you need to load the funny
constant that does the proper permutation.

> There are plenty of examples in sse.md:
> 
> 1. horizontal add: "sse3_haddv4sf3"
> 2. permutations: "sse2_pshufd"
> 3. conversions: "sse_cvtsi2ss", "sse_cvtss2si"
> (you can also define scalar conversions as float and fix patterns in i386.md)
> 4. shifts: "ashl<mode>3", "vec_shl_<mode>"
> 
> Also, there is no need to have UNSPECS for diferent modes. One unspec
> is enough to describe the instruction in all modes. So, if there is no
> other way to describe insn with standart RTL expressions, these two
> should be combined:
> 
> +    (UNSPEC_PERMPS		155)
> +    (UNSPEC_PERMPD		156)
> 
> into
> 
> UNSPEC_PERM, and relevant pattern will have SSEMODEF inputs,

Ok.
 
> as well as all of these (example):
> 
> +    (UNSPEC_PROTB		184)
> +    (UNSPEC_PROTW		185)
> +    (UNSPEC_PROTD		186)
> +    (UNSPEC_PROTQ		187)
> 
> into UNSPEC_PROT, where their input operands would be SSEMODEI. Having
> all input operands V2DI mode is not acceptable.

These functions are only for being called from the intrinsics at the current
time.  The intrinsics require that all integer operations use __m128i (aka
V2DI).  I didn't write the spec, but that's how they are defined.  Yes, I think
each of the intrinsics should have had the proper type, but I would need a time
machine to change that.

One thing that I need to investigate for the 4.4 time frame is the shift
instructions bring up an interesting hole in RTL.  You can logically have two
types of vector shifts:

	<vtype> = <vtype> << <scalar>
	      (or)
	<vtype> = <vtype> << <vtype>

where the shift amount is either a scalar quantity (i.e. shift each field by
the same amount), or a vector quanity (i.e. shift each field by the amount
specified by the corresponding field).  GCC can deal with either type, but I
don't think you can express a machine that has both types in RTL.  I wasn't the
person who discovered this, so he may have been mistaken, or possibily GCC has
added the support since then.  SSE5 adds the shift/rotate by a vector type of
instruction.

 
> + (define_expand "sse5_protd_imm"
> +   [(set (match_operand:V2DI 0 "register_operand" "")
> + 	(rotate:V2DI (match_operand:V2DI 1 "nonimmediate_operand" "")
> + 		     (match_operand:SI 2 "const_0_to_31_operand" "n")))]
> +   "TARGET_SSE5"
> + {
> +   rtx op0 = gen_rtx_SUBREG (V4SImode, operands[0], 0);
> +   rtx op1 = gen_rtx_SUBREG (V4SImode, operands[1], 0);
> +
> +   emit_insn (gen_rotlv4si3 (op0, op1, operands[2]));
> +   DONE;
> + })
> 
> Why new expander that doesn't expand to new instructions? This should
> be implemented in SSE5 header

I will investiage this.

> + (define_insn "sse5_pmacsdqh"
> +   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
> + 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
> + 		      (match_operand:V2DI 2 "nonimmediate_operand" ",x,x")
> + 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMACSDQH))]
> 
> Ehm? Op2 constraints should be fixed...

Whoops.
 
> As a general rule, please use macros wherever possible. I have a plan
> to reorganize SSE.md  as soon as all big changes (like SSE5 ;) get in.
> 
> --- gcc/config/i386/cpuid.h	2007-09-06 13:29:00.166796000 -0400
> ***************
> *** 51,56 ****
> --- 51,57 ----
>   /* %ecx */
>   #define bit_LAHF_LM	(1 << 0)
>   #define bit_SSE4a	(1 << 6)
> + #define bit_SSE5	(1 << 11)
> 
> For now, please leave SSE5 from driver-i386.c. Instead of passing
> -msse5 to compile flags from the driver, driver should pass correct
> -march= that implements SSE5. It is better to have "-march=whatever"
> instead of '-march=amdfam10 -msse5"

Ok, I can remove it.  As you noticed we haven't put in -march=amdfam<xx>
support in yet and provided the latencies.  That will come in the 4.4 time
frame.

> Regarding the tests, there are three important test in the testsuite:
> gcc.target/i386/sse-[12,13,14].c. Please update these tests to include
> bmmintrin.h instead of ammintrin.h (adding -msse5 instead of -msse4a
> to compile flags). These two tests will check _all_ new code for
> compilation problems in -O0 and  -O2

Ok.

-- 
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
michael.meissner@amd.com



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]