This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: SSE5 patches part 2
- From: "Michael Meissner" <michael dot meissner at amd dot com>
- To: "Uros Bizjak" <ubizjak at gmail dot com>
- Cc: "GCC Patches" <gcc-patches at gcc dot gnu dot org>, "Michael Meissner" <michael dot meissner at amd dot com>, christophe dot harle at amd dot com, dwarak dot rajagopal at amd dot com
- Date: Fri, 7 Sep 2007 14:12:29 -0400
- Subject: Re: SSE5 patches part 2
- References: <5787cf470709062343g5a898cd1jc4a7c80faacb1626@mail.gmail.com>
On Fri, Sep 07, 2007 at 08:43:12AM +0200, Uros Bizjak wrote:
> Hello!
>
> + (UNSPEC_SSE5_TRUEFALSE 153)
> + (UNSPEC_PPERM 154)
> + (UNSPEC_PERMPS 155)
> + (UNSPEC_PERMPD 156)
> + (UNSPEC_PMACSSWW 157)
> + (UNSPEC_PMACSWW 158)
> + (UNSPEC_PMACSSWD 159)
> + (UNSPEC_PMACSWD 160)
> + (UNSPEC_PMACSSDD 161)
> + (UNSPEC_PMACSDD 162)
> + (UNSPEC_PMACSSDQL 163)
>
> Please do not use unspecs, unless you really can't describe
> instruction with existing RTL codes. The problem with uspecs is, that
> they hide all of the insn details, so various optimization passes
> (combine!) can't do nothing with it.
I plan to do so in the future. Note however, that the pperm, permps, permpd
instructions can't really be defined by RTL. The special case of using pperm
to sign/zero extend or pack can be expressable, but you need to load the funny
constant that does the proper permutation.
> There are plenty of examples in sse.md:
>
> 1. horizontal add: "sse3_haddv4sf3"
> 2. permutations: "sse2_pshufd"
> 3. conversions: "sse_cvtsi2ss", "sse_cvtss2si"
> (you can also define scalar conversions as float and fix patterns in i386.md)
> 4. shifts: "ashl<mode>3", "vec_shl_<mode>"
>
> Also, there is no need to have UNSPECS for diferent modes. One unspec
> is enough to describe the instruction in all modes. So, if there is no
> other way to describe insn with standart RTL expressions, these two
> should be combined:
>
> + (UNSPEC_PERMPS 155)
> + (UNSPEC_PERMPD 156)
>
> into
>
> UNSPEC_PERM, and relevant pattern will have SSEMODEF inputs,
Ok.
> as well as all of these (example):
>
> + (UNSPEC_PROTB 184)
> + (UNSPEC_PROTW 185)
> + (UNSPEC_PROTD 186)
> + (UNSPEC_PROTQ 187)
>
> into UNSPEC_PROT, where their input operands would be SSEMODEI. Having
> all input operands V2DI mode is not acceptable.
These functions are only for being called from the intrinsics at the current
time. The intrinsics require that all integer operations use __m128i (aka
V2DI). I didn't write the spec, but that's how they are defined. Yes, I think
each of the intrinsics should have had the proper type, but I would need a time
machine to change that.
One thing that I need to investigate for the 4.4 time frame is the shift
instructions bring up an interesting hole in RTL. You can logically have two
types of vector shifts:
<vtype> = <vtype> << <scalar>
(or)
<vtype> = <vtype> << <vtype>
where the shift amount is either a scalar quantity (i.e. shift each field by
the same amount), or a vector quanity (i.e. shift each field by the amount
specified by the corresponding field). GCC can deal with either type, but I
don't think you can express a machine that has both types in RTL. I wasn't the
person who discovered this, so he may have been mistaken, or possibily GCC has
added the support since then. SSE5 adds the shift/rotate by a vector type of
instruction.
> + (define_expand "sse5_protd_imm"
> + [(set (match_operand:V2DI 0 "register_operand" "")
> + (rotate:V2DI (match_operand:V2DI 1 "nonimmediate_operand" "")
> + (match_operand:SI 2 "const_0_to_31_operand" "n")))]
> + "TARGET_SSE5"
> + {
> + rtx op0 = gen_rtx_SUBREG (V4SImode, operands[0], 0);
> + rtx op1 = gen_rtx_SUBREG (V4SImode, operands[1], 0);
> +
> + emit_insn (gen_rotlv4si3 (op0, op1, operands[2]));
> + DONE;
> + })
>
> Why new expander that doesn't expand to new instructions? This should
> be implemented in SSE5 header
I will investiage this.
> + (define_insn "sse5_pmacsdqh"
> + [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
> + (unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
> + (match_operand:V2DI 2 "nonimmediate_operand" ",x,x")
> + (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMACSDQH))]
>
> Ehm? Op2 constraints should be fixed...
Whoops.
> As a general rule, please use macros wherever possible. I have a plan
> to reorganize SSE.md as soon as all big changes (like SSE5 ;) get in.
>
> --- gcc/config/i386/cpuid.h 2007-09-06 13:29:00.166796000 -0400
> ***************
> *** 51,56 ****
> --- 51,57 ----
> /* %ecx */
> #define bit_LAHF_LM (1 << 0)
> #define bit_SSE4a (1 << 6)
> + #define bit_SSE5 (1 << 11)
>
> For now, please leave SSE5 from driver-i386.c. Instead of passing
> -msse5 to compile flags from the driver, driver should pass correct
> -march= that implements SSE5. It is better to have "-march=whatever"
> instead of '-march=amdfam10 -msse5"
Ok, I can remove it. As you noticed we haven't put in -march=amdfam<xx>
support in yet and provided the latencies. That will come in the 4.4 time
frame.
> Regarding the tests, there are three important test in the testsuite:
> gcc.target/i386/sse-[12,13,14].c. Please update these tests to include
> bmmintrin.h instead of ammintrin.h (adding -msse5 instead of -msse4a
> to compile flags). These two tests will check _all_ new code for
> compilation problems in -O0 and -O2
Ok.
--
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
michael.meissner@amd.com