This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: PATCH: Add SSE4.1 support
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: "H. J. Lu" <hjl at lucon dot org>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Wed, 18 Apr 2007 21:26:44 +0200
- Subject: Re: PATCH: Add SSE4.1 support
- References: <20070418160052.GA10054@lucon.org>
>
> 1. Some SSE4.1 instructions will take the fixed xmm0 as the 3rd arg.
> Register allocator has to know not to put xmm0 in the 1st/2nd args for
> those instructions. I added 2 register classes, XMM0REG for xmm0 and
> XMMN_REGS for xmm1-15. But I didn't change regclass_map to make xmm0
> as XMM0REG and xmm1-15 as XMMN_REGS. Will it be a problem?
Well, since xmm0 is completely symmetric to all the other xmm registers
from reload's point of view, I would expect it to make no difference, but
since reg_class is defined as the smallest class, I would prefer to change it.
Or does something break?
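
To illustrate the "smallest class" point, here is a minimal sketch in plain C. It is not GCC code: the enum, the bitmask table and smallest_class() are hypothetical names that only model register classes as sets over xmm0..xmm15. It shows that once XMM0REG and XMMN_REGS exist, the smallest class containing xmm0 is XMM0REG, and for xmm1-15 it is XMMN_REGS.

```c
#include <assert.h>

/* Hypothetical, simplified model: each class is a bitmask over xmm0..xmm15.
   These names mirror the classes discussed in the mail but are NOT the
   actual GCC definitions. */
enum reg_class_model { XMM0REG_M, XMMN_REGS_M, SSE_REGS_M, N_CLASSES_M };

static const unsigned class_contents[N_CLASSES_M] = {
    0x0001u, /* XMM0REG:   xmm0 only */
    0xFFFEu, /* XMMN_REGS: xmm1..xmm15 */
    0xFFFFu  /* SSE_REGS:  xmm0..xmm15 */
};

/* Count the registers in a class mask. */
static int class_size(unsigned mask)
{
    int n = 0;
    while (mask) {
        n += (int)(mask & 1u);
        mask >>= 1;
    }
    return n;
}

/* Return the smallest modelled class that contains register regno. */
static enum reg_class_model smallest_class(int regno)
{
    enum reg_class_model best = SSE_REGS_M;
    assert(regno >= 0 && regno < 16);
    for (int c = 0; c < N_CLASSES_M; c++) {
        int contains = (int)((class_contents[c] >> regno) & 1u);
        if (contains
            && class_size(class_contents[c]) < class_size(class_contents[best]))
            best = (enum reg_class_model)c;
    }
    return best;
}
```

In this model, leaving the map unchanged would correspond to returning SSE_REGS_M for every register, which is still a class containing the register, just not the minimal one.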
>
> 2. SSE4.1 has new instructions, pextrd/pextrq, to extract an SI/DI value
> from an XMM register and put it in an SI/DI register/memory. An SSE4.1
> intrinsic may generate:
>
> (insn:HI 9 8 22 2 (set (reg:SI 0 ax [62])
>         (vec_select:SI (reg:V4SI 21 xmm0 [ i ])
>             (parallel [
>                     (const_int 0 [0x0])
>                 ]))) 1160 {*sse4_1_pextrd} (nil)
>     (nil))
>
> and the optimizer will turn it into
>
> (insn 28 8 22 2 (set (reg:SI 0 ax [62])
>         (reg:SI 21 xmm0)) 40 {*movsi_1} (nil)
>     (nil))
We probably don't need named sse4_1_pextr patterns at all. If they are
just a new way to encode an XMM register -> integer register move, I would
simply keep them as such.
What is the difference from the regular inter-unit move? Is the new
instruction faster or something?
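
The reason the optimizer can turn the vec_select into a plain SImode move is that selecting element 0 of a V4SI value reads the same bits as reading the low 32 bits of the register. A small sketch in portable C follows; xmm_model, xmm_from_lanes(), extract_lane() and move_low_si() are hypothetical names that model the semantics only, not GCC internals or real intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit XMM register as 16 bytes,
   lowest-addressed byte first (x86 is little-endian). */
typedef struct { uint8_t byte[16]; } xmm_model;

/* Build a register image from four 32-bit lanes, lane 0 in the low bits. */
static xmm_model xmm_from_lanes(uint32_t l0, uint32_t l1,
                                uint32_t l2, uint32_t l3)
{
    const uint32_t lanes[4] = { l0, l1, l2, l3 };
    xmm_model r;
    for (int i = 0; i < 4; i++)
        for (int b = 0; b < 4; b++)
            r.byte[4 * i + b] = (uint8_t)(lanes[i] >> (8 * b));
    return r;
}

/* Models (vec_select:SI (reg:V4SI ...) (parallel [(const_int idx)])),
   which is what pextrd computes for a given lane index. */
static uint32_t extract_lane(const xmm_model *r, int idx)
{
    assert(idx >= 0 && idx < 4);
    const uint8_t *p = r->byte + 4 * idx;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Models (reg:SI xmm): an SImode read of the low 32 bits of the register. */
static uint32_t move_low_si(const xmm_model *r)
{
    return (uint32_t)r->byte[0] | ((uint32_t)r->byte[1] << 8)
         | ((uint32_t)r->byte[2] << 16) | ((uint32_t)r->byte[3] << 24);
}
```

Only index 0 collapses to a move in this model; extracting any other lane still needs a real extract (or shuffle) operation.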
>
> But *movsi_1 won't allow move from xmm0 to ax if inter-unit moves are
> disabled. *movdi_1_rex64 has the same issue. I added pextrd/pextrq
> support to *movsi_1/*movdi_1_rex64. They are enabled when inter-unit
> moves are disabled.
This seems sane.
>
> 3. I introduced new constraints:
>
> a. Y0: For XMM0REG.
> b. Yn: For XMMN_REGS.
> c. Y4: For any SSE register, when SSE4.1 is enabled and
> inter-unit moves are disabled.
>
> to deal with those issues.
>
> 4. Also I had to rewrite the umaxv8hi3 pattern in order to generate the
> SSE4.1 instruction, pmaxuw. As a result, it is no longer available
> for SSE2.
>
> I will submit a patch for the SSE4.1 intrinsic testsuite later.
>
> SSE4.2 support will come later.
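
On point 4 above: an unsigned 16-bit maximum can be computed without a dedicated max instruction, using an unsigned saturating subtract followed by an add (psubusw and paddw are the SSE2 operations with those per-lane semantics). The sketch below models this in portable C. The function names are hypothetical, and it is not taken from the patch; it only shows that the two-step sequence and a direct max agree lane by lane.

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane unsigned saturating subtract: clamps at 0 instead of wrapping. */
static uint16_t sat_sub_u16(uint16_t a, uint16_t b)
{
    return a > b ? (uint16_t)(a - b) : 0;
}

/* Unsigned max from saturating subtract plus add:
   if a > b, (a - b) + b == a; otherwise 0 + b == b. */
static uint16_t umax_via_satsub(uint16_t a, uint16_t b)
{
    return (uint16_t)(sat_sub_u16(a, b) + b);
}

/* Direct per-lane unsigned max, the result a single max instruction gives. */
static uint16_t umax_direct(uint16_t a, uint16_t b)
{
    return a > b ? a : b;
}

/* Apply the two-step sequence to all eight lanes of a V8HI-shaped array. */
static void umaxv8hi_model(const uint16_t a[8], const uint16_t b[8],
                           uint16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = umax_via_satsub(a[i], b[i]);
}
```

The add can never overflow here, because the saturating subtract yields a nonzero value only when a > b, and then the sum is exactly a.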
I will check the patch shortly,
thanks
Honza