This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: PATCH: Add SSE4.2 support - pcmpstr part
- From: "Uros Bizjak" <ubizjak at gmail dot com>
- To: "H. J. Lu" <hjl at lucon dot org>
- Cc: "GCC Patches" <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 31 May 2007 11:49:46 +0200
- Subject: Re: PATCH: Add SSE4.2 support - pcmpstr part
On 5/30/07, H. J. Lu <hjl@lucon.org> wrote:
> Here is the updated patch. I added OPTION_MASK_ISA_XXX_UNSET so
> that we only need to change one macro when we add a new ISA. Tested
> on Linux/Intel64.
I think we need 3 patterns for pcmp* insns:
a) This one when index is required:
(define_insn "sse4_2_pcmpestri"
  [(set (reg ...???... )
        (unspec:SI
          [(match_operand:V16QI 0 "register_operand" "x")
           (reg:SI 0)
           (match_operand:V16QI 1 "nonimmediate_operand" "xm")
           (reg:SI 1)
           (match_operand:SI 2 "const_0_to_255_operand" "n")]
          UNSPEC_PCMPESTR))
   (clobber (reg:CC FLAGS_REG))]
b) Similar when we need mask:
(define_insn "sse4_2_pcmpestrm"
  [(set (reg ...???... )
        (unspec:V16QI
          [(match_operand:V16QI 0 "register_operand" "x")
           (reg:SI 0)
           (match_operand:V16QI 1 "nonimmediate_operand" "xm")
           (reg:SI 1)
           (match_operand:SI 2 "const_0_to_255_operand" "n")]
          UNSPEC_PCMPESTR))
   (clobber (reg:CC FLAGS_REG))]
c) A CC-setting insn, CConly (only the "mask" one is shown here...):
  [(set (reg:CC FLAGS_REG)
        (unspec:CC
          [(match_operand:V16QI 0 "register_operand" "x")
           (reg:SI 0)
           (match_operand:V16QI 1 "nonimmediate_operand" "xm")
           (reg:SI 1)
           (match_operand:SI 2 "const_0_to_255_operand" "n")]
          UNSPEC_PCMPESTR))
   (clobber (match_scratch ...???... ))]
d) IMO we also need a CConly + reg-setting insn, so that combine can
merge two successive instructions into one (thus automatically doing
the part you process "manually"):
  [(set (reg ...???... )
        (unspec:V16QI
          [(match_operand:V16QI 0 "register_operand" "x")
           (reg:SI 0)
           (match_operand:V16QI 1 "nonimmediate_operand" "xm")
           (reg:SI 1)
           (match_operand:SI 2 "const_0_to_255_operand" "n")]
          UNSPEC_PCMPESTR))
   (set (reg:CC FLAGS_REG)
        (unspec:CC
          [(match_dup 0)
           (reg:SI 0)
           (match_dup 1)
           (reg:SI 1)
           (match_dup 2)]
          UNSPEC_PCMPESTR))]
and
(define_insn "sse4_2_pcmpestri"
  [(set (reg ...???... )
        (unspec:SI
          [(match_operand:V16QI 0 "register_operand" "x")
           (reg:SI 0)
           (match_operand:V16QI 1 "nonimmediate_operand" "xm")
           (reg:SI 1)
           (match_operand:SI 2 "const_0_to_255_operand" "n")]
          UNSPEC_PCMPESTR))
   (set (reg:CC FLAGS_REG)
        (unspec:CC
          [(match_dup 0)
           (reg:SI 0)
           (match_dup 1)
           (reg:SI 1)
           (match_dup 2)]
          UNSPEC_PCMPESTR))]
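To make the motivation for d) concrete, here is a minimal sketch of user
code that wants both the index and the carry flag from the same
comparison. It assumes the <nmmintrin.h> intrinsics _mm_cmpestri and
_mm_cmpestrc as shipped by current GCC (they are not named in this
thread), and find_substring16, struct pcmp_result and PCMP_MODE are
hypothetical names used only for illustration. Without a combined
pattern, the two calls could end up as two separate pcmpestri insns.

```c
#include <assert.h>
#include <string.h>
#include <nmmintrin.h>  /* SSE4.2 intrinsics; x86 only */

/* Byte elements, substring search, report the lowest matching index.  */
#define PCMP_MODE \
  (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ORDERED | _SIDD_LEAST_SIGNIFICANT)

/* Hypothetical helper type: both results of one comparison.  */
struct pcmp_result
{
  int index;  /* value PCMPESTRI leaves in ECX; 16 means no match */
  int carry;  /* CF after the insn; 1 when any match was found */
};

/* Search for NEEDLE inside HAYSTACK; both are at most 16 bytes long.
   The target attribute enables SSE4.2 for this function only.  */
__attribute__ ((target ("sse4.2")))
static struct pcmp_result
find_substring16 (const char *needle, int needle_len,
                  const char *haystack, int haystack_len)
{
  char nbuf[16] = { 0 };
  char hbuf[16] = { 0 };

  assert (needle_len >= 0 && needle_len <= 16);
  assert (haystack_len >= 0 && haystack_len <= 16);
  memcpy (nbuf, needle, (size_t) needle_len);
  memcpy (hbuf, haystack, (size_t) haystack_len);

  __m128i a = _mm_loadu_si128 ((const __m128i *) nbuf);
  __m128i b = _mm_loadu_si128 ((const __m128i *) hbuf);

  struct pcmp_result r;
  /* Same operands and same immediate twice: once for the index,
     once for the flag.  */
  r.index = _mm_cmpestri (a, needle_len, b, haystack_len, PCMP_MODE);
  r.carry = _mm_cmpestrc (a, needle_len, b, haystack_len, PCMP_MODE);
  return r;
}
```

For example, find_substring16 ("lo", 2, "hello world", 11) yields index 3
with the carry flag set, while a needle that does not occur yields index
16 with the carry flag clear.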
Now for the hardest part ... Since ...???... can be either xmm or ecx,
IMO the best way is to create a new register class, so that the register
allocator is free to choose either register, whichever fits best
(hopefully ;).
Also, I see no reason why we need two new CC modes. CCmode is
enough to use as a generic comparison mode.
Uros.