This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: PATCH: Add SSE4.2 support - pcmpstr part


On Thu, May 31, 2007 at 11:49:46AM +0200, Uros Bizjak wrote:
> On 5/30/07, H. J. Lu <hjl@lucon.org> wrote:
> 
> >Here is the updated patch. I added OPTION_MASK_ISA_XXX_UNSET so
> >that we only need to change one macro when we add a new ISA.  Tested
> >on Linux/Intel64.
> 
> I think we need 3 patterns for pcmp* insns:
> 
> a) This one when index is required:
> 
> (define_insn "sse4_2_pcmpistri"
>  [(set (reg  ...???... )
> 	(unspec:SI
> 	  [(match_operand:V16QI 0 "register_operand" "x")
> 	   (reg:SI 0)
> 	   (match_operand:V16QI 1 "nonimmediate_operand" "xm")
> 	   (reg:SI 1)
> 	   (match_operand:SI 2 "const_0_to_255_operand" "n")]
> 	  UNSPEC_PCMPESTR))
>   (clobber (reg:CC FLAGS_REG))]
> 
> b) Similar when we need mask:
> 
> (define_insn "sse4_2_pcmpestrm"
>  [(set (reg ...???... )
> 	(unspec:V16QI
> 	  [(match_operand:V16QI 0 "register_operand" "x")
> 	   (reg:SI 0)
> 	   (match_operand:V16QI 1 "nonimmediate_operand" "xm")
> 	   (reg:SI 1)
> 	   (match_operand:SI 2 "const_0_to_255_operand" "n")]
> 	  UNSPEC_PCMPESTR))
>  (clobber (reg:CC FLAGS_REG))]
> 
> c) CC setting insn, CConly (only "mask" one is shown here...)
> 
>  [(set (reg:CC FLAGS_REG)
> 	(unspec:CC
> 	  [(match_operand:V16QI 0 "register_operand" "x")
> 	   (reg:SI 0)
> 	   (match_operand:V16QI 1 "nonimmediate_operand" "xm")
> 	   (reg:SI 1)
> 	   (match_operand:SI 2 "const_0_to_255_operand" "n")]
> 	  UNSPEC_PCMPESTR))
>   (clobber (match_scratch ...???... ))]
> 
> d) IMO we also need a CConly + reg setting insn, so combine can merge
> two successive instructions into one (thus doing automatically the part
> you process "manually"):
> 
>  [(set (reg ...???... )
> 	(unspec:V16QI
> 	  [(match_operand:V16QI 0 "register_operand" "x")
> 	   (reg:SI 0)
> 	   (match_operand:V16QI 1 "nonimmediate_operand" "xm")
> 	   (reg:SI 1)
> 	   (match_operand:SI 2 "const_0_to_255_operand" "n")]
> 	  UNSPEC_PCMPESTR))
>   (set (reg:CC FLAGS_REG)
> 	(unspec:CC
> 	  [(match_dup 0)
> 	   (reg:SI 0)
> 	   (match_dup 1)
> 	   (reg:SI 1)
> 	   (match_dup 2)]
> 	  UNSPEC_PCMPESTR))]
> 
> and
> 
> (define_insn "sse4_2_pcmpestri"
>  [(set (reg ...???... )
> 	(unspec:SI
> 	  [(match_operand:V16QI 0 "register_operand" "x")
> 	   (reg:SI 0)
> 	   (match_operand:V16QI 1 "nonimmediate_operand" "xm")
> 	   (reg:SI 1)
> 	   (match_operand:SI 2 "const_0_to_255_operand" "n")]
> 	  UNSPEC_PCMPESTR))
>   (set (reg:CC FLAGS_REG)
> 	(unspec:CC
> 	  [(match_dup 0)
> 	   (reg:SI 0)
> 	   (match_dup 1)
> 	   (reg:SI 1)
> 	   (match_dup 2)]
> 	  UNSPEC_PCMPESTR))]

I have tried the above first without a new register class.

> 
> Now for the hardest part ... Since ...???... can be either xmm or ecx,
> IMO the best way is to create a new register class, so the register
> allocator is free to choose either register, whichever fits best
> (hopefully ;).

A new register class may complicate the register allocator even more.
I am not sure it is a good idea, given how complex the register allocator
already is.

> 
> Also, I see no reason why we need two new CC modes. CCmode is
> enough to use as a generic comparison mode.

The new CC modes are needed for __builtin_ia32_pcmp?str?o128, which
checks the OF bit, and __builtin_ia32_pcmp?str?s128, which checks the SF
bit. Neither is covered by the normal CC modes, and there are no
RTX codes for them. That is why I invented 2 modes and reuse UNGT/UNLT.
Besides, you can't mix the CC modes set by __builtin_ia32_pcmpistr?o128
and __builtin_ia32_pcmpestr?o128, since they work on different things.
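
To illustrate why the implicit-length and explicit-length flag results
differ, here is a rough software model in C. It is only a sketch, not
code from the patch: it assumes the documented flag behaviour of the
instructions (SF/ZF report whether the first/second operand is shorter
than 16 bytes, OF is bit 0 of the intermediate result IntRes2) and
covers only the byte-wise "equal each" comparison with positive
polarity. All names (pcmp_flags, model_pcmpistr, model_pcmpestr, ...)
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not GCC code: the flags the pcmp?str? builtins test. */
struct pcmp_flags {
    int cf;  /* IntRes2 is non-zero */
    int zf;  /* second operand is shorter than 16 bytes */
    int sf;  /* first operand is shorter than 16 bytes */
    int of;  /* bit 0 of IntRes2 */
};

/* Implicit length: index of the first zero byte, or 16 if there is none. */
static int implicit_len(const uint8_t v[16])
{
    int i;
    for (i = 0; i < 16; i++)
        if (v[i] == 0)
            break;
    return i;
}

/* Explicit length: absolute value of the length register, saturated to 16. */
static int explicit_len(int reg)
{
    long long n = reg;
    if (n < 0)
        n = -n;
    return n > 16 ? 16 : (int)n;
}

/* Byte-wise "equal each" with positive polarity, so IntRes2 == IntRes1. */
static struct pcmp_flags model_equal_each(const uint8_t a[16], int la,
                                          const uint8_t b[16], int lb)
{
    struct pcmp_flags f;
    unsigned res = 0;
    int i;

    assert(la >= 0 && la <= 16 && lb >= 0 && lb <= 16);
    for (i = 0; i < 16; i++) {
        int va = i < la;  /* element i of the first operand is valid */
        int vb = i < lb;  /* element i of the second operand is valid */
        /* Both valid: compare.  Both invalid: forced true.
           Exactly one invalid: forced false. */
        int bit = (va && vb) ? (a[i] == b[i]) : (!va && !vb);
        res |= (unsigned)bit << i;
    }
    f.cf = res != 0;
    f.zf = lb < 16;
    f.sf = la < 16;
    f.of = (int)(res & 1u);
    return f;
}

/* pcmpistr?: lengths come from the data (first zero byte). */
struct pcmp_flags model_pcmpistr(const uint8_t a[16], const uint8_t b[16])
{
    return model_equal_each(a, implicit_len(a), b, implicit_len(b));
}

/* pcmpestr?: lengths come from the EAX and EDX register values. */
struct pcmp_flags model_pcmpestr(const uint8_t a[16], int eax,
                                 const uint8_t b[16], int edx)
{
    return model_equal_each(a, explicit_len(eax), b, explicit_len(edx));
}
```

In this model, the zero-padded operands "abc" and "abd" set SF with
model_pcmpistr, because a zero byte ends the first operand early. With
model_pcmpestr and both lengths given as 16, SF is clear for the same
data. OF is set in both cases because the first bytes compare equal.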


H.J.

