[PATCH] Fix MMX/SSE/AVX* shifts by non-immediate scalar (PR target/80286)

Jakub Jelinek jakub@redhat.com
Tue Apr 4 12:01:00 GMT 2017


On Tue, Apr 04, 2017 at 08:39:59AM +0200, Uros Bizjak wrote:
> > Any thoughts on what to do to generate reasonable code when the shift count
> > comes from memory (e.g. as int variable) or is in the low bits of some XMM
> > regioster?
> 
> The problem with int variable from memory is, that shifts access full
> 128bits for their count operand, so this is effectively a no-go. If
> there is a 128bit count value in memory, we can maybe define shift
> pattern with:
> 
> (subreg:DI (match_operand:V2DI 2 "general_operand" "xmN,vmN"))
> 
> ?

Well, if the original memory is say int, then we can't just read it as V2DI
or V4SI.

> > First of all, perhaps we could have some combiner (or peephole) pattern that would
> > transform sign-extend from e.g. SI to DI on the shift count into zero-extend
> > if there are no other uses of the extension result - if the shift count is
> > negative in SImode (or even QImode), then it is already large number and the
> > upper 32 bits or more don't really change anything on that.
> 
> We can introduce shift patterns with embedded extensions, and split
> them to zext + shift. These new patterns can be easily macroized with
> any_extend code iterator and SWI124 mode iterator, so we avoid pattern
> explosion.

I assume split those before reload.  Because we want to give reload a chance
to do the zero extension on GPRs if it is more beneficial, and it might
choose to store it into memory and load into XMM from memory and that is
hard to do after reload.

> > Then perhaps we could emit pmovzxdq for SSE4.1+ instead of going through
> > GPRs and back, or for SSE2 pxor on a scratch reg and punpck* to get it zero
> > extended.  Not sure if we want to add =v / vm alternative to
> > zero_extendsidi2*, it already has some x but with ?s that prevent the RA
> > from using it.  So thoughts on that?
> 
> The ? is there to discourage RA from allocating xmm reg (all these
> alternatives have * on xmm reg), in effect instructing RA to prefer
> GPRs. If the value is already in xmm reg, then I expect ? alternative
> will be used. So, yes, v/v alternative as you proposed would be a good
> addition to zero_extendsidi alternatives. Please note though that
> pmovzxdq operates on a vector value, so memory operands should be
> avoided.

With ? in front of it or without?  I admit I've only tried so far:
@@ -4049,24 +4049,29 @@ (define_expand "extendsidi2"
 })

 (define_insn "*extendsidi2_rex64"
-  [(set (match_operand:DI 0 "register_operand" "=*a,r")
-       (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "*0,rm")))]
+  [(set (match_operand:DI 0 "register_operand" "=*a,r,v")
+       (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "*0,rm,vm")))]
   "TARGET_64BIT"
   "@
    {cltq|cdqe}
-   movs{lq|x}\t{%1, %0|%0, %1}"
-  [(set_attr "type" "imovx")
-   (set_attr "mode" "DI")
-   (set_attr "prefix_0f" "0")
-   (set_attr "modrm" "0,1")])
+   movs{lq|x}\t{%1, %0|%0, %1}
+   %vpmovsxdq\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,*,sse4")
+   (set_attr "type" "imovx,imovx,ssemov")
+   (set_attr "mode" "DI,DI,TI")
+   (set_attr "prefix_0f" "0,0,*")
+   (set_attr "prefix_extra" "*,*,1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "modrm" "0,1,*")])


and with the ? in front of v it for some reason didn't trigger.
I'll try the zero_extendsidi2 now and see how it works.

> OK for trunk and backports.

Committed to trunk so far, backports in a week or so when I backport
dozens of other patches together with it.

	Jakub



More information about the Gcc-patches mailing list