Vector shuffling

Richard Henderson rth@redhat.com
Wed Sep 28 15:20:00 GMT 2011


On 09/28/2011 05:59 AM, Artem Shinkarov wrote:
> I don't really understand this. As far as I know, expand_normal
> "converts" tree to rtx. All my computations are happening at the level
> of rtx and force_reg is needed just to bring an rtx expression to the
> register of the correct mode. If I am missing something, could you
> give an example of how I can use expand_normal instead of force_reg in
> this particular code?

Sorry, I meant expand_(simple_)binop.

>> Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1?
> 
> My personal feeling is that, for v0 != v1, it may be more efficient
> to perform piecewise shuffling rather than bitwise dances around the
> masks.

Maybe for V2DI and V2DFmode, but probably not otherwise.

We can perform the double-word shuffle in 12 insns; 10 for SSE 4.1.
Example assembly attached.
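
For readers without the attachment, here is a rough scalar C model of the
general idea. It is not the attached z.s sequence. It assumes 32-bit
elements and a selector taken modulo 8 across v0:v1, and it models only the
documented PSHUFB behaviour (a control byte with bit 7 set yields zero).
The names model_pshufb, shuffle2_v4si and shuffle2_piecewise are
hypothetical, chosen for this sketch. The "bitwise dance" is the loop that
turns element indices into two byte-level control vectors, one per input.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of SSSE3 PSHUFB on one 16-byte vector: a control byte with
   bit 7 set produces zero, otherwise its low four bits pick a source byte. */
static void model_pshufb(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t ctl[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0f];
    memcpy(dst, tmp, 16);
}

/* Hypothetical helper: two-input shuffle of 32-bit elements built from two
   byte shuffles and an OR.  Selector values are taken modulo 8; values 0..3
   pick from v0 and 4..7 pick from v1 (an assumption of this sketch). */
static void shuffle2_v4si(uint32_t out[4], const uint32_t v0[4],
                          const uint32_t v1[4], const uint32_t sel[4])
{
    uint8_t b0[16], b1[16], c0[16], c1[16], r0[16], r1[16], r[16];

    memcpy(b0, v0, 16);
    memcpy(b1, v1, 16);

    /* Expand each element index into four byte indices.  The control for
       the input that is NOT selected gets bit 7 set, so it contributes 0. */
    for (int i = 0; i < 4; i++) {
        uint32_t idx = sel[i] & 7;
        for (int k = 0; k < 4; k++) {
            uint8_t byte = (uint8_t)((idx & 3) * 4 + k);
            c0[i * 4 + k] = (idx & 4) ? 0x80 : byte;
            c1[i * 4 + k] = (idx & 4) ? byte : 0x80;
        }
    }

    model_pshufb(r0, b0, c0);
    model_pshufb(r1, b1, c1);

    /* Exactly one of r0/r1 is non-zeroed per byte, so OR merges them. */
    for (int i = 0; i < 16; i++)
        r[i] = r0[i] | r1[i];
    memcpy(out, r, 16);
}

/* Piecewise reference: pick each element individually. */
static void shuffle2_piecewise(uint32_t out[4], const uint32_t v0[4],
                               const uint32_t v1[4], const uint32_t sel[4])
{
    uint32_t tmp[4];
    for (int i = 0; i < 4; i++) {
        uint32_t idx = sel[i] & 7;
        tmp[i] = (idx & 4) ? v1[idx & 3] : v0[idx];
    }
    memcpy(out, tmp, sizeof tmp);
}
```

With v0 = {10, 11, 12, 13}, v1 = {20, 21, 22, 23} and sel = {0, 5, 2, 7},
both helpers yield {10, 21, 12, 23}. The model says nothing about
instruction counts; it only shows why the mask-based form needs extra
steps to build the control vectors.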

>> It's certainly possible to handle it, though it takes a few more steps,
>> and might well be more efficient as a libgcc function rather than inline.
> 
> I don't really understand why it could be more efficient. I thought
> that inline gives more chances to the final RTL optimisation.

We won't be able to optimize this at the rtl level: there are too many
UNSPEC instructions in the way.  In any case, even if that weren't so, we'd
only be able to do useful optimization for a constant permutation, and a
constant permutation is something we should already have been able to
prove at the gimple level.
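
To make the constant versus variable distinction concrete, here is a small
sketch using GCC's vector extensions. It assumes the `__builtin_shuffle`
builtin from GCC's vector extension documentation, which this message does
not name; the function names reverse_const and shuffle_var are
hypothetical, and the non-GCC fallback is an element-wise loop added only
so the sketch builds elsewhere. In the first function the mask is a
literal the compiler can see; in the second it only arrives at run time.

```c
#include <stdint.h>

typedef int32_t v4si __attribute__((vector_size(16)));

/* Constant permutation: the mask is a compile-time literal, so the
   permutation is fully known before any RTL is generated. */
static v4si reverse_const(v4si v)
{
#if defined(__GNUC__) && !defined(__clang__)
    const v4si mask = {3, 2, 1, 0};
    return __builtin_shuffle(v, mask);
#else
    /* Fallback for compilers without __builtin_shuffle. */
    v4si r = {v[3], v[2], v[1], v[0]};
    return r;
#endif
}

/* Variable permutation: the mask is a run-time value, so nothing about
   the permutation can be folded ahead of time. */
static v4si shuffle_var(v4si v, v4si mask)
{
#if defined(__GNUC__) && !defined(__clang__)
    return __builtin_shuffle(v, mask);
#else
    /* Fallback: indices taken modulo 4, matching the one-input form. */
    v4si r;
    for (int i = 0; i < 4; i++)
        r[i] = v[mask[i] & 3];
    return r;
#endif
}
```

For v = {10, 20, 30, 40}, reverse_const(v) gives {40, 30, 20, 10}, and
shuffle_var(v, {1, 1, 3, 0}) gives {20, 20, 40, 10}.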


r~
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: z.s
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20110928/b0c9d12c/attachment.ksh>

