Re: [PATCH] Add MULT_HIGHPART_EXPR
On Thu, Jun 28, 2012 at 08:57:23AM -0700, Richard Henderson wrote:
> On 2012-06-28 07:05, Jakub Jelinek wrote:
> > Unfortunately the addition of the builtin_mul_widen_* hooks on i?86 seems
> > to pessimize the generated code for gcc.dg/vect/pr51581-3.c
> > testcase (at least with -O3 -mavx) compared to when the hooks aren't
> > present, because i?86 has more natural support for widen mult lo/hi
> > compared to widen mult even/odd, but I assume that on powerpc it is the
> > other way around.  So, if both VEC_WIDEN_MULT_*_EXPR and
> > builtin_mul_widen_* are possible for the particular vectype, how should I
> > find out which one will be cheaper?
>
> I would assume that if the builtin exists, then it is cheaper.
>
> I disagree about "x86 has more natural support for hi/lo".  The basic SSE2
> multiplication is even; one shift per input is needed to generate odd.
> On the other hand, one interleave per input is required for each of hi
> and lo.  So that's 4 setup insns for hi/lo versus 2 setup insns for
> even/odd.  And on top of all that, XOP includes multiply odd, at least
> for signed V4SI.
Perhaps the problem then is that the permutation is much more expensive
for even/odd.  With even/odd, the f2 routine is:
vmovdqa d(%rip), %xmm2
vmovdqa .LC1(%rip), %xmm0
vpsrlq $32, %xmm2, %xmm4
vmovdqa d+16(%rip), %xmm1
vpmuludq %xmm0, %xmm2, %xmm5
vpsrlq $32, %xmm0, %xmm3
vpmuludq %xmm3, %xmm4, %xmm4
vpmuludq %xmm0, %xmm1, %xmm0
vmovdqa .LC2(%rip), %xmm2
vpsrlq $32, %xmm1, %xmm1
vpmuludq %xmm3, %xmm1, %xmm3
vmovdqa .LC3(%rip), %xmm1
vpshufb %xmm2, %xmm5, %xmm5
vpshufb %xmm1, %xmm4, %xmm4
vpshufb %xmm2, %xmm0, %xmm2
vpshufb %xmm1, %xmm3, %xmm1
vpor %xmm4, %xmm5, %xmm4
vpor %xmm1, %xmm2, %xmm1
vpsrld $1, %xmm4, %xmm4
vmovdqa %xmm4, c(%rip)
vpsrld $1, %xmm1, %xmm1
vmovdqa %xmm1, c+16(%rip)
ret
and with lo/hi it is:
vmovdqa d(%rip), %xmm2
vpunpckhdq %xmm2, %xmm2, %xmm3
vpunpckldq %xmm2, %xmm2, %xmm2
vmovdqa .LC1(%rip), %xmm0
vpmuludq %xmm0, %xmm3, %xmm3
vmovdqa d+16(%rip), %xmm1
vpmuludq %xmm0, %xmm2, %xmm2
vshufps $221, %xmm2, %xmm3, %xmm2
vpsrld $1, %xmm2, %xmm2
vmovdqa %xmm2, c(%rip)
vpunpckhdq %xmm1, %xmm1, %xmm2
vpunpckldq %xmm1, %xmm1, %xmm1
vpmuludq %xmm0, %xmm2, %xmm2
vpmuludq %xmm0, %xmm1, %xmm0
vshufps $221, %xmm0, %xmm2, %xmm0
vpsrld $1, %xmm0, %xmm0
vmovdqa %xmm0, c+16(%rip)
ret
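Counting the two sequences: the even/odd version is 22 instructions and has
to load two extra shuffle masks (.LC2, .LC3) and spend four vpshufb plus two
vpor just to get the high halves back into vector order, while the lo/hi
version is 17 instructions and reassembles with two vshufps.

For reference, judging from the vpmuludq with a constant multiplier followed
by extraction of the high dwords and a vpsrld $1, f2 presumably has a shape
along these lines (only a guess at the testcase's shape, with illustrative
array sizes, not the actual pr51581-3.c source):

unsigned int c[8], d[8];

void
f2 (void)
{
  int i;
  for (i = 0; i < 8; i++)
    c[i] = d[i] / 3;   /* lowered to highpart multiply + shift right by 1 */
}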
Jakub