Re: [PATCH] Add MULT_HIGHPART_EXPR
On Thu, Jun 28, 2012 at 08:57:23AM -0700, Richard Henderson wrote:
> On 2012-06-28 07:05, Jakub Jelinek wrote:
> > Unfortunately the addition of the builtin_mul_widen_* hooks on i?86 seems
> > to pessimize the generated code for gcc.dg/vect/pr51581-3.c
> > testcase (at least with -O3 -mavx) compared to when the hooks aren't
> > present, because i?86 has more natural support for widen mult lo/hi
> > compared to widen mult even/odd, but I assume that on powerpc it is the
> > other way around.  So, if both VEC_WIDEN_MULT_*_EXPR and
> > builtin_mul_widen_* are possible for the particular vectype, how should I
> > find out which one will be cheaper?
>
> I would assume that if the builtin exists, then it is cheaper.
>
> I disagree about "x86 has more natural support for hi/lo".  The basic SSE2
> multiplication is even; one shift per input is needed to generate odd.
> On the other hand, one interleave per input is required for each of hi
> and lo.  So that's 4 setup insns for hi/lo versus 2 setup insns for
> even/odd.  And on top of all that, XOP includes multiply odd, at least
> for signed V4SI.
Perhaps the problem then is that the permutation is much more expensive
for even/odd.  With even/odd, the f2 routine is:
vmovdqa d(%rip), %xmm2
vmovdqa .LC1(%rip), %xmm0
vpsrlq $32, %xmm2, %xmm4
vmovdqa d+16(%rip), %xmm1
vpmuludq %xmm0, %xmm2, %xmm5
vpsrlq $32, %xmm0, %xmm3
vpmuludq %xmm3, %xmm4, %xmm4
vpmuludq %xmm0, %xmm1, %xmm0
vmovdqa .LC2(%rip), %xmm2
vpsrlq $32, %xmm1, %xmm1
vpmuludq %xmm3, %xmm1, %xmm3
vmovdqa .LC3(%rip), %xmm1
vpshufb %xmm2, %xmm5, %xmm5
vpshufb %xmm1, %xmm4, %xmm4
vpshufb %xmm2, %xmm0, %xmm2
vpshufb %xmm1, %xmm3, %xmm1
vpor %xmm4, %xmm5, %xmm4
vpor %xmm1, %xmm2, %xmm1
vpsrld $1, %xmm4, %xmm4
vmovdqa %xmm4, c(%rip)
vpsrld $1, %xmm1, %xmm1
vmovdqa %xmm1, c+16(%rip)
ret
and with lo/hi it is:
vmovdqa d(%rip), %xmm2
vpunpckhdq %xmm2, %xmm2, %xmm3
vpunpckldq %xmm2, %xmm2, %xmm2
vmovdqa .LC1(%rip), %xmm0
vpmuludq %xmm0, %xmm3, %xmm3
vmovdqa d+16(%rip), %xmm1
vpmuludq %xmm0, %xmm2, %xmm2
vshufps $221, %xmm2, %xmm3, %xmm2
vpsrld $1, %xmm2, %xmm2
vmovdqa %xmm2, c(%rip)
vpunpckhdq %xmm1, %xmm1, %xmm2
vpunpckldq %xmm1, %xmm1, %xmm1
vpmuludq %xmm0, %xmm2, %xmm2
vpmuludq %xmm0, %xmm1, %xmm0
vshufps $221, %xmm0, %xmm2, %xmm0
vpsrld $1, %xmm0, %xmm0
vmovdqa %xmm0, c+16(%rip)
ret
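Counting the two sequences: the even/odd version is 22 instructions and has
to load two extra shuffle masks (.LC2, .LC3) and spend four vpshufb plus two
vpor just to get the high halves back into vector order, while the lo/hi
version is 17 instructions and reassembles with two vshufps.

For reference, judging from the vpmuludq with a constant multiplier followed
by extraction of the high dwords and a vpsrld $1, f2 presumably has a shape
along these lines (only a guess at the testcase's shape, with illustrative
array sizes, not the actual pr51581-3.c source):

unsigned int c[8], d[8];

void
f2 (void)
{
  int i;
  for (i = 0; i < 8; i++)
    c[i] = d[i] / 3;   /* lowered to highpart multiply + shift right by 1 */
}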
Jakub