This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
- From: Richard Biener <rguenther at suse dot de>
- To: Sergey Ostanevich <sergos dot gnu at gmail dot com>
- Cc: Jakub Jelinek <jakub at redhat dot com>, Richard Henderson <rth at redhat dot com>, Yuri Rumyantsev <ysrumyan at gmail dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>, Igor Zamyatin <izamyatin at gmail dot com>, Areg Melik-Adamyan <areg dot melikadamyan at gmail dot com>
- Date: Fri, 15 Nov 2013 15:24:21 +0100 (CET)
- Subject: Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
On Fri, 15 Nov 2013, Sergey Ostanevich wrote:
> Richard,
>
> here's an example where the cost model triggers.
I can hardly believe that this (AVX2)
.L9:
vmovups (%rsi), %xmm3
addl $1, %r8d
addq $256, %rsi
vinsertf128 $0x1, -240(%rsi), %ymm3, %ymm1
vmovups -224(%rsi), %xmm3
vinsertf128 $0x1, -208(%rsi), %ymm3, %ymm3
vshufps $136, %ymm3, %ymm1, %ymm3
vperm2f128 $3, %ymm3, %ymm3, %ymm2
vshufps $68, %ymm2, %ymm3, %ymm1
vshufps $238, %ymm2, %ymm3, %ymm2
vmovups -192(%rsi), %xmm3
vinsertf128 $1, %xmm2, %ymm1, %ymm2
vinsertf128 $0x1, -176(%rsi), %ymm3, %ymm1
vmovups -160(%rsi), %xmm3
vinsertf128 $0x1, -144(%rsi), %ymm3, %ymm3
vshufps $136, %ymm3, %ymm1, %ymm3
vperm2f128 $3, %ymm3, %ymm3, %ymm1
vshufps $68, %ymm1, %ymm3, %ymm4
vshufps $238, %ymm1, %ymm3, %ymm1
vmovups -128(%rsi), %xmm3
vinsertf128 $1, %xmm1, %ymm4, %ymm1
vshufps $136, %ymm1, %ymm2, %ymm1
vperm2f128 $3, %ymm1, %ymm1, %ymm2
vshufps $68, %ymm2, %ymm1, %ymm4
vshufps $238, %ymm2, %ymm1, %ymm2
vinsertf128 $0x1, -112(%rsi), %ymm3, %ymm1
vmovups -96(%rsi), %xmm3
vinsertf128 $1, %xmm2, %ymm4, %ymm4
vinsertf128 $0x1, -80(%rsi), %ymm3, %ymm3
vshufps $136, %ymm3, %ymm1, %ymm3
vperm2f128 $3, %ymm3, %ymm3, %ymm2
vshufps $68, %ymm2, %ymm3, %ymm1
vshufps $238, %ymm2, %ymm3, %ymm2
vmovups -64(%rsi), %xmm3
vinsertf128 $1, %xmm2, %ymm1, %ymm2
vinsertf128 $0x1, -48(%rsi), %ymm3, %ymm1
vmovups -32(%rsi), %xmm3
vinsertf128 $0x1, -16(%rsi), %ymm3, %ymm3
cmpl %r8d, %edi
vshufps $136, %ymm3, %ymm1, %ymm3
vperm2f128 $3, %ymm3, %ymm3, %ymm1
vshufps $68, %ymm1, %ymm3, %ymm5
vshufps $238, %ymm1, %ymm3, %ymm1
vinsertf128 $1, %xmm1, %ymm5, %ymm1
vshufps $136, %ymm1, %ymm2, %ymm1
vperm2f128 $3, %ymm1, %ymm1, %ymm2
vshufps $68, %ymm2, %ymm1, %ymm3
vshufps $238, %ymm2, %ymm1, %ymm2
vinsertf128 $1, %xmm2, %ymm3, %ymm1
vshufps $136, %ymm1, %ymm4, %ymm1
vperm2f128 $3, %ymm1, %ymm1, %ymm2
vshufps $68, %ymm2, %ymm1, %ymm3
vshufps $238, %ymm2, %ymm1, %ymm2
vinsertf128 $1, %xmm2, %ymm3, %ymm2
vaddps %ymm2, %ymm0, %ymm0
ja .L9
is more efficient than
.L3:
vaddss (%rcx,%rax), %xmm0, %xmm0
addq $32, %rax
cmpq %rdx, %rax
jne .L3
;)
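
For reference: the scalar loop above advances %rax by 32 bytes per
iteration, i.e. it reads one float out of every eight, and the vector
loop consumes 256 bytes (eight such elements) per iteration. The test
case itself is not quoted in this message, so the following is only a
guess at its shape. The function name, the STRIDE constant and the
reduction clause are assumptions, not taken from the thread:

```c
/* Assumed stride: 8 floats = 32 bytes, matching "addq $32, %rax". */
#define STRIDE 8

/* Hypothetical reconstruction of the test case: sum every STRIDE-th
   element of a[].  a[] must hold at least (n - 1) * STRIDE + 1 floats.  */
float strided_sum(const float *a, int n)
{
    float sum = 0.0f;
    int i;

    /* The pragma is ignored unless -fopenmp or -fopenmp-simd is given. */
#pragma omp simd reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += a[i * STRIDE];

    return sum;
}
```

With a strided load like this, each vector of eight floats has to be
assembled from eight separate cache lines' worth of data, which is where
the long vinsertf128/vshufps/vperm2f128 sequence comes from.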
> As soon as elemental functions appear and we update the vectorizer so
> that it can accept an elemental function inside the loop, we will have
> the same situation as we have now: the cost model will bail out on its
> profitability estimate.
Yes.
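
To make the situation concrete, a loop of the kind being discussed
might look like the sketch below. It uses the OpenMP 4.0 "declare simd"
directive for the elemental function; the body of bar() and the name
apply_bar() are made up for illustration:

```c
/* Elemental function: "declare simd" asks the compiler to also provide
   vector variants of bar().  The body here is purely illustrative.  */
#pragma omp declare simd notinbranch
float bar(float x)
{
    return x * x + 1.0f;
}

/* Hypothetical caller: a simd loop whose body calls the elemental.
   When only the declaration of bar() is visible, the vectorizer cannot
   see how expensive its vector variant is.  */
void apply_bar(float *out, const float *in, int n)
{
    int i;

#pragma omp simd
    for (i = 0; i < n; i++)
        out[i] = bar(in[i]);
}
```

Without -fopenmp or -fopenmp-simd both pragmas are ignored and this is
ordinary scalar code.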
> Still, we have no way to get information on how efficient the bar()
> function is in its vector form.
Well, I assume you mean that the speedup from vectorizing the elemental
will offset whatever wreckage we cause by vectorizing the rest of the
statements. I'd say you can at least compare against unrolling by
the vectorization factor, building the vector inputs to the elemental
from scalars, and distributing the vector result from the elemental
back to scalars.
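
A rough sketch of that baseline, written with GCC's generic vector
extension. The vectorization factor of 4, the name bar_v4 and its body
are assumptions for illustration only; the point is just the shape
(unroll by VF, pack, one vector call, unpack):

```c
#define VF 4  /* assumed vectorization factor */

/* GCC vector extension: four floats in one 16-byte vector. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical vector variant of the elemental; here it squares each lane. */
static v4sf bar_v4(v4sf x)
{
    return x * x;
}

/* Scalar loop unrolled by VF where only the call is vectorized.
   Assumes n is a multiple of VF.  */
void baseline(float *out, const float *in, int n, int stride)
{
    int i, k;

    for (i = 0; i < n; i += VF) {
        v4sf vin = {0.0f, 0.0f, 0.0f, 0.0f};
        v4sf vout;

        /* Build the vector input from scalars. */
        for (k = 0; k < VF; k++)
            vin[k] = in[(i + k) * stride];

        vout = bar_v4(vin);

        /* Distribute the vector result back to scalars. */
        for (k = 0; k < VF; k++)
            out[(i + k) * stride] = vout[k];
    }
}
```

Comparing the cost of this form against the fully vectorized loop only
needs the cost of packing and unpacking, not the cost of the elemental
itself, since the call is the same in both.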
> I believe I should repeat: #pragma omp simd is intended to introduce an
> instruction-level parallel region at the developer's request, and hence
> should be treated in the same manner as #pragma omp parallel. The
> vectorizer cost model is an obstacle here, not a help.
Surely not if there isn't an elemental call in it. With one, the
cost model will of course not have enough information to decide.
But still, what's the difference from the case where we cannot vectorize
the function? What happens if we cannot vectorize the elemental?
Do we have to build scalar versions for all possible vector sizes?
Richard.
> Regards,
> Sergos
>
>
> On Fri, Nov 15, 2013 at 1:08 AM, Richard Biener <rguenther@suse.de> wrote:
> > Sergey Ostanevich <sergos.gnu@gmail.com> wrote:
> >>Is this only for the whole file? I mean, to have a particular loop
> >>vectorized in a file while all the others are left up to the compiler's
> >>cost model. Is there such machinery?
> >
> > No, there is not.
> >
> > Richard.
> >
> >>Sergos
> >>
> >>On Thu, Nov 14, 2013 at 12:39 PM, Richard Biener <rguenther@suse.de> wrote:
> >>> On Wed, 13 Nov 2013, Sergey Ostanevich wrote:
> >>>
> >>>> I will get some tests.
> >>>> As for cost analysis - simply consider the pragma as a request to
> >>>> vectorize. How can I - as a developer - enforce it beyond the pragma?
> >>>
> >>> You can disable the cost model via -fvect-cost-model=unlimited
> >>>
> >>> Richard.
> >>>
> >>>> On Wed, Nov 13, 2013 at 12:55 PM, Richard Biener <rguenther@suse.de> wrote:
> >>>> > On Tue, 12 Nov 2013, Sergey Ostanevich wrote:
> >>>> >
> >>>> >> The reason the patch was in its original state is that we want
> >>>> >> to notify the user that his assumption of profitability may be wrong.
> >>>> >> This is not part of any spec, and as far as I know ICC does not
> >>>> >> notify the user about this case. Still, it can be a good hint for
> >>>> >> users who try to get as much performance as possible.
> >>>> >>
> >>>> >> Richard's comment on the vectorization problems is about the same:
> >>>> >> to inform the user that his attempt to force vectorization has failed.
> >>>> >>
> >>>> >> As for profitable or not - sometimes I believe it's impossible to be
> >>>> >> precise. For OMP we have the case of a vector version of a function,
> >>>> >> and we have no chance to figure out whether it is profitable to use
> >>>> >> it or to lose it. If we can't map the loop to any vector length
> >>>> >> other than 1 - I believe in this case we have to bail out and report.
> >>>> >> Is it about 'never profitable'?
> >>>> >
> >>>> > For example. I think we should report non-vectorized loops
> >>>> > that are marked with force_vect anyway, with -Wdisabled-optimization.
> >>>> > Another case is that a loop may be profitable to vectorize if
> >>>> > the ISA supports a gather instruction but otherwise not. Or if the
> >>>> > ISA supports efficient vector construction from N non-loop-invariant
> >>>> > scalars (for vectorization of strided loads).
> >>>> >
> >>>> > Simply disregarding all of the cost analysis sounds completely
> >>>> > bogus to me.
> >>>> >
> >>>> > I'd simply go for the diagnostic for now, not changing anything else.
> >>>> > We want to have a good understanding of why the cost model is
> >>>> > so bad that we have to forcibly ignore it for #pragma simd - thus we
> >>>> > want testcases.
> >>>> >
> >>>> > Richard.
> >>>> >
> >>>> >>
> >>>> >> On Tue, Nov 12, 2013 at 6:35 PM, Richard Biener <rguenther@suse.de> wrote:
> >>>> >> > On 11/12/13 3:16 PM, Jakub Jelinek wrote:
> >>>> >> >> On Tue, Nov 12, 2013 at 05:46:14PM +0400, Sergey Ostanevich wrote:
> >>>> >> >>> ivdep just replaces all cross-iteration data dependence analysis;
> >>>> >> >>> it has nothing to do with the cost model. ICC does not disable
> >>>> >> >>> its cost model in case of #pragma ivdep
> >>>> >> >>>
> >>>> >> >>> as for the safelen - the OMP standard treats it as a limitation
> >>>> >> >>> on the vector length. this means if no safelen is present
> >>>> >> >>> an arbitrary vector length can be used.
> >>>> >> >>
> >>>> >> >> I was talking about GCC loop->safelen, which is INT_MAX for
> >>>> >> >> #pragma omp simd without safelen clause or #pragma simd without
> >>>> >> >> vectorlength clause.
> >>>> >> >>
> >>>> >> >>> so I believe loop->force_vect is the only trigger to disregard
> >>>> >> >>> the cost model
> >>>> >> >>
> >>>> >> >> Anyway, in that case I think the originally posted patch is wrong,
> >>>> >> >> if we want to treat force_vect as disregard all the cost model and
> >>>> >> >> force vectorization (well, the name of the field already kind of
> >>>> >> >> suggests that), then IMHO we should treat it the same as
> >>>> >> >> -fvect-cost-model=unlimited for those loops.
> >>>> >> >
> >>>> >> > Err - the user may have a specific sub-architecture in mind when
> >>>> >> > using #pragma simd, if you say we should completely ignore the cost
> >>>> >> > model then should we also sorry () if we cannot vectorize the loop
> >>>> >> > (either because of GCC deficiencies or lack of sub-target support)?
> >>>> >> >
> >>>> >> > That said, at least in the cases that the cost model says the loop
> >>>> >> > is never profitable to vectorize we should follow its advice.
> >>>> >> >
> >>>> >> > Richard.
> >>>> >> >
> >>>> >> >> Thus (untested):
> >>>> >> >>
> >>>> >> >> 2013-11-12 Jakub Jelinek <jakub@redhat.com>
> >>>> >> >>
> >>>> >> >>       * tree-vect-loop.c (vect_estimate_min_profitable_iters): Use
> >>>> >> >>       unlimited cost model also for force_vect loops.
> >>>> >> >>
> >>>> >> >> --- gcc/tree-vect-loop.c.jj 2013-11-12 12:09:40.000000000 +0100
> >>>> >> >> +++ gcc/tree-vect-loop.c 2013-11-12 15:11:43.821404330 +0100
> >>>> >> >> @@ -2702,7 +2702,7 @@ vect_estimate_min_profitable_iters (loop
> >>>> >> >>    void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
> >>>> >> >>
> >>>> >> >>    /* Cost model disabled. */
> >>>> >> >> -  if (unlimited_cost_model ())
> >>>> >> >> +  if (unlimited_cost_model () || LOOP_VINFO_LOOP (loop_vinfo)->force_vect)
> >>>> >> >>      {
> >>>> >> >>        dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
> >>>> >> >>        *ret_min_profitable_niters = 0;
> >>>> >> >>
> >>>> >> >> Jakub
> >>>> >> >>
> >>>> >> >
> >>>> >>
> >>>> >>
> >>>> >
> >>>> > --
> >>>> > Richard Biener <rguenther@suse.de>
> >>>> > SUSE / SUSE Labs
> >>>> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> >>>> > GF: Jeff Hawn, Jennifer Guild, Felix Imend
> >>>>
> >>>>
> >>>
> >>> --
> >>> Richard Biener <rguenther@suse.de>
> >>> SUSE / SUSE Labs
> >>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> >>> GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
> >
> >
>
--
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer