[Bug rtl-optimization/56766] Fails to combine (vec_select (vec_concat ...)) to (vec_merge ...)
ubizjak at gmail dot com
gcc-bugzilla@gcc.gnu.org
Tue Jun 16 15:33:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56766
--- Comment #26 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to rguenther@suse.de from comment #25)
> > Richi, please note that tree-vectorizer doesn't vectorize bar_v2df, at least
> > there is no VEC_PERM_EXPR in the .optimized dump:
> >
> > void bar_v2df (double * __restrict__ p, double * __restrict q)
> > {
> > p[0] = p[0] - q[0];
> > p[1] = p[1] + q[1];
> > }
>
> That's because of (unless you specify -fno-vect-cost-model):
>
> t.c:3:11: note: Cost model analysis:
> Vector inside of basic block cost: 9
> Vector prologue cost: 0
> Vector epilogue cost: 0
> Scalar cost of basic block: 8
> t.c:3:11: note: not vectorized: vectorization is not profitable.
>
> so it computes too high a vectorized cost. This is because the
> target-independent code handling this estimates the cost as
> needing the add, the subtract and the shuffle. The
> target vectorizer cost hook could adjust this to a more sensible
> value if addsubpd is available.
Thanks, -fno-vect-cost-model did the trick here.
> > Another question w.r.t. the foo_* testcases that use __builtin_shuffle:
> >
> > v4sf foo_v4sf (v4sf x, v4sf y)
> > {
> > v4sf tem0 = x - y;
> > v4sf tem1 = x + y;
> > return __builtin_shuffle (tem0, tem1, (v4si) { 0, 5, 2, 7 });
> > }
> >
> > is functionally equivalent to:
> >
> > v4sf foo_v4sf (v4sf x, v4sf y)
> > {
> > v4sf tem0 = x + y;
> > v4sf tem1 = x - y;
> > return __builtin_shuffle (tem0, tem1, (v4si) { 4, 1, 6, 3 });
> > }
> >
> > But the latter construct isn't simplified. Should we declare the canonical
> > form to be the one with "element 0 from the first operand"?
>
> That one is interesting. I'd say we'd need to define a total ordering
> here. Note that a canonical form is only accepted when the target accepts
> it (see the VEC_PERM_EXPR case in fold-const.c).
>
> So, if we can write a function compare_perm_for_canonical (unsigned char
> *sel1, unsigned char *sel2, unsigned n) we could use that to determine
> if swapping arg0 and arg1 makes the permute mask more canonical.
>
> So yes, we should have a canonical form for the above and yes, we
> could say that we order by element 0, then, if that is equal, by
> element 1, and so on.
I will open a new PR for this, I think that the proposed patch fixes this one.
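
For illustration only, the ordering discussed above could be sketched in plain C as follows. This is not the code in fold-const.c or any proposed patch. compare_perm_for_canonical takes its name and signature from the quoted suggestion; swap_perm_operands and canonicalize_perm are hypothetical helpers. The sketch assumes a two-operand permute whose selector values lie in 0..2n-1, with 0..n-1 selecting from the first operand, and it leaves out the check that the target accepts the resulting mask.

```c
#include <assert.h>
#include <string.h>

/* Lexicographic comparison of two selectors: order by element 0, then by
   element 1, and so on.  Returns <0 if sel1 sorts first, >0 if sel2 sorts
   first, 0 if they are equal.  */
static int
compare_perm_for_canonical (const unsigned char *sel1,
                            const unsigned char *sel2, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    if (sel1[i] != sel2[i])
      return sel1[i] < sel2[i] ? -1 : 1;
  return 0;
}

/* Hypothetical helper: compute the selector that picks the same elements
   once the two input vectors have been swapped.  Indices into the first
   operand (0..n-1) move to n..2n-1 and vice versa.  */
static void
swap_perm_operands (const unsigned char *sel, unsigned char *out, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    {
      assert (sel[i] < 2 * n);
      out[i] = (unsigned char) (sel[i] < n ? sel[i] + n : sel[i] - n);
    }
}

/* Hypothetical helper: write the more canonical of SEL and its
   operand-swapped form to OUT.  Returns 1 if the caller should also swap
   arg0 and arg1, 0 if SEL was already the preferred form.  */
static int
canonicalize_perm (const unsigned char *sel, unsigned char *out, unsigned n)
{
  swap_perm_operands (sel, out, n);
  if (compare_perm_for_canonical (out, sel, n) < 0)
    return 1;
  memcpy (out, sel, n);
  return 0;
}
```

Under these assumptions, the selector { 4, 1, 6, 3 } from the second foo_v4sf variant is rewritten to { 0, 5, 2, 7 } with the operands swapped, while { 0, 5, 2, 7 } is left unchanged, so both variants end up in the "element 0 from the first operand" form.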