[Bug tree-optimization/97770] [ICELAKE]Missing vectorization for vpopcnt
crazylht at gmail dot com
gcc-bugzilla@gcc.gnu.org
Fri Jun 4 07:41:43 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770
--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #14)
> So we vectorize to
>
> _18 = .POPCOUNT (vect__5.7_22);
> _17 = .POPCOUNT (vect__5.7_21);
> vect__6.8_16 = VEC_PACK_TRUNC_EXPR <_18, _17>;
> _6 = 0;
> _7 = dest_13(D) + _2;
> vect__8.9_10 = [vec_unpack_lo_expr] vect__6.8_16;
> vect__8.9_9 = [vec_unpack_hi_expr] vect__6.8_16;
> _8 = (long long int) _6;
>
> which is exactly the issue that in the scalar code we have a 'int' producing
> popcount with long long argument but the vector IFN produces a result of the
> same width as the argument. So the vectorizer compensates for that
> (VEC_PACK_TRUNC_EXPR) and then vectorizes the widening that's in the scalar
> code (vec_unpack_{lo,hi}_expr). The fix for this and for the missing
> byte and word variants is to add a pattern to tree-vect-patterns.c for this
> case matching it to the .POPCOUNT internal function. That possibly applies
> to other bitops, too, like parity, ctz, ffs, etc. There's quite some
> _widen helpers in the pattern recog code so I'm not sure how complicated
> it is to match
>
> (long)popcountl(long)
>
> and
>
> (short)popcount((int)short)
>
> Richard may have a good idea since he did the last "big" surgery there.
Any suggestions for this? Should we change the prototype of the builtins, or
add a vec_recog_popcnt_pattern in the vectorizer?
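
For reference, here is a minimal sketch of the two scalar shapes discussed above. The function names and signatures are hypothetical and are not taken from the bug's testcase. The 16-bit variant uses unsigned short on the assumption that the widening to int must be a zero extension for a narrow vector popcount to give the same result.

```c
#include <stddef.h>

/* 64-bit shape: __builtin_popcountll takes a 64-bit argument but returns
   int, and the store widens that int back to long long.  This corresponds
   to the (long)popcountl(long) form quoted above.  */
void popcount_ll(long long *dest, const unsigned long long *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = __builtin_popcountll(src[i]);
}

/* 16-bit shape: the argument is promoted to int for __builtin_popcount and
   the int result is truncated back to 16 bits.  This corresponds to the
   (short)popcount((int)short) form quoted above, with unsigned types
   (an assumption, see the note before this block).  */
void popcount_u16(unsigned short *dest, const unsigned short *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = (unsigned short)__builtin_popcount(src[i]);
}
```

In both loops the scalar builtin works on a different width than the loaded and stored elements, which is the mismatch a pattern would presumably have to recognize before mapping the whole expression to .POPCOUNT on the element type.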