[Bug tree-optimization/91201] [7/8/9/10 Regression] SIMD not generated for horizontal sum of bytes in array

glisse at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Tue Jul 30 14:18:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91201

--- Comment #12 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #11)
> I'm not aware of vcompressb insn, only vcompressps and vcompresspd.

Intel lists it (as vpcompressb) under AVX512_VBMI2, so Ice Lake and later.

> Sure,
> one could just emit whatever we emit for __builtin_shuffle with (__v64qi) {
> 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24,
> 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56,
> 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24,
> 32, 40, 48, 56 } or similar perm, the question is if it will be faster that
> way or not.

Exactly.
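
For reference, a minimal sketch of the permutation quoted above, written with GCC's generic vector extensions. The type name v64qi and the helper gather_every_eighth are made up for this illustration, and the mask is built in a loop rather than spelled out as a 64-element literal. It only shows what the shuffle computes; it says nothing about which variant is faster.

```c
/* 64 one-byte lanes as a GCC generic vector type (hypothetical name). */
typedef unsigned char v64qi __attribute__((vector_size(64)));

/* Hypothetical helper: out[i] = in[8 * (i % 8)] for i = 0..63, i.e. the
   0, 8, 16, ..., 56 pattern repeated eight times.  Vectors are passed by
   pointer to avoid ABI warnings on targets without 512-bit registers. */
static void gather_every_eighth(const v64qi *in, v64qi *out)
{
    v64qi mask;
    for (int i = 0; i < 64; i++)
        mask[i] = (unsigned char)(8 * (i % 8));
    /* GCC-specific builtin: selects in[mask[i]] for each lane i. */
    *out = __builtin_shuffle(*in, mask);
}
```

Feeding it a vector whose lane i holds the value i should give 0, 8, ..., 56 in every group of eight lanes. Without AVX-512 enabled GCC lowers this generically; comparing the assembly at -O2 with and without the relevant -mavx512* options is one way to see what is actually emitted.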

