[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

Fri May 17 08:48:22 GMT 2024

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069

--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #8)
> A better patch:

The real issue is that the following permutation (truncation):

+      for (i = 0; i < d.nelt; ++i)
+       d.perm[i] = i * 2;
+
+      ok = ix86_expand_vec_perm_const_1 (&d);

results in slow code involving VPERMQ. Ideally, ix86_expand_vec_perm_const_1
should emit faster code for truncation, since that would benefit other code as
well.