[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform
haochen.jiang at intel dot com
gcc-bugzilla@gcc.gnu.org
Mon May 20 05:50:16 GMT 2024
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069
--- Comment #12 from Haochen Jiang <haochen.jiang at intel dot com> ---
(In reply to Hongtao Liu from comment #11)
> (In reply to Haochen Jiang from comment #10)
> > A patch like Comment 8 could definitely solve the problem. But I need to
> > test more benchmarks to see if there is surprise.
> >
> > But, yes, as Uros said in Comment 9, maybe there is a chance we could do it
> > better.
>
> Could you add "arch=skylake-avx512" to target_clones and try disable whole
> ix86_expand_vecop_qihi2 to see if there's any performance improvement?
> For x86, cross-lane permutation(truncation) is not very efficient(3-4 cycles
> for both vpermq and vpmovwb).
Whether I disable or enable ix86_expand_vecop_qihi2 with arch=skylake-avx512
on trunk, there is no performance regression compared to GCC13 + avx2.
It seems that the regression only happens with GCC14 + avx2.
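
For reference, a minimal sketch of the kind of target_clones setup discussed
above, with "arch=skylake-avx512" added next to "avx2". The kernel mul_u8, the
helper mul_u8_self_check and the macro CLONES are hypothetical names for
illustration only; they are not taken from the reporter's test case, and the
byte multiply is just an assumed example of an 8-bit vector operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* target_clones needs GCC on x86-64 with ifunc support (glibc/Linux).
   Elsewhere the attribute is dropped and only one version is built.
   CLONES is a hypothetical helper macro. */
#if defined(__GNUC__) && !defined(__clang__) && \
    defined(__x86_64__) && defined(__gnu_linux__)
#define CLONES \
    __attribute__((target_clones("default", "avx2", "arch=skylake-avx512")))
#else
#define CLONES
#endif

/* Hypothetical kernel: element-wise 8-bit multiply. Each clone is
   compiled for its own ISA and the resolver picks one at load time. */
CLONES
void mul_u8(uint8_t *restrict dst, const uint8_t *restrict a,
            const uint8_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(a[i] * b[i]);
}

/* Returns 1 if mul_u8 matches a scalar reference on a small pattern. */
int mul_u8_self_check(void)
{
    enum { N = 256 };
    uint8_t a[N], b[N], out[N];

    for (int i = 0; i < N; i++) {
        a[i] = (uint8_t)i;
        b[i] = (uint8_t)(3 * i + 7);
    }
    mul_u8(out, a, b, N);

    for (int i = 0; i < N; i++)
        if (out[i] != (uint8_t)(a[i] * b[i]))
            return 0;
    return 1;
}
```

Which clone actually runs depends on the CPU the binary is loaded on, so
timing comparisons such as the one above only make sense on the same machine.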