[Bug target/82459] AVX512F instruction costs: vmovdqu8 stores may be an extra uop, and vpmovwb is 2 uops on Skylake and not always worth using
andrew.n.senkevich at gmail dot com
gcc-bugzilla@gcc.gnu.org
Thu Nov 23 17:35:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82459
Andrew Senkevich <andrew.n.senkevich at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |andrew.n.senkevich at gmail dot com
--- Comment #2 from Andrew Senkevich <andrew.n.senkevich at gmail dot com> ---
-mprefer-avx256 is now the default for SKX, and the vzeroupper addition has
been fixed. The code currently generated is:
.L3:
        vpsrlw  $8, (%rsi,%rax,2), %ymm0
        vpsrlw  $8, 32(%rsi,%rax,2), %ymm1
        vpand   %ymm0, %ymm2, %ymm0
        vpand   %ymm1, %ymm2, %ymm1
        vpackuswb       %ymm1, %ymm0, %ymm0
        vpermq  $216, %ymm0, %ymm0
        vmovdqu8        %ymm0, (%rdi,%rax)
        addq    $32, %rax
        cmpq    %rax, %rdx
        jne     .L3
The vmovdqu8 store remains, but I cannot confirm that it is slower.
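
For readers following the assembly: each iteration loads two 32-byte vectors of
16-bit elements, shifts every element right by 8, packs the results down to
bytes (vpermq $216 undoes the per-lane interleaving that vpackuswb leaves
behind), and stores 32 bytes. A minimal C sketch of a loop with that behaviour
is below. It is an assumption inferred from the instructions, not the test case
from this report, and the name pack_high8 and its signature are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the bug report): store the high byte of
 * each 16-bit source element into the destination byte array.
 * 'restrict' tells the compiler the arrays do not overlap, which is what
 * makes straightforward vectorization of the loop possible. */
void pack_high8(uint8_t *restrict dst, const uint16_t *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] >> 8);
}
```

Compiling something like this with -O3 -march=skylake-avx512 should let the
generated loop be compared with the listing above, though the exact output
depends on the GCC revision.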