This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/80636] New: AVX / AVX512 register-zeroing should always use AVX 128b, not ymm or zmm
- From: "peter at cordes dot ca" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 05 May 2017 00:02:56 +0000
- Subject: [Bug target/80636] New: AVX / AVX512 register-zeroing should always use AVX 128b, not ymm or zmm
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80636
Bug ID: 80636
Summary: AVX / AVX512 register-zeroing should always use AVX
128b, not ymm or zmm
Product: gcc
Version: 8.0
URL: http://stackoverflow.com/questions/43713273/is-vxorps-zeroing-on-amd-jaguar-bulldozer-zen-faster-with-xmm-registers-than-ymm
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Currently, gcc compiles _mm256_setzero_ps() to vxorps %ymm0, %ymm0, %ymm0, and
_mm512_setzero_ps() to the zmm equivalent. The same applies to pd and integer
vectors: gcc zeroes at the vector width it is going to use the register at.
vxorps %xmm0, %xmm0, %xmm0 has the same effect on the full register, because
AVX (VEX-encoded) instructions zero the upper bits of the destination register
out to VLMAX.
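
A minimal sketch of how one might check that claim on real hardware. It assumes
GCC or clang extended inline asm in AT&T syntax; the function name
xmm_zeroing_clears_ymm is hypothetical and not part of gcc or of this report.
It fills ymm0 with non-zero data, runs the 128b zeroing idiom, and reads back
all 256 bits. On a non-x86 build, or on a CPU without AVX, it skips the check
and returns -1.

```c
#include <string.h>

/* Hypothetical helper (an assumption for illustration only).
   Returns 1 if zeroing xmm0 also cleared all 256 bits of ymm0, 0 if it did
   not, and -1 if the check cannot run here (non-x86 build or no AVX). */
int xmm_zeroing_clears_ymm(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (!__builtin_cpu_supports("avx"))
        return -1;

    float in[8] = {1, 2, 3, 4, 5, 6, 7, 8};   /* non-zero in both 128b halves */
    float out[8];
    const float zeros[8] = {0};

    __asm__ volatile(
        "vmovups %1, %%ymm0\n\t"              /* fill all 256 bits of ymm0 */
        "vxorps  %%xmm0, %%xmm0, %%xmm0\n\t"  /* 128b zeroing idiom */
        "vmovups %%ymm0, %0"                  /* read back all 256 bits */
        : "=m"(out)
        : "m"(in)
        : "xmm0");

    return memcmp(out, zeros, sizeof out) == 0;
#else
    return -1;
#endif
}
```

On an AVX-capable machine the function should return 1, since the VEX-encoded
xmm write clears the upper half of ymm0.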
AMD Ryzen decodes the xmm version to 1 micro-op, but the ymm version to 2
micro-ops. It doesn't detect the zeroing-idiom special case until after the
decoder has split the instruction. Earlier AMD CPUs (Bulldozer/Jaguar) may be
similar.
---
For zeroing a ZMM register, it also saves a byte or two to use a VEX prefix
instead of EVEX, if the target register is zmm0-15. (zmm16-31 of course always
need EVEX).
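
The size difference can be sketched by writing the encodings out as byte
arrays. The bytes below are hand-assembled from the VEX/EVEX prefix layouts and
are an assumption worth confirming with an assembler; the array names and the
bytes_saved helper are hypothetical. The EVEX integer form (vpxord) has the
same 6-byte length as the EVEX vxorps shown here.

```c
#include <stddef.h>

/* Hand-assembled encodings (assumption: verify with an assembler). */

/* vxorps %xmm0, %xmm0, %xmm0   2-byte VEX prefix, 4 bytes total */
const unsigned char VEX_XMM0[]  = {0xC5, 0xF8, 0x57, 0xC0};

/* vxorps %ymm0, %ymm0, %ymm0   same length, only the VEX.L bit differs */
const unsigned char VEX_YMM0[]  = {0xC5, 0xFC, 0x57, 0xC0};

/* vxorps %zmm0, %zmm0, %zmm0   4-byte EVEX prefix, 6 bytes total */
const unsigned char EVEX_ZMM0[] = {0x62, 0xF1, 0x7C, 0x48, 0x57, 0xC0};

/* vxorps %xmm8, %xmm8, %xmm8   needs the 3-byte VEX prefix, 5 bytes total */
const unsigned char VEX_XMM8[]  = {0xC4, 0x41, 0x38, 0x57, 0xC0};

/* vxorps %zmm8, %zmm8, %zmm8   EVEX again, still 6 bytes total */
const unsigned char EVEX_ZMM8[] = {0x62, 0x51, 0x3C, 0x48, 0x57, 0xC0};

/* Code-size saving from using the VEX xmm form instead of the EVEX zmm form. */
size_t bytes_saved(size_t evex_len, size_t vex_len)
{
    return evex_len - vex_len;
}
```

With these encodings, bytes_saved(sizeof EVEX_ZMM0, sizeof VEX_XMM0) is 2 and
bytes_saved(sizeof EVEX_ZMM8, sizeof VEX_XMM8) is 1, which matches the "byte or
two" above. The xmm and ymm VEX forms are the same size, so the xmm form costs
nothing in code size for 256b zeroing.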
---
There is no benefit, but also no downside, to using xmm-zeroing on Intel CPUs
that don't split 256b or 512b vector ops. This change could be made across the
board, without adding any tuning options to control it.
References:
http://stackoverflow.com/a/43751783/224132 Agner Fog's answer to my SO question
about this.
https://bugs.llvm.org/show_bug.cgi?id=32862 the same issue for clang.