[Bug target/106060] New: Inefficient constant broadcast on x86_64
goldstein.w.n at gmail dot com
gcc-bugzilla@gcc.gnu.org
Thu Jun 23 01:59:15 GMT 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060
Bug ID: 106060
Summary: Inefficient constant broadcast on x86_64
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: goldstein.w.n at gmail dot com
Target Milestone: ---
```
#include <immintrin.h>
__m256i
shouldnt_have_movabs ()
{
return _mm256_set1_epi8 (123);
}
__m256i
should_be_cmpeq_abs ()
{
return _mm256_set1_epi8 (1);
}
__m256i
should_be_cmpeq_add ()
{
return _mm256_set1_epi8 (-2);
}
```
Compiled with: '-O3 -march=x86-64-v3'
Results in:
```
Disassembly of section .text:
0000000000000000 <shouldnt_have_movabs>:
0: 48 b8 7b 7b 7b 7b 7b movabs $0x7b7b7b7b7b7b7b7b,%rax
7: 7b 7b 7b
a: c4 e1 f9 6e c8 vmovq %rax,%xmm1
f: c4 e2 7d 59 c1 vpbroadcastq %xmm1,%ymm0
14: c3 retq
15: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
1c: 00 00 00 00
0000000000000020 <should_be_cmpeq_abs>:
20: 48 b8 01 01 01 01 01 movabs $0x101010101010101,%rax
27: 01 01 01
2a: c4 e1 f9 6e c8 vmovq %rax,%xmm1
2f: c4 e2 7d 59 c1 vpbroadcastq %xmm1,%ymm0
34: c3 retq
35: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
3c: 00 00 00 00
0000000000000040 <should_be_cmpeq_add>:
40: 48 b8 fe fe fe fe fe movabs $0xfefefefefefefefe,%rax
47: fe fe fe
4a: c4 e1 f9 6e c8 vmovq %rax,%xmm1
4f: c4 e2 7d 59 c1 vpbroadcastq %xmm1,%ymm0
54: c3 retq
```
Compiled with: '-O3 -march=x86-64-v4'
Results in:
```
0000000000000000 <shouldnt_have_movabs>:
0: 48 b8 7b 7b 7b 7b 7b movabs $0x7b7b7b7b7b7b7b7b,%rax
7: 7b 7b 7b
a: 62 f2 fd 28 7c c0 vpbroadcastq %rax,%ymm0
10: c3 retq
11: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
18: 00 00 00 00
1c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000000020 <should_be_cmpeq_abs>:
20: 48 b8 01 01 01 01 01 movabs $0x101010101010101,%rax
27: 01 01 01
2a: 62 f2 fd 28 7c c0 vpbroadcastq %rax,%ymm0
30: c3 retq
31: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
38: 00 00 00 00
3c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000000040 <should_be_cmpeq_add>:
40: 48 b8 fe fe fe fe fe movabs $0xfefefefefefefefe,%rax
47: fe fe fe
4a: 62 f2 fd 28 7c c0 vpbroadcastq %rax,%ymm0
50: c3 retq
```
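All of these functions are suboptimal on both targets.
The constants 1 and -2 (the `should_be_cmpeq_*` functions) can be generated without any lane-crossing broadcast: as the function names suggest, set a register to all-ones with `vpcmpeq`, then apply `vpabsb` (for 1) or `vpaddb` (for -2).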
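The report does not spell out the exact instruction sequences, so the following is only a rough sketch of what the names `should_be_cmpeq_abs` and `should_be_cmpeq_add` point to. It assumes x86-64 with AVX2, GCC/Clang extended inline asm in AT&T syntax, and hypothetical helper names (`set1_epi8_1_cmpeq_abs`, `set1_epi8_m2_cmpeq_add`). Inline asm is used only so the compiler cannot constant-fold the sequence back into a `movabs` plus broadcast.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): fills out[0..31] with 1 using the
   sequence the name should_be_cmpeq_abs suggests. AT&T syntax (GCC/Clang
   default); the CPU must support AVX2 at run time. */
__attribute__((target("avx2")))
void set1_epi8_1_cmpeq_abs(uint8_t out[32])
{
    __m256i r;
    /* vpcmpeqd r,r,r: every bit set, so each byte is 0xFF (-1).
       vpabsb r,r:     |-1| = 1 in every byte; no constant is loaded. */
    __asm__("vpcmpeqd %0, %0, %0\n\t"
            "vpabsb %0, %0"
            : "=x"(r));
    _mm256_storeu_si256((__m256i *)out, r);
}

/* Hypothetical helper: fills out[0..31] with 0xFE (-2), in the spirit of
   should_be_cmpeq_add. */
__attribute__((target("avx2")))
void set1_epi8_m2_cmpeq_add(uint8_t out[32])
{
    __m256i r;
    /* vpcmpeqd gives -1 per byte; vpaddb adds it to itself: -1 + -1 = -2. */
    __asm__("vpcmpeqd %0, %0, %0\n\t"
            "vpaddb %0, %0, %0"
            : "=x"(r));
    _mm256_storeu_si256((__m256i *)out, r);
}
```

Both sequences work purely per byte inside each register, so neither needs a general-purpose register, an immediate, or a cross-lane broadcast. The `target("avx2")` attribute lets the file build without `-mavx2`; callers should still check for AVX2 support before calling the helpers.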
Constants like 123 should not first be broadcast into a 64-bit immediate:
that requires a 10-byte `movabs` and wastes code space.
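One possible shorter sequence for the 123 case loads the byte with a 32-bit `mov`, which carries a 4-byte immediate, moves it into an XMM register, and broadcasts that single byte. This is our assumption; the report does not propose a specific replacement. The helper name `set1_epi8_123_bcastb` is hypothetical, and the same AVX2 and AT&T inline-asm assumptions apply.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: fills out[0..31] with 123 (0x7b) without a 64-bit
   immediate. AT&T syntax; the CPU must support AVX2 at run time. */
__attribute__((target("avx2")))
void set1_epi8_123_bcastb(uint8_t out[32])
{
    __m256i r;
    /* movl has a 4-byte immediate instead of movabs's 8-byte one;
       vmovd copies %eax into the low 32 bits of r's XMM half (%x0);
       vpbroadcastb replicates the lowest byte into all 32 bytes of r. */
    __asm__("movl $123, %%eax\n\t"
            "vmovd %%eax, %x0\n\t"
            "vpbroadcastb %x0, %0"
            : "=x"(r)
            :
            : "eax");
    _mm256_storeu_si256((__m256i *)out, r);
}
```

On x86-64-v4, AVX-512BW with AVX-512VL also provides `vpbroadcastb` directly from a general-purpose register. That form would drop the `vmovd`, mirroring the `vpbroadcastq %rax,%ymm0` form GCC already emits there.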
More information about the Gcc-bugs mailing list