[Bug target/111354] New: [7/10/12 regression] The instructions of the DPDK demo program are different and run time increases.
d_vampile at 163 dot com
gcc-bugzilla@gcc.gnu.org
Sat Sep 9 04:21:49 GMT 2023
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354
Bug ID: 111354
Summary: [7/10/12 regression] The instructions of the DPDK demo
program are different and run time increases.
Product: gcc
Version: 10.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: d_vampile at 163 dot com
Target Milestone: ---
Created attachment 55863
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55863&action=edit
test case
Test platform: x86_64
The test platform supports avx2 and sse4.2
Default mtune=generic
Compiler Options:
gcc main.c -g -o main -O2 -msse4.2 -mavx2 -fno-inline
GCC 7.3.0 produces:
.L3:
vmovdqu (%rsi), %xmm3
subq $-128, %rdi
subq $-128, %rsi
vmovdqu -96(%rsi), %xmm2
vinserti128 $0x1, -112(%rsi), %ymm3, %ymm3
vmovdqu -64(%rsi), %xmm1
vinserti128 $0x1, -80(%rsi), %ymm2, %ymm2
vmovdqu -32(%rsi), %xmm0
vinserti128 $0x1, -48(%rsi), %ymm1, %ymm1
vinserti128 $0x1, -16(%rsi), %ymm0, %ymm0
vmovups %xmm3, -128(%rdi)
vextracti128 $0x1, %ymm3, -112(%rdi)
vmovups %xmm2, -96(%rdi)
vextracti128 $0x1, %ymm2, -80(%rdi)
vmovups %xmm1, -64(%rdi)
vextracti128 $0x1, %ymm1, -48(%rdi)
vmovups %xmm0, -32(%rdi)
vextracti128 $0x1, %ymm0, -16(%rdi)
cmpq %rax, %rdi
jne .L3
vzeroupper
Runtime with gcc7.3.0:
$ time ./main_gcc7.3 2000
start to run 2000.
end to run 2000.
real 6m30.461s
user 6m26.587s
sys 0m0.814s
GCC 10.3.0 produces:
.L3:
vmovdqu (%rsi), %xmm4
vmovdqu 32(%rsi), %xmm5
subq $-128, %rdi
subq $-128, %rsi
vmovdqu -64(%rsi), %xmm6
vmovdqu -32(%rsi), %xmm7
vinserti128 $0x1, -112(%rsi), %ymm4, %ymm3
vinserti128 $0x1, -80(%rsi), %ymm5, %ymm2
vinserti128 $0x1, -48(%rsi), %ymm6, %ymm1
vinserti128 $0x1, -16(%rsi), %ymm7, %ymm0
vmovdqu %xmm3, -128(%rdi)
vextracti128 $0x1, %ymm3, -112(%rdi)
vextracti128 $0x1, %ymm2, -80(%rdi)
vmovdqu %xmm2, -96(%rdi)
vextracti128 $0x1, %ymm1, -48(%rdi)
vextracti128 $0x1, %ymm0, -16(%rdi)
vmovdqu %xmm1, -64(%rdi)
vmovdqu %xmm0, -32(%rdi)
cmpq %rax, %rdi
jne .L3
vzeroupper
Runtime with gcc10.3.0:
$ time ./main_gcc10.3 2000
start to run 2000.
end to run 2000.
real 7m18.696s
user 7m13.912s
sys 0m1.098s
GCC 12.3.0 produces:
.L3:
vmovdqu (%rsi), %ymm2
vmovdqu 32(%rsi), %ymm1
subq $-128, %rdi
subq $-128, %rsi
vmovdqu -64(%rsi), %ymm0
vmovdqu -32(%rsi), %ymm3
vmovdqu %ymm2, -128(%rdi)
vmovdqu %ymm3, -32(%rdi)
vmovdqu %ymm1, -96(%rdi)
vmovdqu %ymm0, -64(%rdi)
cmpq %rax, %rdi
jne .L3
vzeroupper
Runtime with gcc12.3.0:
$ time ./main_gcc12.3 2000
start to run 2000.
end to run 2000.
real 10m1.303s
user 9m52.527s
sys 0m2.253s
Why does the code generated by GCC 12, which looks simpler, run significantly
slower in the same test environment with the same compilation options?
Why do these three versions of GCC generate different instructions for the
same source?
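
For readers without access to the attachment, a loop of roughly the following
shape is the kind of source that would be expected to compile to the listings
above. This is only a minimal sketch under stated assumptions, not the attached
test case: the names copy128_avx2, copy_blocks and copy_blocks_checked are
hypothetical, the 128-byte block size is inferred from the `subq $-128`
instructions, and the code assumes an x86 target built with GCC or Clang.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the attachment): copy 128 bytes using four
   unaligned 256-bit loads followed by four unaligned 256-bit stores. */
__attribute__((target("avx2")))
static void copy128_avx2(uint8_t *dst, const uint8_t *src)
{
    __m256i a = _mm256_loadu_si256((const __m256i *)(src + 0));
    __m256i b = _mm256_loadu_si256((const __m256i *)(src + 32));
    __m256i c = _mm256_loadu_si256((const __m256i *)(src + 64));
    __m256i d = _mm256_loadu_si256((const __m256i *)(src + 96));
    _mm256_storeu_si256((__m256i *)(dst + 0), a);
    _mm256_storeu_si256((__m256i *)(dst + 32), b);
    _mm256_storeu_si256((__m256i *)(dst + 64), c);
    _mm256_storeu_si256((__m256i *)(dst + 96), d);
}

/* Hypothetical driver: copy n bytes in 128-byte blocks.
   n is assumed to be a multiple of 128. */
__attribute__((target("avx2"), noinline))
void copy_blocks(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n % 128 == 0);
    for (size_t i = 0; i < n; i += 128)
        copy128_avx2(dst + i, src + i);
}

/* Wrapper that falls back to memcpy when the CPU lacks AVX2, so the
   sketch can also run on machines without the extension. */
void copy_blocks_checked(uint8_t *dst, const uint8_t *src, size_t n)
{
    if (__builtin_cpu_supports("avx2"))
        copy_blocks(dst, src, n);
    else
        memcpy(dst, src, n);
}
```

The split load/store sequences in the GCC 7 and GCC 10 listings appear to
correspond to the GCC options -mavx256-split-unaligned-load and
-mavx256-split-unaligned-store. One possible experiment, offered as a
suggestion rather than a result, is to rebuild with GCC 12.3.0 and those two
options added, then compare the run time against the numbers above.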