This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/37194] New: Autovectorization of constant iteration loop degrades performance
- From: "pthaugen at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 21 Aug 2008 19:21:56 -0000
- Subject: [Bug tree-optimization/37194] New: Autovectorization of constant iteration loop degrades performance
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
I'm seeing a performance degradation in the SPEC CPU2000 benchmark 252.eon that
is caused by autovectorization of a simple loop in the function
ggSpectrum::Set(float). Here's a simplified C version:
void ggSpectrum_Set(float * data, float d) {
  int i;
  for (i = 0; i < 8; i++)
    data[i] = d;
}
When compiled with -O3 -mcpu=970 the following code is generated:
ggSpectrum_Set:
        mfvrsave 0
        stwu 1,-48(1)
        stw 0,44(1)
        oris 0,0,0x8000
        mtvrsave 0
        li 10,0
        rlwinm 0,3,30,30,31
        subfic 0,0,4
        andi. 9,0,3
        beq- 0,.L16
        mtctr 9
        .p2align 4,,15
.L10:
        slwi 0,10,2
        addi 10,10,1
        stfsx 1,3,0
        subfic 8,10,8
        bdnz .L10
.L3:
        subfic 6,9,8
        srwi 0,6,2
        slwi. 7,0,2
        beq- 0,.L5
        mtctr 0
        stfs 1,16(1)
        cmpwi 7,0,0
        li 0,16
        slwi 9,9,2
        li 11,0
        add 9,3,9
        lvewx 0,1,0
        vspltw 0,0,0
        beq- 7,.L17
        .p2align 4,,15
.L6:
        slwi 0,11,4
        addi 11,11,1
        stvx 0,9,0
        bdnz .L6
        cmpw 7,6,7
        subf 8,7,8
        add 10,10,7
        beq- 7,.L9
.L5:
        mtctr 8
        slwi 0,10,2
        add 3,3,0
        .p2align 4,,15
.L8:
        stfs 1,0(3)
        addi 3,3,4
        bdnz .L8
.L9:
        lwz 12,44(1)
        mtvrsave 12
        addi 1,1,48
        blr
.L16:
        mr 10,9
        li 8,8
        b .L3
.L17:
        li 0,1
        mtctr 0
        b .L6
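For readers who don't want to trace the assembly, here is a rough C sketch of
the structure the vectorized code appears to have: a scalar prologue that peels
stores until the address is 16-byte aligned (.L10), an aligned vector body
(.L6), and a scalar epilogue for the leftovers (.L8). The names peel_count and
ggSpectrum_Set_peeled are made up for illustration, and the vector store is
written as four scalar stores so the sketch stays portable C. It is my reading
of the code above, not compiler output.

```c
#include <assert.h>
#include <stdint.h>

#define N  8u   /* constant trip count of the original loop */
#define VF 4u   /* floats per 16-byte AltiVec vector */

/* Hypothetical helper: number of scalar stores needed before data + i is
   16-byte aligned (my reading of the rlwinm / subfic / andi. sequence). */
static unsigned peel_count(const float *data)
{
  assert(((uintptr_t)data & 3) == 0);  /* assumes 4-byte-aligned floats */
  return (VF - (((uintptr_t)data >> 2) & (VF - 1))) & (VF - 1);
}

void ggSpectrum_Set_peeled(float *data, float d)
{
  unsigned i = 0;
  unsigned peel = peel_count(data);
  unsigned vec_end;

  /* Prologue (.L10): scalar stores up to the alignment boundary. */
  for (; i < peel; i++)
    data[i] = d;

  /* Vector body (.L6): each iteration stands in for one stvx. */
  vec_end = i + ((N - i) / VF) * VF;
  for (; i < vec_end; i += VF) {
    data[i]     = d;
    data[i + 1] = d;
    data[i + 2] = d;
    data[i + 3] = d;
  }

  /* Epilogue (.L8): remaining scalar stores. */
  for (; i < N; i++)
    data[i] = d;
}
```

With only 8 iterations, the setup and the three runtime branches cost more
than the two vector stores can save, and an unaligned pointer gets at most one
vector store.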
Adding -mno-altivec results in this simpler sequence, and a significant boost
in performance (~40% speedup for the benchmark):
ggSpectrum_Set:
        stfs 1,28(3)
        stfs 1,0(3)
        stfs 1,4(3)
        stfs 1,8(3)
        stfs 1,12(3)
        stfs 1,16(3)
        stfs 1,20(3)
        stfs 1,24(3)
        blr
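In C terms, the scalar sequence is just the loop fully unrolled: because the
trip count is a compile-time constant, there is no alignment check, no loop
counter and no branch. A hand-written equivalent might look like this (the
name ggSpectrum_Set_unrolled is hypothetical, and the stores are in index
order rather than the order the compiler happened to schedule them):

```c
/* Sketch of what the -mno-altivec code does: eight independent float stores. */
void ggSpectrum_Set_unrolled(float *data, float d)
{
  data[0] = d;
  data[1] = d;
  data[2] = d;
  data[3] = d;
  data[4] = d;
  data[5] = d;
  data[6] = d;
  data[7] = d;
}
```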
Another thing that stood out from the benchmark run was that the code was
taking a pretty big hit on a couple of the statically predicted branches
(apparently the address was already 16-byte aligned much of the time). So it
seems it would be best to remove the static prediction hints and let the
hardware branch prediction take over.
--
Summary: Autovectorization of constant iteration loop degrades performance
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pthaugen at gcc dot gnu dot org
GCC build triplet: powerpc64-linux
GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37194