[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop
changpeng dot fang at amd dot com
gcc-bugzilla@gcc.gnu.org
Tue Aug 24 00:22:00 GMT 2010
------- Comment #3 from changpeng dot fang at amd dot com 2010-08-24 00:22 -------
I checked with open64 and did not find any regression. For the above
testcase, open64 generates three non-temporal prefetches (prefetchnta),
whereas gcc generates a single prefetcht0. My guess is that we are
simply unlucky: for streaming accesses like these, the prefetcht0
evicts useful data from the cache. The loop as compiled by open64:
.Lt_0_6402:
#<loop> Loop body line 8, nesting depth: 1, estimated iterations: 1000
        .loc 1 7 0
        movss 0(%r10),%xmm0             # [0] id:67
        subss 0(%r9),%xmm0              # [3]
        .loc 1 8 0
        mulss %xmm0,%xmm0               # [9]
        mulss 0(%rax),%xmm0             # [13]
        .loc 1 7 0
        prefetchnta 128(%r10)           # [17] L1
        prefetchnta 128(%r9)            # [17] L1
        .loc 1 8 0
        addq $4,%rax                    # [17]
        addq $4,%r10                    # [18]
        addq $4,%r9                     # [18]
        cmpq %r11,%rax                  # [18]
        prefetchnta 124(%rax)           # [19] L1
        subss %xmm0,%xmm1               # [19]
        jle .Lt_0_6402                  # [19]
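
For reference, here is a minimal C sketch of what the open64 loop above
appears to compute, with the three non-temporal prefetches written out by
hand. It is reconstructed from the assembly only, not taken from the
testcase: the function name score_nta, the parameter names, and the
PREFETCH_AHEAD macro are all hypothetical. It assumes a compiler that
provides __builtin_prefetch (GCC or Clang); with locality 0 GCC on x86
normally emits prefetchnta, and with locality 3 it normally emits
prefetcht0. The bounds guard is an addition for strict C validity and is
not present in the open64 code.

```c
#include <assert.h>
#include <stddef.h>

/* 32 floats = 128 bytes, matching the 128(%r10) offset in the listing.
 * Hypothetical name; the real prefetch distance is chosen by the compiler. */
#define PREFETCH_AHEAD 32

/* Hypothetical reconstruction: acc -= (x[i] - mean[i])^2 * var[i].
 * Each of the three streams is read once, so it has no temporal reuse. */
float score_nta(const float *x, const float *mean, const float *var,
                size_t n, float acc)
{
    assert(x != NULL && mean != NULL && var != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Guard keeps the address expression inside the arrays; the
         * compiler-generated loop issues the prefetches unconditionally. */
        if (i + PREFETCH_AHEAD < n) {
            /* rw = 0 (read), locality = 0 (non-temporal hint). */
            __builtin_prefetch(&x[i + PREFETCH_AHEAD], 0, 0);
            __builtin_prefetch(&mean[i + PREFETCH_AHEAD], 0, 0);
            __builtin_prefetch(&var[i + PREFETCH_AHEAD], 0, 0);
        }
        float diff = x[i] - mean[i];   /* movss / subss */
        acc -= diff * diff * var[i];   /* mulss / mulss / subss */
    }
    return acc;
}
```

Changing the last argument of __builtin_prefetch from 0 to 3 should give
a prefetcht0 variant of the same loop, which may be a convenient way to
compare the two hints outside of the compiler's own prefetch pass.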
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45391