[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop

changpeng dot fang at amd dot com gcc-bugzilla@gcc.gnu.org
Tue Aug 24 00:22:00 GMT 2010



------- Comment #3 from changpeng dot fang at amd dot com  2010-08-24 00:22 -------
I checked with open64 and did not find any regression. For the above
testcase, open64 generates three non-temporal prefetches (prefetchnta),
whereas gcc generates a single prefetcht0. My guess is that we are just
unlucky: for such streaming accesses, the prefetcht0 evicts useful data
from the cache. The loop generated by open64:

.Lt_0_6402:
 #<loop> Loop body line 8, nesting depth: 1, estimated iterations: 1000
        .loc    1       7       0
        movss 0(%r10),%xmm0             # [0] id:67
        subss 0(%r9),%xmm0              # [3] 
        .loc    1       8       0
        mulss %xmm0,%xmm0               # [9] 
        mulss 0(%rax),%xmm0             # [13] 
        .loc    1       7       0
        prefetchnta 128(%r10)           # [17] L1
        prefetchnta 128(%r9)            # [17] L1
        .loc    1       8       0
        addq $4,%rax                    # [17] 
        addq $4,%r10                    # [18] 
        addq $4,%r9                     # [18] 
        cmpq %r11,%rax                  # [18] 
        prefetchnta 124(%rax)           # [19] L1
        subss %xmm0,%xmm1               # [19] 
        jle .Lt_0_6402                  # [19] 
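
For reference, here is a rough C sketch of what the loop above appears
to compute, with explicit prefetch hints, so that the two hint kinds can
be compared on the same code. It is reconstructed from the assembly, not
taken from the testcase: the names (score_nta, score_t0, x, m, v,
PREFETCH_AHEAD) are made up, and the bounds guard on the prefetch is my
addition to keep the address expression valid. __builtin_prefetch with
locality 0 normally becomes prefetchnta on x86, and locality 3 becomes
prefetcht0.

```c
#include <stddef.h>

/* 32 floats = 128 bytes, the distance used in the open64 listing. */
#define PREFETCH_AHEAD 32

/* Hypothetical reconstruction: acc -= (x[i] - m[i])^2 * v[i].
   LOCALITY must be a compile-time constant, hence the macro. */
#define DEFINE_SCORE(NAME, LOCALITY)                                    \
    float NAME(const float *x, const float *m, const float *v,          \
               size_t n)                                                \
    {                                                                   \
        float acc = 0.0f;                                               \
        for (size_t i = 0; i < n; i++) {                                \
            /* Guard added here so we never form an address far past    \
               the end of the arrays; open64 prefetches unguarded. */   \
            if (i + PREFETCH_AHEAD < n) {                               \
                __builtin_prefetch(&x[i + PREFETCH_AHEAD], 0, LOCALITY);\
                __builtin_prefetch(&m[i + PREFETCH_AHEAD], 0, LOCALITY);\
                __builtin_prefetch(&v[i + PREFETCH_AHEAD], 0, LOCALITY);\
            }                                                           \
            float d = x[i] - m[i];                                      \
            acc -= d * d * v[i];                                        \
        }                                                               \
        return acc;                                                     \
    }

DEFINE_SCORE(score_nta, 0)  /* locality 0: non-temporal hint */
DEFINE_SCORE(score_t0, 3)   /* locality 3: keep in all cache levels */
```

Both functions return the same value; only the prefetch hint differs, so
timing them on large arrays would be one way to check whether the hint
kind alone accounts for the difference.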


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45391


