Bug 53397

Summary: Scimark performance drops 10x when compiled with -O3 -march=amdfam10 due to generation of more prefetches
Product: gcc
Reporter: Venkataramanan <venkataramanan.kumar>
Component: tree-optimization
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED
Severity: major
CC: octoploid, paulo, rguenth
Priority: P3
Version: 4.7.1
Target Milestone: ---
Host: x86_64-unknown-linux-gnu
Target: x86_64-unknown-linux-gnu
Build: x86_64-unknown-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2012-05-18 00:00:00
Bug Depends on:
Bug Blocks: 79703

Description Venkataramanan 2012-05-18 12:02:13 UTC
With GCC 4.7 the benchmark score drops from ~400 Mflops to ~40 Mflops, almost tenfold.

Prefetch instructions are introduced in the innermost loops of "FFT_transform_internal" (FFT.c) in GCC 4.7 but not in GCC 4.6, which is causing the slowdown.

Compiling this function alone as a separate test case with -fno-prefetch-loop-arrays brings back the original score.
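
A rough sketch of that experiment follows. It assumes the SciMark2 C sources are in the current directory and that gcc is on PATH. It rebuilds all of FFT.c with the flag, which only approximates isolating the single function, and the output name Scimark_noprefetch is made up for illustration.

```shell
# Sketch only: flags are taken from the report; file layout and the
# output name Scimark_noprefetch are assumptions.
CFLAGS="-O3 -march=amdfam10"

if [ -f FFT.c ] && command -v gcc >/dev/null 2>&1; then
    # Count prefetch instructions in the assembly generated for FFT.c.
    gcc $CFLAGS -S FFT.c -o - | grep -c -i prefetch

    # Build FFT.c alone with loop-array prefetching disabled.
    gcc $CFLAGS -fno-prefetch-loop-arrays -c FFT.c -o FFT.o

    # Build every other source file with the original flags.
    for src in *.c; do
        [ "$src" = "FFT.c" ] && continue
        gcc $CFLAGS -c "$src" -o "${src%.c}.o"
    done

    # Link and run; compare the FFT Mflops line with the original build.
    gcc *.o -o Scimark_noprefetch -lm
    ./Scimark_noprefetch
else
    echo "FFT.c or gcc not found; nothing built"
fi
```

If the FFT score recovers with only FFT.c built this way, the regression is localized to the prefetches generated for that file.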

The problem is exposed by r175474: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175474

With GCC r175473
--------------------------
gcc -O3 -march=amdfam10 *.c -o Scimark175473 -lm
vekumar@pcedinar5:/local/home/vekumar/SciMark2_bench/SciMark2> ./Scimark175473
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:           99.67
FFT             Mflops:   498.35    (N=1024)

With GCC r175474
-------------------------
gcc -O3 -march=amdfam10 *.c -o Scimark175474 -lm
vekumar@pcedinar5:/local/home/vekumar/SciMark2_bench/SciMark2> ./Scimark175474
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:            7.73
FFT             Mflops:    38.66    (N=1024)
Comment 1 Richard Biener 2012-05-18 12:11:07 UTC
Confirmed.
Comment 2 Venkataramanan 2012-10-09 15:55:04 UTC
Fixed.
http://gcc.gnu.org/viewcvs?view=revision&revision=192261