[Bug target/20748] New: -fprefetch-loop-arrays increases run time considerably
uros at kss-loka dot si
gcc-bugzilla@gcc.gnu.org
Mon Apr 4 14:26:00 GMT 2005
I was playing with -fprefetch-loop-arrays on pentium4, trying to get some
speed-up with simple operations on arrays. Consider this small testcase:
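#include <stdio.h>

#define NELEM 10000000
#define NITER 1000

int buf[NELEM];

int main(void) {
  int i, j;
  int sum = 0;  /* overflows int (formally undefined); only the run time matters here */
  double ssum = 0.0;

  /* Fill the array once. */
  for (i = 0; i < NELEM; i++)
    buf[i] = i;

  /* Stream over the whole array NITER times. */
  for (j = 0; j < NITER; j++) {
    for (i = 0; i < NELEM; i++)
      sum += buf[i];
    ssum += sum;
  }

  printf ("%f\n", ssum);
  return 0;
}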
gcc -O2 -march=pentium4:
time ./a.out
3347504896.000000
real 0m18.114s
user 0m17.910s
sys 0m0.072s
Using -fprefetch-loop-arrays, the run time increases drastically:
gcc -O2 -march=pentium4 -fprefetch-loop-arrays
time ./a.out
3347504896.000000
real 0m27.678s
user 0m27.611s
sys 0m0.051s
That is a slowdown of more than 50% (user time goes from 17.9s to 27.6s) with
-fprefetch-loop-arrays on pentium4.
The inner loop looks like:
.L5:
	prefetcht0	384(%eax)
	addl	(%eax), %edx
	addl	$4, %eax
	cmpl	%eax, %ecx
	jne	.L5
Without -fprefetch-loop-arrays, the code for the inner loop is the same (minus
the prefetch insn, of course). Is everything OK with prefetches on P4?
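
For anyone who wants to experiment with this at the source level, here is a
rough C sketch of what the generated loop does, written with GCC's
__builtin_prefetch. It is not taken from the compiler output. The function
names, AHEAD and LINE_INTS are made up for illustration; AHEAD = 96 assumes a
4-byte int, so that it matches the 384-byte offset in the asm above, and
LINE_INTS = 16 assumes a 64-byte cache line. The sums use unsigned arithmetic
to sidestep the signed overflow in the testcase. The third variant issues one
prefetch per assumed cache line instead of one per element, which may be worth
timing against the other two.

```c
#include <stddef.h>

#define AHEAD     96  /* assumed: 384 bytes / 4-byte int, as in prefetcht0 384(%eax) */
#define LINE_INTS 16  /* assumed: 64-byte cache line / 4-byte int */

/* Plain summation, i.e. the loop without any prefetch. */
unsigned sum_plain(const int *p, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (unsigned)p[i];
    return sum;
}

/* One prefetch per element, mirroring the .L5 loop.
   The caller must provide at least n + AHEAD elements so that the
   prefetched address always lies inside the array. */
unsigned sum_prefetch_each(const int *p, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++) {
        __builtin_prefetch(&p[i + AHEAD], 0, 3);  /* read access, keep in all cache levels */
        sum += (unsigned)p[i];
    }
    return sum;
}

/* One prefetch per block of LINE_INTS elements (hypothetical variant).
   Same padding requirement as above. */
unsigned sum_prefetch_line(const int *p, size_t n)
{
    unsigned sum = 0;
    size_t i = 0;
    for (; i + LINE_INTS <= n; i += LINE_INTS) {
        __builtin_prefetch(&p[i + AHEAD], 0, 3);
        for (size_t k = 0; k < LINE_INTS; k++)
            sum += (unsigned)p[i + k];
    }
    for (; i < n; i++)  /* leftover elements */
        sum += (unsigned)p[i];
    return sum;
}
```

To use these with the testcase, declare the array as buf[NELEM + AHEAD] and
call one of the functions in place of the inner loop. All three return the
same value, so only the run time should differ.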
--
Summary: -fprefetch-loop-arrays increases run time considerably
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20748