[Bug target/20748] New: -fprefetch-loop-arrays increases run time considerably

uros at kss-loka dot si gcc-bugzilla@gcc.gnu.org
Mon Apr 4 14:26:00 GMT 2005


I was playing with -fprefetch-loop-arrays on pentium4, trying to get some
speed-up with simple operations on arrays. Consider this small testcase:

#include <stdio.h>

#define NELEM 10000000
#define NITER 1000

int buf[NELEM];

int main() {
  int i,j;
  int sum = 0;
  double ssum = 0.0;

  for (i = 0; i < NELEM; i++)
    buf[i] = i;

  for (j = 0; j < NITER; j++) {
    for (i = 0; i < NELEM; i++)
      sum += buf[i];
    ssum += sum;
  }

  printf ("%f\n", ssum);

  return 0;
}

gcc -O2 -march=pentium4:

time ./a.out
3347504896.000000

real    0m18.114s
user    0m17.910s
sys     0m0.072s

Using -fprefetch-loop-arrays, the run time increases drastically:
gcc -O2 -march=pentium4 -fprefetch-loop-arrays

time ./a.out
3347504896.000000

real    0m27.678s
user    0m27.611s
sys     0m0.051s

That is a run-time increase of more than 50% (18.1s to 27.7s) with
-fprefetch-loop-arrays on pentium4. The inner loop looks like:
.L5:
	prefetcht0	384(%eax)
	addl	(%eax), %edx
	addl	$4, %eax
	cmpl	%eax, %ecx
	jne	.L5

Without -fprefetch-loop-arrays, the code for the inner loop is the same (without
the prefetch insn, of course). Is everything OK with prefetches on P4?
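
For anyone who wants to experiment at the source level, the generated loop
corresponds roughly to the C sketch below. It uses GCC's __builtin_prefetch
with a read hint and locality 3, which on x86 is normally emitted as
prefetcht0. The function names, the ahead() helper and the 64-byte cache
line size are assumptions made for illustration; they are not taken from the
compiler. The second function issues one prefetch per assumed cache line
instead of one per element, which may help to tell whether the cost comes
from the number of prefetch insns. No timings are claimed for either variant.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_BYTES 384   /* distance seen in the generated code */
#define INTS_PER_LINE  16    /* assumption: 64-byte line, 4-byte int */

/* Hypothetical helper: address PREFETCH_BYTES ahead of p.  The arithmetic
   is done on uintptr_t so that no out-of-bounds pointer is formed.  */
static const void *ahead(const int *p)
{
  return (const void *) ((uintptr_t) p + PREFETCH_BYTES);
}

/* One prefetch per element, as in the loop produced by
   -fprefetch-loop-arrays.  Unsigned sum avoids signed overflow.  */
unsigned int sum_prefetch_each(const int *a, size_t n)
{
  unsigned int sum = 0;
  size_t i;

  for (i = 0; i < n; i++) {
    __builtin_prefetch(ahead(&a[i]), 0, 3);
    sum += (unsigned int) a[i];
  }
  return sum;
}

/* One prefetch per block of INTS_PER_LINE elements.  */
unsigned int sum_prefetch_per_line(const int *a, size_t n)
{
  unsigned int sum = 0;
  size_t i = 0;

  while (i < n) {
    size_t end = (n - i > INTS_PER_LINE) ? i + INTS_PER_LINE : n;

    __builtin_prefetch(ahead(&a[i]), 0, 3);
    for (; i < end; i++)
      sum += (unsigned int) a[i];
  }
  return sum;
}
```

Both functions return the same sum as a plain loop; only the number of
prefetches issued differs. Swapping one of them in for the inner loop of the
testcase and compiling without -fprefetch-loop-arrays would be one way to
compare the two prefetch patterns.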

-- 
           Summary: -fprefetch-loop-arrays increases run time considerably
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at kss-loka dot si
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20748


