This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
- To: gcc at gcc dot gnu dot org
- Subject: alpha+loop unrolling=craziness
- From: Brad Lucier <lucier at math dot purdue dot edu>
- Date: Mon, 23 Oct 2000 21:52:58 -0500 (EST)
- Cc: lucier at math dot purdue dot edu (Brad Lucier)
With this double loop:
#include <stdlib.h>
#define FLOAT float
void bench (int nvec)
{
  int *idat;
  int ireps;
  int num;
  int i,j;
  ireps = 1024*2048/nvec;
  num = nvec*1024/sizeof(int);
  idat = (int *) malloc(num*sizeof(int));
  for (j=0; j<ireps; ++j)
    {
      for (i=0; i<num; ++i)
        idat[i] = 18;
    }
  free(idat);
}
when compiled with options -O3 -funroll-all-loops on alphaev6-unknown-linux-gnu
by gcc-2.95.1, the main body of the inner loop is unrolled four times
and compiled to
$L40:
        stl $5,0($2)
        stl $5,4($2)
        addl $4,4,$4
        stl $5,8($2)
        stl $5,12($2)
        cmplt $4,$9,$1
        addq $2,16,$2
        bne $1,$L40
which is scheduled in 3 cycles. But with
popov-80% /export/u10/egcs-test/bin/gcc -v
Reading specs from /export/u10/egcs-test/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.97/specs
Configured with: --prefix=/export/u10/egcs-test --enable-checking=no
gcc version 2.97 20001023 (experimental)
this inner loop is compiled to
$L40:
        lda $1,1($4)
        stl $5,0($2)
        stl $5,4($2)
        lda $1,3($1)
        stl $5,8($2)
        stl $5,12($2)
        lda $2,16($2)
$L64:
        addl $1,$31,$4
        cmplt $4,$9,$1
        bne $1,$L40
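which is scheduled in 6 cycles. Something is choosing a really
slow way to add 4 to register $4 here: instead of the single
addl $4,4,$4, the loop adds 1 and then 3 with two lda instructions
and copies the result back with addl $1,$31,$4, a chain of three
dependent instructions in front of the cmplt.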
Perhaps this test loop is simple enough that someone can see what
is going on.
Brad Lucier