This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: PATCH to hoist loads/stores out of loops
You wrote:
[ This is beyond my wildest dreams of the possible outcomes of
the egcs project - having a C++ guru solving Fortran optimisation
shortcomings :-) :-) ]
Well, I did take a scientific computation course using Fortran90 on a
CM-2 at one point. You asked:
Could you also pull the same trick for the following code:
      subroutine gemm(a, b, c, m, n, k)
      integer i,m,n,k,l,j
      dimension a(k,m), b(n,k), c(n,m)
      do i=1,m  ! poor for illustration only
         do j=1,n
            do l=1,k
               c(j,i) = c(j,i) + a(l,i)*b(j,l)
            end do
         end do
      end do
      end
Actually, the code I submitted does do this. What, you say, it didn't
when you tried it? Oops, I forgot to tell you that I haven't yet
implemented C's `restrict'. (I'd like to do that; all I need is a
client to pay for it (or some free time).) Basically, what I'm
saying is that if you could convince GCC that a, b, and c don't alias,
you'd be all set. Right now, it's afraid that the stores into c(j,i)
might alter a(l,i) or b(j,l).
For example, the following C variant:
double a[10][10];
double b[10][10];
double c[10][10];
void gemm(int m, int n, int k)
{
  int i;
  int j;
  int l;
  for (i = 0; i < m; ++i)
    for (j = 0; j < n; ++j)
      for (l = 0; l < k; ++l)
        c[j][i] += a[l][i] * b[j][l];
}
is optimized as you suggest, with the patches I submitted. The inner
loop looks like:
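        fldl c(%ebx)            # Load c(j,i) once, before the inner loop
        .p2align 4,,7
.L13:
        fldl (%ecx)             # Load a(l,i)
        fmull b(%edx)           # Multiply by b(j,l)
        addl $80,%ecx           # Advance the a pointer by one row (10 doubles)
        addl $8,%edx            # Advance the b offset by one double
        incl %eax               # Increment l
        faddp %st,%st(1)        # Add the product into the running sum
        cmpl 16(%ebp),%eax      # Compare l with k
        jl .L13                 # Loop while l < k
        fstpl c(%ebx)           # Store c(j,i) once, after the inner loop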
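In source terms, that loop behaves roughly as if gemm had been written
by hand like the sketch below. This is an illustration only, not
compiler output; the name gemm_hoisted and the local variable acc are
made up for the example.

```c
/* Same global arrays as in the C variant above. */
double a[10][10];
double b[10][10];
double c[10][10];

/* Hypothetical hand-hoisted version of gemm: c[j][i] is read once
   before the inner loop and written once after it. */
void gemm_hoisted(int m, int n, int k)
{
  int i;
  int j;
  int l;

  for (i = 0; i < m; ++i)
    for (j = 0; j < n; ++j)
      {
        double acc = c[j][i];   /* load hoisted above the inner loop */

        for (l = 0; l < k; ++l)
          acc += a[l][i] * b[j][l];

        c[j][i] = acc;          /* store sunk below the inner loop */
      }
}
```

The rewrite is only safe because the compiler can see that a, b, and c
are distinct objects, so keeping the running sum in a register cannot
hide a store that a later load of a or b would have needed to observe.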
So, we're one (relatively small) step away.
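For reference, here is a sketch of how a version taking the arrays as
arguments might be declared once `restrict' is supported. It uses the
spelling standardized in C99; the name gemm_restrict and the fixed row
length of 10 are assumptions carried over from the example above, not
part of the submitted patch.

```c
/* Sketch only: each pointer is restrict-qualified, which promises the
   compiler that the three arrays do not overlap.  The function name
   and the row length of 10 are assumptions made for illustration. */
void gemm_restrict(double (*restrict a)[10],
                   double (*restrict b)[10],
                   double (*restrict c)[10],
                   int m, int n, int k)
{
  int i;
  int j;
  int l;

  for (i = 0; i < m; ++i)
    for (j = 0; j < n; ++j)
      for (l = 0; l < k; ++l)
        c[j][i] += a[l][i] * b[j][l];
}
```

With that promise in place, a store into c[j][i] can no longer be
assumed to change a[l][i] or b[j][l], which is the same guarantee the
global arrays give the optimizer in the C variant above.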
--
Mark Mitchell mark@markmitchell.com
Mark Mitchell Consulting http://www.markmitchell.com