This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Strange IV choices?
- From: Richard Guenther <rguenth at tat dot physik dot uni-tuebingen dot de>
- To: gcc at gcc dot gnu dot org
- Date: Tue, 14 Dec 2004 15:56:32 +0100 (CET)
- Subject: Strange IV choices?
Hi!
It seems that ivopts is confused by local copies of objects.
Suppose you have some complex array-managing class. The two
functionally identical functions
void arrayAssignManual(const Array<2, double, BrickViewU>& a,
                       const Array<2, double, BrickViewU>& b,
                       const Array<2, double, BrickViewU>& c,
                       const Array<2, double, BrickViewU>& d,
                       const Interval<2> &I)
{
  int ie = I[0].length();
  int je = I[1].length();
  for (int j=0; j<je; ++j)
    for (int i=0; i<ie; ++i)
      a(i,j) = b(i,j)+c(i,j)+d(i,j);
}
and
void arrayAssignManualCopy(const Array<2, double, BrickViewU>& a_,
                           const Array<2, double, BrickViewU>& b_,
                           const Array<2, double, BrickViewU>& c_,
                           const Array<2, double, BrickViewU>& d_,
                           const Interval<2> &I)
{
  Array<2, double, BrickViewU> a(a_), b(b_), c(c_), d(d_);
  int ie = I[0].length();
  int je = I[1].length();
  for (int j=0; j<je; ++j)
    for (int i=0; i<ie; ++i)
      a(i,j) = b(i,j)+c(i,j)+d(i,j);
}
are optimized vastly differently. While the first one gets an
inner loop with
.L71:
fldl (%ebx) #* ivtmp.627
faddl (%esi) #* ivtmp.623
faddl (%ecx) #* ivtmp.629
fstpl (%eax) #* ivtmp.631
addl $1, %edx #, i
addl $8, %esi #, ivtmp.623
addl $8, %ebx #, ivtmp.627
addl $8, %ecx #, ivtmp.629
addl $8, %eax #, ivtmp.631
cmpl %edx, -20(%ebp) # i, D.162565
jg .L71 #,
i.e. nice - the second one gets optimized to
.L320:
leal (%ebx,%edi), %eax #,
movl %eax, -364(%ebp) #,
movl -408(%ebp), %ecx #,
leal (%ebx,%ecx), %edx #, tmp136
movl -404(%ebp), %ecx #,
leal (%ebx,%ecx), %eax #, tmp140
movl -388(%ebp), %ecx #,
fldl (%ecx,%eax,8) #
movl -380(%ebp), %eax #,
faddl (%eax,%edx,8) #
movl -400(%ebp), %edx #,
leal (%ebx,%edx), %eax #, tmp147
movl -396(%ebp), %ecx #,
faddl (%ecx,%eax,8) #
movl -364(%ebp), %eax #,
movl -372(%ebp), %edx #,
fstpl (%edx,%eax,8) #
movl %esi, %ebx # ivtmp.889, i
leal 1(%esi), %esi #, ivtmp.889
cmpl %ebx, -360(%ebp) # i, D.167717
jg .L320 #,
using a recent 4.0 with -O2 -funroll-loops -ffast-math --param
max-unroll-times=1.
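To make the difference concrete, here is a rough source-level picture of what the two inner loops compute. It is a sketch, not GCC's intermediate form and not the real Array class. The names `View`, `inner_strength_reduced`, `inner_recomputed` and `same_result` are hypothetical. `View` only assumes that each array is reachable as a base pointer plus an element offset, which is what the `(%ecx,%eax,8)` style addressing in the second listing suggests.

```cpp
#include <vector>

// Hypothetical stand-in for one array view: base pointer plus element offset.
struct View {
    double* data;
    int offset;
};

// Shape of the first listing: one pointer IV per array, each advanced by
// sizeof(double) per iteration (the `addl $8` instructions), plus i.
void inner_strength_reduced(double* a, const double* b, const double* c,
                            const double* d, int ie) {
    for (int i = 0; i < ie; ++i) {
        *a = *b + *c + *d;
        ++a; ++b; ++c; ++d;
    }
}

// Shape of the second listing: only i is an IV, and every address is
// rebuilt from base + (i + offset) on each iteration.
void inner_recomputed(const View& a, const View& b, const View& c,
                      const View& d, int ie) {
    for (int i = 0; i < ie; ++i)
        a.data[i + a.offset] = b.data[i + b.offset] + c.data[i + c.offset]
                             + d.data[i + d.offset];
}

// Runs both forms on the same small input and reports whether they agree.
bool same_result() {
    std::vector<double> b{-1, 1, 2, 3, 4};  // one leading pad element
    std::vector<double> c{10, 20, 30, 40}, d{100, 200, 300, 400};
    std::vector<double> a1(4, 0.0), a2(4, 0.0);
    inner_strength_reduced(a1.data(), b.data() + 1, c.data(), d.data(), 4);
    View va{a2.data(), 0}, vb{b.data(), 1}, vc{c.data(), 0}, vd{d.data(), 0};
    inner_recomputed(va, vb, vc, vd, 4);
    return a1 == a2;
}
```

Both functions store the same values; the only difference is how the addresses are formed, which is the part ivopts is expected to strength-reduce.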
The difference seems to be that, in the ivopts dump file for
the second case with the object copies, the scalar evolution is
not known, for whatever reason.
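For readers unfamiliar with the term: scalar evolution analysis tries to describe a value inside a loop as a recurrence. For an affine IV that is a base plus a constant step per iteration, often written {base, +, step}. The sketch below only illustrates that idea with plain integers. `AffineEvolution`, `address_direct` and `address_evolution` are hypothetical names, not GCC internals.

```cpp
#include <cstdint>

// Hypothetical model of an affine evolution {base, +, step}.
struct AffineEvolution {
    std::intptr_t base;
    std::intptr_t step;
    // Value in iteration n, where n = 0 is the first iteration.
    std::intptr_t value_at(std::intptr_t n) const { return base + n * step; }
};

// Address of element (i + offset) of a double array, computed from scratch.
std::intptr_t address_direct(std::intptr_t data, std::intptr_t offset,
                             std::intptr_t i) {
    return data + (i + offset) * static_cast<std::intptr_t>(sizeof(double));
}

// The same address seen as an evolution in i:
// {data + sizeof(double) * offset, +, sizeof(double)}.
// This rewrite is only valid if data and offset do not change in the loop.
AffineEvolution address_evolution(std::intptr_t data, std::intptr_t offset) {
    const std::intptr_t elem = static_cast<std::intptr_t>(sizeof(double));
    return AffineEvolution{data + offset * elem, elem};
}
```

When the base and offset can be shown to be loop invariant, the address has such an evolution and can become a pointer IV stepped by 8, as in the first listing. In the second listing the bases and offsets are reloaded from the stack on every iteration, which is consistent with no evolution having been found for the addresses.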
Note that the 3.4 loop optimizer does not have problems with local copies
like this.
Any hints on where to look for the actual reason the scev fails?
(No, trying to produce a simple C testcase did not work.)
I placed the ivopts dump file at
http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/perf.cpp.t54.ivopts
Thanks for any hints!
Richard.
--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/