This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug middle-end/30201] gcc doesn't unroll nested loops
- From: "bangerth at dealii dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 14 Dec 2006 15:35:51 -0000
- Subject: [Bug middle-end/30201] gcc doesn't unroll nested loops
- References: <bug-30201-13749@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #8 from bangerth at dealii dot org 2006-12-14 15:35 -------
Here is an analysis of the assembler code we get when using
my first command line in my previous comment, i.e. no hand unrolling.
I'm using gcc 4.1.0, btw.
The main loop looks like this:
--------------------------
.L2:
pushl %edx // push 'factor'
xorl %eax, %eax // eax=0
fildl (%esp) // st(0)=(double)factor
addl $1, %edx // ++factor
fstl data // data[0]=factor
movl %eax, (%esp) // overwrite the stack slot with 0
fildl (%esp) // st(0)=0, st(1)=factor
addl $4, %esp // pop the stack slot
cmpl $1000000000, %edx
fstl data+24 // data[3]=0
fstl data+48 // data[6]=0
fstl data+8 // data[1]=0
fxch %st(1) // st(0)=factor
fstl data+32 // data[4]=factor
fxch %st(1) // st(0)=0
fstl data+56 // data[7]=0
fstl data+16 // data[2]=0
fstpl data+40 // data[5]=0; st(0)=factor
fstpl data+64 // data[8]=factor
jne .L2
---------------------
I can find several things wrong with this:
a/ the sequence
xorl %eax, %eax
movl %eax, (%esp)
fildl (%esp)
could certainly be made more efficient by using fldz.
b/ I find the use of fstpl at the end of the loop quite ingenious, since
it avoids another fxch. However, the two uses of fxch in the middle
could also be avoided if the compiler realized that the nine stores are
independent and can be reordered, e.g. all stores of factor first and
then all stores of 0.
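
Points a/ and b/ can be sketched in C. This is only an illustration under
stated assumptions: the helper names stores_as_emitted, stores_grouped and
check_orders are made up for this sketch, data is treated as nine doubles
because the listing stores at offsets 0 through 64, and the x87
instructions named in the comments are what one would hope to see, not
output from any gcc version.

```c
#include <assert.h>
#include <string.h>

/* Nine doubles, matching the offsets data+0 .. data+64 in the listing. */
double data[9];

/* Stores in the order of the listing above: factor and zero stores are
   interleaved, which is what makes the two fxch instructions necessary. */
void stores_as_emitted(int factor)
{
    double f = factor;   /* fildl (%esp) */
    double z = 0.0;      /* xorl / movl / fildl in the listing */

    data[0] = f;
    data[3] = z;
    data[6] = z;
    data[1] = z;
    data[4] = f;         /* preceded by fxch */
    data[7] = z;         /* preceded by fxch */
    data[2] = z;
    data[5] = z;
    data[8] = f;
}

/* The same nine stores grouped by value (hypothetical ordering): only one
   value needs to be live on the x87 stack at a time, so no fxch is needed
   and zero could be loaded with a single fldz. */
void stores_grouped(int factor)
{
    double f = factor;   /* fildl */
    double z = 0.0;      /* hoped for: fldz */

    data[0] = f;
    data[4] = f;
    data[8] = f;         /* hoped for: fstpl, popping factor */

    data[1] = z;
    data[2] = z;
    data[3] = z;
    data[5] = z;
    data[6] = z;
    data[7] = z;         /* hoped for: fstpl, popping zero */
}

/* Aborts if the two store orders leave different contents in data. */
void check_orders(int factor)
{
    double expected[9];

    stores_as_emitted(factor);
    memcpy(expected, data, sizeof data);
    memset(data, 0xff, sizeof data);   /* poison so a missing store shows up */
    stores_grouped(factor);
    assert(memcmp(expected, data, sizeof data) == 0);
}
```

Calling check_orders() with any factor should pass silently, since the
nine stores go to distinct addresses and their order cannot change the
result. Whether the grouped order is actually faster would have to be
measured.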
So, in summary, it is not that gcc doesn't realize that it can unroll
these loops -- it actually does that. The slowdown comes from other
places: the way zero is loaded and the order of the stores.
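
To illustrate what "unrolled" means for the listing above, here is a
minimal C sketch. The test case itself is in an earlier comment and is not
reproduced here, so the loop below is a guess reconstructed purely from
the store pattern (factor on the diagonal, 0 elsewhere). The 3x3 shape and
the names fill_rolled, fill_unrolled and check_unrolled are assumptions
made for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Assumed shape: 3x3 doubles, i.e. the 72 bytes the listing writes to. */
double data[3][3];

/* A nested loop that would produce the observed store pattern. */
void fill_rolled(int factor)
{
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            data[i][j] = (i == j) ? factor : 0;
}

/* The fully unrolled form: both loops and the i == j test are gone,
   leaving nine straight-line stores like those in the assembler code. */
void fill_unrolled(int factor)
{
    data[0][0] = factor; data[0][1] = 0;      data[0][2] = 0;
    data[1][0] = 0;      data[1][1] = factor; data[1][2] = 0;
    data[2][0] = 0;      data[2][1] = 0;      data[2][2] = factor;
}

/* Aborts if the rolled and unrolled forms disagree. */
void check_unrolled(int factor)
{
    double expected[3][3];

    fill_rolled(factor);
    memcpy(expected, data, sizeof data);
    memset(data, 0xff, sizeof data);   /* poison so a missing store shows up */
    fill_unrolled(factor);
    assert(memcmp(expected, data, sizeof data) == 0);
}
```

The main loop in the listing has no inner branches and exactly nine
stores, which is consistent with the unrolled form rather than the rolled
one.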
W.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30201