This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug middle-end/30201] gcc doesn't unroll nested loops

From: "bangerth at dealii dot org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: 14 Dec 2006 15:35:51 -0000
Subject: [Bug middle-end/30201] gcc doesn't unroll nested loops
References: <bug-30201-13749@http.gcc.gnu.org/bugzilla/>
Reply-to: gcc-bugzilla at gcc dot gnu dot org


------- Comment #8 from bangerth at dealii dot org  2006-12-14 15:35 -------
Here is an analysis of the assembler code we get with the first
command line from my previous comment, i.e. without hand unrolling.
I'm using gcc 4.1.0, btw.

The main loop looks like this:
--------------------------
.L2:
        pushl   %edx            // push 'factor'
        xorl    %eax, %eax      // eax=0
        fildl   (%esp)          // st(0)=(double)factor
        addl    $1, %edx        // ++factor
        fstl    data            // data[0]=factor
        movl    %eax, (%esp)    // overwrite the stack slot with 0
        fildl   (%esp)          // st(0)=0, st(1)=factor
        addl    $4, %esp        // release the stack slot
        cmpl    $1000000000, %edx
        fstl    data+24         // data[3]=0
        fstl    data+48         // data[6]=0
        fstl    data+8          // data[1]=0
        fxch    %st(1)          // st(0)=factor
        fstl    data+32         // data[4]=factor
        fxch    %st(1)          // st(0)=0
        fstl    data+56         // data[7]=0
        fstl    data+16         // data[2]=0
        fstpl   data+40         // data[5]=0; st(0)=factor
        fstpl   data+64         // data[8]=factor
        jne     .L2
---------------------

I can find two things that could be improved here:
a/ the sequence
    xorl        %eax, %eax
    movl        %eax, (%esp)
    fildl       (%esp)
   could certainly be made more efficient by using fldz, which loads
   0.0 onto the x87 stack directly, without going through memory.
b/ I find the use of fstpl at the end of the loop quite ingenious, since
   it avoids another fxch. However, the two uses of fxch in the middle
   could also be avoided if gcc realized that it can reorder all those
   stores, for example by finishing all the stores of 0 before the
   remaining stores of factor.
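
To make item b/ concrete, here is a minimal C sketch of why the reordering is legal. It is reconstructed from the comments in the listing, not taken from the bug's actual test case: the loop shape is an assumption, the names fill_nested, fill_reordered and check_fills_agree are hypothetical, and data is assumed to be an ordinary (non-volatile) global array of doubles.

```c
#include <assert.h>
#include <string.h>

/* 3x3 matrix of doubles, row-major: data[3*i + j] sits at byte offset
   8*(3*i + j), which matches data+0 ... data+64 in the listing. */
double data[9];

/* Assumed shape of the original nested loops: data = factor * identity. */
void fill_nested(int factor)
{
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            data[3 * i + j] = (i == j) ? (double)factor : 0.0;
}

/* The same nine stores, unrolled by hand and grouped by value.
   The stores hit nine distinct addresses, so any order gives the
   same final contents of data. */
void fill_reordered(int factor)
{
    const double f = (double)factor;
    data[1] = 0.0; data[2] = 0.0; data[3] = 0.0;
    data[5] = 0.0; data[6] = 0.0; data[7] = 0.0;
    data[0] = f;   data[4] = f;   data[8] = f;
}

/* Runs both variants and asserts that they leave identical bytes in data. */
void check_fills_agree(int factor)
{
    double expected[9];
    fill_nested(factor);
    memcpy(expected, data, sizeof data);
    memset(data, 0xff, sizeof data);      /* poison to catch missed stores */
    fill_reordered(factor);
    assert(memcmp(expected, data, sizeof data) == 0);
}
```

Calling check_fills_agree(7) passes silently. With the stores grouped like this, only one value needs to be on top of the x87 stack at a time, so no fxch is required. Whether gcc picks such an order by itself is a separate question.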

So, in summary, it is not that gcc doesn't realize that it can unroll
these loops -- it actually does that. The slowdown comes from other places.

W.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30201

References:
- [Bug c++/30201] New: gcc doesn't unroll nested loops
  - From: jacob at math dot jussieu dot fr
