This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: optimizations


Håkan Hjort <hakan@safelogic.se> writes:

> For Sun's Forte compiler one gets the following:
>
> main:
>          save    %sp,-104,%sp
>          or      %g0,16,%g1
>          st      %g1,[%fp-4]
>          add     %fp,-4,%o1
>          or      %g0,1,%o0
>          call    write   ! params =  %o0 %o1 %o2 ! Result
>          or      %g0,1,%o2
>          ret     ! Result =  %i0
>          restore %g0,0,%o0
>
> I.e. it just stores '16' in k before the call to write, no trace left
> of mm() or any loop, as should be.
>
> Perhaps GCC now does the same after hoisting both the load and the store?

Unfortunately not.  On x86, with -O2, 3.4 20030211 produces

main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        movl    $0, -4(%ebp)
        andl    $-16, %esp
        jmp     .L2
        .p2align 4,,7
.L9:
        incl    %eax
        movl    %eax, -4(%ebp)
.L2:
        movl    -4(%ebp), %eax
        cmpl    $16, %eax
        jne     .L9
        movl    $1, 8(%esp)
        leal    -4(%ebp), %eax
        movl    %eax, 4(%esp)
        movl    $1, (%esp)
        call    write
        leave
        xorl    %eax, %eax
        ret

so you can see that not only is the loop still present, but the store
to k has not been sunk out of it: every iteration still reloads and
rewrites -4(%ebp).

What happens at -O2 -fssa -fssa-ccp -fssa-dce is interesting:

main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        andl    $-16, %esp
        jmp     .L2
        .p2align 4,,7
.L9:
        incl    %eax
.L2:
        cmpl    $16, %eax
        jne     .L9
        movl    $1, 8(%esp)
        leal    -4(%ebp), %eax
        movl    %eax, 4(%esp)
        movl    $1, (%esp)
        call    write
        leave
        xorl    %eax, %eax
        ret

The unnecessary memory references are now gone, but the loop remains;
also you can see what may appear to be a bug at first glance -- %eax
is never initialized.  This is not actually a correctness bug: no
matter what value %eax happened to have before the loop, it will leave
the loop with the value 16.  However, I think you'll agree that this
is poor optimization.
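
The claim about the uninitialized register can be checked with a small model. This is a sketch, not anything GCC emits: loop_from() is a hypothetical name, and %eax is modelled as a uint32_t because incl wraps modulo 2^32, whereas signed overflow in C would be undefined.

```c
#include <stdint.h>

/* Model of the .L9/.L2 loop with %eax as the only state.
   The starting value stands for whatever %eax held on entry. */
uint32_t loop_from(uint32_t eax)
{
    while (eax != 16u)  /* cmpl $16, %eax ; jne .L9 */
        eax++;          /* incl %eax, wrapping modulo 2^32 */
    return eax;         /* can only get here with eax == 16 */
}
```

The only exit from the loop is the comparison against 16, so the result does not depend on the starting value; what does depend on it is the running time, which approaches 2^32 iterations when the register happens to start just above 16.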

RTL-SSA is, I believe, considered something of a failed experiment;
the interesting work is happening on the tree-ssa branch.  I do not
have that branch checked out to experiment with.  Also, the loop
optimizer has been overhauled on the rtlopt branch, which again I do
not have to hand.

zw

