This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: optimizations
- From: Zack Weinberg <zack at codesourcery dot com>
- To: Håkan Hjort <hakan at safelogic dot se>
- Cc: Reza Roboubi <reza at linisoft dot com>, gcc at gcc dot gnu dot org
- Date: Tue, 18 Feb 2003 10:13:54 -0800
- Subject: Re: optimizations
- References: <Pine.LNX.4.21.0301151338040.30377-100000@mail.kloo.net><3E25F2BB.BA90B2C9@linisoft.com> <20030218175524.GA8638@safelogic.se>
Håkan Hjort <hakan@safelogic.se> writes:
> For Sun's Forte compiler one gets the following:
>
> main:
>         save %sp,-104,%sp
>         or %g0,16,%g1
>         st %g1,[%fp-4]
>         add %fp,-4,%o1
>         or %g0,1,%o0
>         call write ! params = %o0 %o1 %o2 ! Result
>         or %g0,1,%o2
>         ret ! Result = %i0
>         restore %g0,0,%o0
>
> I.e. it just stores '16' in k before the call to write, no trace left
> of mm() or any loop, as should be.
>
> Perhaps GCC now does the same after hoisting both the load and the store?
Unfortunately not. On x86, with -O2, GCC 3.4 20030211 produces
main:
        pushl %ebp
        movl %esp, %ebp
        subl $24, %esp
        movl $0, -4(%ebp)
        andl $-16, %esp
        jmp .L2
        .p2align 4,,7
.L9:
        incl %eax
        movl %eax, -4(%ebp)
.L2:
        movl -4(%ebp), %eax
        cmpl $16, %eax
        jne .L9
        movl $1, 8(%esp)
        leal -4(%ebp), %eax
        movl %eax, 4(%esp)
        movl $1, (%esp)
        call write
        leave
        xorl %eax, %eax
        ret
so you can see that not only is the loop still present, but the memory
write has not been sunk.
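For reference, the listings correspond roughly to C like the sketch below. The test program itself is not quoted in this message, so this is a reconstruction read back from the assembly: only the names mm, k and write appear in the thread, while the body of mm() and the function names loop_then_write and store_then_write are assumptions made for illustration.

```c
#include <unistd.h>

/* Guessed body: nonzero while *i has not yet reached 16. */
static inline int mm(int *i)
{
    return *i != 16;
}

/* Shape of the code GCC is compiling: k lives in memory because its
   address is passed to both mm() and write(). */
int loop_then_write(int fd)
{
    int k = 0;
    while (mm(&k))
        k++;
    if (write(fd, &k, 1) != 1)  /* length 1, as in the listing */
        return -1;
    return k;
}

/* What the Forte output amounts to: the loop folded to a constant store. */
int store_then_write(int fd)
{
    int k = 16;
    if (write(fd, &k, 1) != 1)
        return -1;
    return k;
}
```

Both functions write the same byte and return 16 on success; the difference between the compilers is only whether the first form gets reduced to the second.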
What happens at -O2 -fssa -fssa-ccp -fssa-dce is interesting:
main:
        pushl %ebp
        movl %esp, %ebp
        subl $24, %esp
        andl $-16, %esp
        jmp .L2
        .p2align 4,,7
.L9:
        incl %eax
.L2:
        cmpl $16, %eax
        jne .L9
        movl $1, 8(%esp)
        leal -4(%ebp), %eax
        movl %eax, 4(%esp)
        movl $1, (%esp)
        call write
        leave
        xorl %eax, %eax
        ret
The unnecessary memory references are now gone, but the loop remains;
also you can see what may appear to be a bug at first glance -- %eax
is never initialized. This is not actually a correctness bug: no
matter what value %eax happened to have before the loop, incl keeps
stepping it (wrapping around if necessary) until it reaches 16, so it
will leave the loop with the value 16. However, I think you'll agree
that this is poor optimization.
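The claim that the starting value does not matter can be checked exhaustively on a narrower counter. The sketch below is an illustration only, not code from the thread: it assumes an 8-bit unsigned counter in place of the 32-bit register so that every starting value can be tried, and the names spin_until_16 and all_starts_reach_16 are hypothetical.

```c
#include <stdint.h>

/* Same shape as the .L2/.L9 loop, but on an 8-bit counter.  The value
   wraps modulo 256 here, much as incl wraps modulo 2^32 on %eax. */
uint8_t spin_until_16(uint8_t start)
{
    uint8_t x = start;
    while (x != 16)
        x++;            /* 255 + 1 wraps to 0 */
    return x;
}

/* Returns 1 if every possible starting value leaves the loop with 16. */
int all_starts_reach_16(void)
{
    for (unsigned s = 0; s < 256; s++)
        if (spin_until_16((uint8_t)s) != 16)
            return 0;
    return 1;
}
```

A start of 17 is the worst case: the counter has to run all the way up, wrap to 0, and climb back to 16. On the real 32-bit register that is about four billion iterations, which is why the result is correct but the code is still poor.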
RTL-SSA is, I believe, considered something of a failed experiment --
the interesting work is happening on the tree-ssa branch. I do not
have that branch checked out to experiment with. Also, the loop
optimizer has been overhauled on the rtlopt branch, which again I do
not have to hand.
zw