This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Some small optimization issues with gcc 4.0 20050418
- From: James E Wilson <wilson at specifixinc dot com>
- To: Sebastian Biallas <sb at biallas dot net>
- Cc: gcc at gcc dot gnu dot org
- Date: Fri, 22 Apr 2005 13:04:49 -0700
- Subject: Re: Some small optimization issues with gcc 4.0 20050418
- References: <4267B9FD.6090900@biallas.net>
Sebastian Biallas wrote:
But I noticed some smaller optimization issues on x86, and one of them is
a regression from gcc 3.3, so I'm reporting this here. Accept my apologies
if this is already known, but I think it's worth noting.
You can submit optimization regressions into our bugzilla bug database.
gcc-4 has a bunch of new and/or rewritten optimization passes, and
occasionally minor problems with them will be missed. They are likely
to be fixed if we get bug reports for them though.
[1] Why keep the -1 constant in %esi? The cmpl with a constant is only 1
byte longer, which doesn't justify this.
Looks like one of the new tree optimization passes, ivopts, emitted a
compare with the constant first, which is non-canonical, and prevented
the RTL cse pass from substituting the -1 into the compare.
This is already fixed on mainline. ivopts now emits a canonical
compare, and also includes the constant -1 in the compare instead of
putting it in a temporary.
[2] It's allocating 5 words on the stack while 2 would be enough. I know
that gcc isn't very smart at optimizing the stack slots, but this is a
regression.
There is one word for the return address, two words for registers being
saved, and two words for the printf arguments.
There does appear to be a problem here, as we are using pushes in the
prologue to save registers, which means we should not be allocating
space for them when we decrement the stack pointer. The other 3 slots
appear to be necessary.
[3] Why use the cmpl at all? gcc 3.3 did this right, I don't think the
cmpl is faster than a decl (and even then, the cmpl could be replaced by
a "subl $1, %ebx")
This looks like another ivopts issue. In gcc-3.3, we get a >= branch,
which can use the result of the decrement. In gcc-4.0, ivopts
canonicalizes the branch to use !=, which cannot use the result of the
decrement, as the condition code flags are set wrong for that.
This still happens on mainline, and should probably be looked into.
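To make the difference between the two branch forms concrete, here is a minimal sketch in C. The original test case is not shown in this message, so the function names (sum_ge, sum_ne) and the loop body are hypothetical; both functions assume n >= 0 and then do exactly the same work. On x86 the decrement already sets the sign flag, which is what a ">= 0" exit test needs, whereas a "!= -1" test is not answered by the flags the decrement leaves behind and so requires a separate compare.

```c
/* Hypothetical countdown loop with a ">= 0" exit test.
   After i-- the flags from the decrement can answer "is i >= 0?",
   so no separate compare is needed in principle. */
long sum_ge(const int *a, int n)
{
    long s = 0;
    for (int i = n - 1; i >= 0; i--)
        s += a[i];
    return s;
}

/* Same loop (for n >= 0) with the exit test written as "!= -1".
   The decrement's flags do not directly answer "is i != -1?",
   so a compare against -1 has to be emitted. */
long sum_ne(const int *a, int n)
{
    long s = 0;
    for (int i = n - 1; i != -1; i--)
        s += a[i];
    return s;
}
```

Both functions return the same value for any non-negative n; only the form of the exit test differs, which is the part that matters for the generated compare.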
[1] Again, the wasted stack. gcc-3.3 doesn't get this right, either.
I don't believe so. We have the return address and the two printf
arguments, so all 3 slots are needed.
[2] Even a peephole optimizer could optimize this :)
Yes, this is embarrassing. I had to use -march=i686 to reproduce this.
We have a peephole2 pattern that converts
    movl $10, i
into
    movl $10, %eax
    movl %eax, i
because it is faster, except that this happens so late that there is no
chance to perform cse on the result, so we can't delete the duplicate
constant immediate loads. So while this is bad, it isn't as bad as it
might appear at first.
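As an illustration only, source of roughly the following shape could lead to the duplicate loads described above. The original test case is not shown in this message, so the second global j and the function name set_both are assumptions.

```c
/* Hypothetical globals; only "i" appears in the assembly quoted above. */
int i, j;

/* Two stores of the same constant. Per the description above, each
   store may be split late into "load $10 into a register, then store
   the register", after cse has already run, so the constant load can
   end up appearing once per store. */
void set_both(void)
{
    i = 10;
    j = 10;
}
```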
[3] The testl is unneeded, the flags are already prepared by the decl.
Is this a hard optimization to accomplish? It's quite obvious for a
human, but I don't know how this looks from a compiler perspective...
This is the same as above: we need the testl because we have the wrong
kind of branch condition.
--
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com