This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Some small optimization issues with gcc 4.0 20050418


Sebastian Biallas wrote:
But I noticed some smaller optimization issues on x86, and one of them is
a regression from gcc 3.3 so I'm reporting this here. Accept my apologies
if this is already known, but I think it's worth noting.

You can submit optimization regressions to our Bugzilla bug database. gcc-4 has a bunch of new and/or rewritten optimization passes, and occasionally minor problems with them will be missed. They are likely to be fixed if we get bug reports for them, though.


[1] Why keep the -1 constant in %esi? The cmpl with constant is only 1
byte longer.. this doesn't justify this.

Looks like one of the new tree optimization passes, ivopts, emitted a compare with the constant first, which is non-canonical and prevented the RTL cse pass from substituting the -1 into the compare.


This is already fixed on mainline. ivopts now emits a canonical compare, and also includes the constant -1 in the compare instead of putting it in a temporary.

[2] It's allocating 5 words on stack while 2 would be enough. I know
that gcc isn't very smart at optimizing the stack slots but this is a
regression

There is one word for the return address, two words for registers being saved, and two words for the printf arguments.


There does appear to be a problem here, as we are using pushes in the prologue to save registers, which means we should not be allocating space for them when we decrement the stack pointer. The other 3 slots appear to be necessary.

[3] Why use the cmpl at all? gcc 3.3 did this right, I don't think the
cmpl is faster than a decl (and even then, the cmpl could be replaced by
a "subl $1, %ebx")

This looks like another ivopts issue. In gcc-3.3, we get a >= branch, which can use the result of the decrement. In gcc-4.0, ivopts canonicalizes the branch to use !=, which cannot use the result of the decrement because the condition code flags are set wrong for that.


This still happens on mainline, and should probably be looked into.
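
To make the difference between the two branch forms concrete, here is a minimal C sketch. The original test case is not quoted in this message, so the function names and loop bodies below are hypothetical. With the >= 0 form, the sign flag left by the decrement can in principle decide the branch (for example a decl followed by jns). With the != -1 form, the zero flag after the decrement only says whether the counter reached 0, not -1, so a separate compare against -1 is needed. What any particular gcc version actually emits for these loops may differ.

```c
/* Hypothetical countdown loop written with a >= 0 exit test.
   After "i--", the sign flag alone tells whether i is still >= 0. */
int sum_ge(int n)
{
    int i, total = 0;
    for (i = n; i >= 0; i--)
        total += i;
    return total;
}

/* The same loop with the exit test rewritten as != -1.
   Equivalent to sum_ge only for n >= -1; for smaller n the loop
   would run into signed overflow, so do not call it that way.
   After "i--", the flags describe a comparison with 0, not with -1,
   so an explicit compare against -1 is required. */
int sum_ne(int n)
{
    int i, total = 0;
    for (i = n; i != -1; i--)
        total += i;
    return total;
}
```

Both functions return the same value for any n >= -1, so the rewrite is valid for such inputs; the cost is only in the extra compare instruction.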

[1] Again, the wasted stack. gcc-3.3 doesn't get this right, either.

I don't believe so. We have the return address and the two printf arguments, so all 3 slots are needed.


[2] Even a peephole optimizer could optimize this :)

Yes, this is embarrassing. I had to use -march=i686 to reproduce this.


We have a peephole2 pattern that converts
movl $10, i
into
movl $10, %eax
movl %eax, i
because it is faster. However, this happens so late that there is no chance to perform cse on the result, so we can't delete the duplicate constant immediate loads. So while this is bad, it isn't as bad as it might appear at first.
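
As an illustration only, a C fragment along the following lines is the kind of code where this would show up. It is not taken from the original report: the second global j and the function name set_both are assumptions made for the example. Each store of the constant is split by the peephole2 pattern into a load of $10 into a register followed by a store, and because no cse runs afterwards, the second load of $10 is not removed even though the register already holds that value.

```c
/* Hypothetical globals; "i" matches the store shown above,
   "j" is added here purely for illustration. */
int i, j;

/* Two stores of the same constant. After the late peephole2 split,
   each store may get its own "movl $10, %eax" even though one load
   of the constant would be enough for both. */
void set_both(void)
{
    i = 10;
    j = 10;
}
```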


[3] The testl is unneeded, the flags are already prepared by the decl.
Is this a hard optimization to accomplish? It's quite obvious for a
human, but I don't know how this looks from a compiler perspective...

This is the same as above: we need the testl because we have the wrong kind of branch condition.
--
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com


