[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

bonzini at gnu dot org gcc-bugzilla@gcc.gnu.org
Mon Aug 7 06:19:00 GMT 2006



------- Comment #37 from bonzini at gnu dot org  2006-08-07 06:19 -------
I don't see how the last fmul[sl] can be removed without increasing code size. 
The only way to fix it would be to change the machine description to say that
"this processor does not like FP operations with a memory operand".  With a
peephole, this is as good as we can get it.  The last fmul is not coupled with
a "fld %st" because it consumes the stack entry.  See comment #30, where
there is still a "fmull b".
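
To make the point about the last fmul concrete, here is a small hand-written
sketch.  The function name two_products and the x87 sequences in the comments
are illustrative assumptions of mine, not output from any GCC version and not
the code from comment #30.  They show one reading of the argument: a
"fld %st" + "fmul mem" pair can be rewritten with the same instruction count,
while the final fmul has no "fld %st" to trade away.

```c
/*
 * Hypothetical example: one value 'a' multiplied by two memory operands.
 *
 * Illustrative x87 code (AT&T syntax) using memory operands:
 *
 *     fldl   a          # st0 = a
 *     fld    %st(0)     # duplicate a, it is needed again later
 *     fmull  b          # st0 = a*b, st1 = a
 *     fstpl  r1         # store and pop, st0 = a
 *     fmull  c          # last use of a: consumes the stack entry
 *     fstpl  r2
 *
 * The first multiply can become "fldl b; fmul %st(1), %st", which is
 * still two instructions.  The last one would have to become
 * "fldl c; fmulp %st, %st(1)", which is two instructions instead of one.
 */
void two_products(double a, const double *b, const double *c,
                  double *r1, double *r2)
{
    *r1 = a * *b;   /* 'a' is still live after this multiply */
    *r2 = a * *c;   /* last use of 'a' */
}
```

Compiling such a function with -S and -mfpmath=387 on the compiler under test
shows what it really emits; the sequences above are only a sketch.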

Can you please try re-running the tests?  It takes skill^W^W seems quite weird
to have a 100x slow-down, especially since my tests were run on a similar
Prescott (P4e).

It would also be interesting to re-run your code generator on a compiler built
from svn trunk.  If that gives higher performance, I guess you'd be satisfied
even if it comes from a different kernel.  Also, I strongly believe that
you should implement vectorization, or at least find out *why* GCC does not
vectorize your code.  It may be simply that it does not have any guarantee on
the alignment.
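
As a rough sketch of the alignment point, the two loops below do the same
work, but only the first gives the compiler a visible alignment guarantee.
The names N, x, y, axpy_aligned and axpy_unknown are hypothetical and are not
taken from the code discussed in this bug; whether either loop is actually
vectorized depends on the GCC version and flags.

```c
#include <stddef.h>

#define N 1024

/* Hypothetical buffers.  The aligned attribute is a GCC extension that
   tells the compiler these arrays start on a 16-byte boundary. */
static double x[N] __attribute__((aligned(16)));
static double y[N] __attribute__((aligned(16)));

/* y[i] += alpha * x[i] on arrays whose alignment the compiler can see. */
void axpy_aligned(double alpha)
{
    size_t i;
    for (i = 0; i < N; i++)
        y[i] += alpha * x[i];
}

/* The same loop through plain pointers.  Nothing here says how xp and yp
   are aligned, so the compiler has no guarantee to rely on. */
void axpy_unknown(size_t n, double alpha, const double *xp, double *yp)
{
    size_t i;
    for (i = 0; i < n; i++)
        yp[i] += alpha * xp[i];
}
```

Building with something like "-O2 -ftree-vectorize -msse2" plus the
vectorizer report option of your release (-ftree-vectorizer-verbose=N on the
4.x series, -fopt-info-vec-missed on later ones) should say which loops were
vectorized and why the others were not.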


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827


