Optimize C / R.

Fri Jul 19 05:02:00 GMT 2002

Jan Hubicka wrote:

> > Hi Toon and Jan,

> > I've now looked even deeper at the problem.  It turns out that CSE
> > indeed prefers the CONST_DOUBLE 0.0 giving it a src_folded_cost of 2,
> > and the original MEM a src_cost of 8.  As a result CSE attempts the above
> > substitution, and calls validate_change on the resulting instruction:
> >
> > (set (reg:SF 64) (const_double:SF 0 [0x0] 0 [0x0] 0 [0x0]))

> > Jan, do you know why the x86 backend "mov?f" patterns don't match
> > the above RTL when called from CSE?  If the pattern was recognized
> 
> The problem was that loading zero into an ST register requires one fldz
> operation, possibly several fxch operations, and later an fstp operation.
> This is usually slower than a single FP operation with a memory operand
> that refers to a constant pool entry containing zero.
> 
> What we do is attempt to force all immediates into registers before
> reload, so that combining the zero with an arithmetic operation results
> in read-modify FP operations, and we use a post-reload splitter to
> rematerialize fldz when the zero remains uncombined.
> 
> We may want to re-benchmark it or figure out a better solution...

The analysis is probably different if you compile with
-march=pentium4 -mfpmath=sse, but that still gives (essentially) the same
code.
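
For anyone who wants to look at the generated code, a small test file
along the following lines might do.  This is only a sketch: the function
names and the file name zero.c are made up for illustration and do not
come from the original test case.  It has one function that only
materializes 0.0 and two that combine 0.0 with an FP operation.

```c
#include <assert.h>

/* Hypothetical test functions, not taken from the original report. */

/* Only materializes zero: roughly the (set (reg:SF) (const_double 0.0))
   situation, where the constant is not combined with anything. */
float load_zero(void)
{
    return 0.0f;
}

/* Zero as an operand of a comparison and as a selected value. */
float clamp_negative(float x)
{
    return x < 0.0f ? 0.0f : x;
}

/* Zero as an operand of an addition.  Without -ffast-math this cannot
   simply be folded to x, because -0.0f + 0.0f is +0.0f. */
float add_zero(float x)
{
    return x + 0.0f;
}

/* Sanity check on the values only; it says nothing about which
   instructions the compiler picked. */
void self_check(void)
{
    assert(load_zero() == 0.0f);
    assert(clamp_negative(-1.5f) == 0.0f);
    assert(clamp_negative(2.0f) == 2.0f);
    assert(add_zero(3.0f) == 3.0f);
}
```

Compiling with something like "gcc -O2 -S zero.c" (adding -m32 on a
64-bit host to get x87 code) and then again with -march=pentium4
-mfpmath=sse should show whether the zero is loaded into a register or
read as a memory operand from the constant pool.  What you see will
depend on the compiler version.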

-- 
Toon Moene - mailto:toon@moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
Join GNU Fortran 95: http://g95.sourceforge.net/ (under construction)