[PATCH][PING] Some pending middle-end patches

Roger Sayle roger@eyesopen.com
Sat Nov 11 19:00:00 GMT 2006


Hi Richard,

On Sat, 11 Nov 2006, Richard Guenther wrote:
> [PATCH] Fix PR25620, pow() expansion missed-optimization
> http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00216.html

Sorry for the delay, but I think this can be implemented better.

Consider the FORTRAN example X**1.5 described in the PR.
With you'r patch we'll now generate the sequence.

	tmp = sqrt(x);
	return tmp*tmp*tmp;

But notice that the PGF90 compiler that's reportedly 10 times
faster than gfortran (in comment #9) expands the sequence:

	tmp = sqrt(x);
	return x*tmp;

which is still better, as it only requires one multiplication
and has less rounding.  I do like the idea of your patch.
One other issue is that whilst I approve of the factoring of
the common bits of expand_builtin_pow and expand_builtin_powi,
I suspect that you've factored too much, and the processing
of REAL_CST exponents at the top of expand_constant_power,
should be hosted back into build_builtin_pow.  All of the
logic is only shared once we've an integer exponent, which is
always the case for powi.

I wonder if you can do some timings, for things like pow(x,5.0/3.0)
vs. cbrt(powi(x,5))?  I especially cautious of you comments about
forcing c99 functions in gfortran front-end.  A fall-back in
libgfortran's instrinsics/c99_functions.c is very likely to be
implemented as pow(x,1.0/3.0), which means we'll ultimately be
transforming one call to pow, into a call to pow and three additional
multiplications :-(



> [PATCH] Fix PR29719, ICE on expanding __builtin_lfloor/ceil
> http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00269.html
> (don't know if this is really the way to go)


I'm not sure this is the correct way to go either.  Firstly, the
normal rule is that it is only safe to call __builtin_foo when
foo is available on the system.  For example, when not optimizing
we'll always call the function foo!  However, as an exception it
might be reasonable to always inline something for lfloor and
lceil.  However, I'm surprised at your choice of implementation,
using:

	tmp = (int)x;
	return ((double)x > x ? tmp-1 : tmp;

instead of the alternative:

	return (int)floor(x);

On some platforms, one may be faster than the other.  Admittedly,
x87's floor implementation is ugly enough to prefer the first,
but with hardware support or software floating point the second
may be preferable.

My immediate preference would be to correct the middle-end code
that makes assumptions about __builtin_foo when used by the user,
and ends up calling "foo" (i.e. a linker error), as we do with
other builtins.  We certainly shouldn't ICE.


> Also this one from Andrew, PR28545:
> [4.1 Regression] Wrong code for hoisted multiplication
> http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00768.html

This one is OK for the 4.1 branch.

Roger
--



More information about the Gcc-patches mailing list