This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Optimize x*x*x*x*x*x using 3 multiplications.
- From: Roger Sayle <roger at eyesopen dot com>
- To: Richard Henderson <rth at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Wed, 30 Jul 2003 21:53:36 -0600 (MDT)
- Subject: Re: [PATCH] Optimize x*x*x*x*x*x using 3 multiplications.
On Wed, 30 Jul 2003, Richard Henderson wrote:
> On Tue, Jul 29, 2003 at 11:13:44AM -0600, Roger Sayle wrote:
> > The following patch allows gcc to optimize floating point expressions
> > such as x*x*x*x*x*x when compiled with -ffast-math to use only three
> > multiplications rather than five (as currently generated by mainline).
>
> I'd really wish we handled this properly via reassociation.
I'm not sure whether reassociation alone could work out that x^23
is better calculated as x^13*x^10 instead of x^22*x. However, if
you know of a paper or have suggestions of how reassociation should
be implemented, I can certainly look into it. I'm sure it could be
profitable for other operators.
I haven't checked but I'd guess GCC also handles "x+x+x+x+x+x" the same
way as my patch, i.e. by representing it internally by a multiplication.
> I don't see how you'll prevent "x*x*x" -> "pow(x,3)" ...
> from *not* calling the libm function if !flag_unsafe_math_optimizations.
It's probably not clear from the immediate context of the patch, but
the transformation to convert x*x into pow(x,2) is already nested inside
an "if (flag_unsafe_math_optimizations)" condition [fold-const.c:5944].
I could repeat the flag_unsafe_math_optimizations check redundantly
if you think that would make things clearer?
I agree there's an uncomfortable relationship between the conditions
we use to convert x*x into pow(x,2), and the conditions that are used
to check whether we expand things inline. They are consistent at the
moment, but at worst we may call libm's pow but only under -ffast-math
and when its guaranteed to be there. For example, it seems awkward
that we can't optimize x*x*x*x with -ffree-standing! Ultimately, my
suggestion of adding a __builtin_powi [or an equivalent EXP_EXPR tree
node] that is always expanded inline (into a loop if the second argument
isn't constant) would provide a better intermediate representation
internally.
I think the current patch provides a nice optimization with the
infrastructure we've already got. But looking forward, I'm very
interested in your opinions on a __builtin_powi and/or an EXP_EXPR.
Similarly, do you have a good name for __builtin_memeq? I hate
naming things.
Roger
--