This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.
Re: [PATCH] PR11706, optimize std::pow(T, int)
- From: Richard Guenther <rguenth at tat dot physik dot uni-tuebingen dot de>
- To: Gabriel Dos Reis <gdr at integrable-solutions dot net>
- Cc: Paolo Carlini <pcarlini at suse dot de>, <libstdc++ at gcc dot gnu dot org>
- Date: Wed, 12 Jan 2005 17:02:05 +0100 (CET)
- Subject: Re: [PATCH] PR11706, optimize std::pow(T, int)
On 12 Jan 2005, Gabriel Dos Reis wrote:
> Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> writes:
>
> [...]
>
> | > And, after Zdenek's nice patch, -funroll-loops should lead to optimal code.
> | > Does it work as expected?
> |
> | Yes, __cmath_power is inlined and -funroll-loops is able to optimize
> | the loop for constant powers. Though the approach in __cmath_power
> | is not optimal, e.g. for an exponent of 27 we generate one more
> | multiplication than with __builtin_pow(). Also, as we inline
> | __cmath_power all the time now, we have icache and code-size regressions
> | for non-constant powers. With my hackish approach we could again
> | remove these inlines. Also, having to enable -funroll-loops to
>
> So, what you want is to have the compiler better understand loops with
> "constant" bounds, without requiring -funroll-loops.
I also want the compiler to transform this unrolled std::pow(x, 27)
asm:
        fldl    8(%ebp)
        fld     %st(0)
        fmul    %st(1), %st
        popl    %ebp
        fld     %st(0)
        fmul    %st(1), %st
        fxch    %st(2)
        fmulp   %st, %st(1)
        fxch    %st(1)
        fmul    %st(0), %st
        fmul    %st, %st(1)
        fmul    %st(0), %st
        fmulp   %st, %st(1)
to that of std::pow(x, 27.0):
        fldl    8(%ebp)
        fld     %st(0)
        fmul    %st(1), %st
        popl    %ebp
        fmulp   %st, %st(1)
        fld     %st(0)
        fmul    %st(1), %st
        fmulp   %st, %st(1)
        fld     %st(0)
        fmul    %st(1), %st
        fmulp   %st, %st(1)
which has one multiplication fewer. Not surprisingly, gcc is able to
do this if std::pow(x, 27) is dispatched through __builtin_pow().
I do not understand why you reject a very simple solution to get
optimal and correct (sic! - read on) code for constant exponents.
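
To make the multiplication counts concrete, here is a minimal C++ sketch.
It is not the libstdc++ source. binary_power is a hypothetical stand-in for
a square-and-multiply loop of the kind __cmath_power is assumed to use,
pow27_by_cubing mirrors the three x -> x*x*x steps visible in the second
listing, and Counted is an illustrative wrapper that only counts
multiplications.

```cpp
// Hypothetical helper type: wraps a double and counts multiplications.
struct Counted {
    double value;
    static int mults;
};
int Counted::mults = 0;

inline Counted operator*(Counted a, Counted b) {
    ++Counted::mults;
    return Counted{a.value * b.value};
}

// Square-and-multiply over the bits of n (assumed shape of the loop;
// the name binary_power is made up for this sketch).
template <typename T>
T binary_power(T x, unsigned n) {
    T y = (n % 2) ? x : T{1.0};
    while (n >>= 1) {
        x = x * x;              // one squaring per remaining bit
        if (n % 2)
            y = y * x;          // one extra multiply per set bit
    }
    return y;
}

// x^27 as ((x^3)^3)^3: two multiplications per cubing step.
template <typename T>
T cube(T x) { return x * x * x; }

template <typename T>
T pow27_by_cubing(T x) { return cube(cube(cube(x))); }

// Multiplications used by the square-and-multiply loop for exponent n.
int count_binary(unsigned n) {
    Counted::mults = 0;
    binary_power(Counted{2.0}, n);
    return Counted::mults;
}

// Multiplications used by the repeated-cubing chain for exponent 27.
int count_cubing27() {
    Counted::mults = 0;
    pow27_by_cubing(Counted{2.0});
    return Counted::mults;
}
```

Under these assumptions, count_binary(27) gives 7 and count_cubing27()
gives 6, matching the seven and six multiply instructions in the two
listings above.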
Also, if I specify -funroll-loops, the compiler does funny things
with the precision of std::pow(x, 27), whereas with std::pow(x, 27.0)
it does so only when -ffast-math is specified. This is of course
because of the "funny" implementation of __cmath_power. But again,
it is simple to do better.
Richard.
--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/