This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.

Re: [PATCH] PR11706, optimize std::pow(T, int)

From: Richard Guenther <rguenth at tat dot physik dot uni-tuebingen dot de>
To: Gabriel Dos Reis <gdr at integrable-solutions dot net>
Cc: Paolo Carlini <pcarlini at suse dot de>, <libstdc++ at gcc dot gnu dot org>
Date: Wed, 12 Jan 2005 17:02:05 +0100 (CET)
Subject: Re: [PATCH] PR11706, optimize std::pow(T, int)

On 12 Jan 2005, Gabriel Dos Reis wrote:

> Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> writes:
>
> [...]
>
> | > And, after Zdenek's nice patch, -funroll-loops should lead to optimal code.
> | > Does it work as expected?
> |
> | Yes, __cmath_power is inlined and -funroll-loops is able to optimize
> | the loop for constant powers.  Though the approach in __cmath_power
> | is not optimal, e.g. for an exponent of 27 we generate one more
> | multiplication than with __builtin_pow().  Also, as we inline
> | __cmath_power all the time now, we have icache and code-size regressions
> | for non-constant powers.  With my hackish approach we could again
> | remove these inlines.  Also, having to enable -funroll-loops to
>
> So, what you want is to have the compiler better understand loops with
> "constant" bounds, without requiring -funroll-loops.

I also want the compiler to transform this unrolled std::pow(x, 27)
asm:

        fldl    8(%ebp)
        fld     %st(0)
        fmul    %st(1), %st
        popl    %ebp
        fld     %st(0)
        fmul    %st(1), %st
        fxch    %st(2)
        fmulp   %st, %st(1)
        fxch    %st(1)
        fmul    %st(0), %st
        fmul    %st, %st(1)
        fmul    %st(0), %st
        fmulp   %st, %st(1)

to that of std::pow(x, 27.0):

        fldl    8(%ebp)
        fld     %st(0)
        fmul    %st(1), %st
        popl    %ebp
        fmulp   %st, %st(1)
        fld     %st(0)
        fmul    %st(1), %st
        fmulp   %st, %st(1)
        fld     %st(0)
        fmul    %st(1), %st
        fmulp   %st, %st(1)

which has one multiplication fewer (six instead of seven).  No
surprise, gcc is able to do this if std::pow(x, 27) is dispatched
through __builtin_pow().
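
To make the multiplication counts concrete, here is a minimal C++ sketch.
The names power_binary and pow27_cubes and the `mults` counter are
hypothetical, introduced only for illustration; power_binary is a
square-and-multiply loop in the style of __cmath_power, not a copy of the
libstdc++ source, and pow27_cubes follows the x -> x^3 -> x^9 -> x^27 chain
that the second listing appears to use.

```cpp
// Square-and-multiply exponentiation for n >= 1.
// `mults` is incremented once per floating-point multiplication.
double power_binary(double x, unsigned n, int& mults) {
    double y = (n % 2) ? x : 1.0;   // start from x if the low bit is set
    while (n >>= 1) {
        x = x * x;                  // square for the next bit
        ++mults;
        if (n % 2) {
            y = y * x;              // fold the current power into the result
            ++mults;
        }
    }
    return y;
}

// x^27 computed as ((x^3)^3)^3: three cubings, two multiplications each.
double pow27_cubes(double x, int& mults) {
    for (int i = 0; i < 3; ++i) {
        double sq = x * x;          // x^2
        ++mults;
        x = sq * x;                 // x^3
        ++mults;
    }
    return x;
}
```

For an exponent of 27, power_binary performs seven multiplications and
pow27_cubes six, matching the fmul/fmulp counts in the two listings.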

I do not understand why you reject a very simple solution to get
optimal and correct (sic! - read on) code for constant exponents.
Also, if I specify -funroll-loops, the compiler does funny things
with the precision of std::pow(x, 27), whereas with std::pow(x, 27.0)
it does so only if -ffast-math is specified -- this is of course
because of the "funny" implementation of __cmath_power.  But again,
it's simple to do better.
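
One way such a dispatch could look is sketched below.  This is an
assumption for illustration, not the patch under discussion: pow_loop and
pow_int are hypothetical names, and only __builtin_constant_p and
__builtin_pow are real GCC builtins.  Whether the builtin call is then
expanded into multiplications depends on the compiler version and flags.

```cpp
// Hypothetical fallback: out-of-line-able square-and-multiply loop for
// exponents that are only known at run time.
inline double pow_loop(double x, int n) {
    // Take |n| in unsigned arithmetic so that INT_MIN does not overflow.
    unsigned m = n < 0 ? 0u - static_cast<unsigned>(n)
                       : static_cast<unsigned>(n);
    double y = (m % 2) ? x : 1.0;
    while (m >>= 1) {
        x *= x;
        if (m % 2)
            y *= x;
    }
    return n < 0 ? 1.0 / y : y;
}

// Hypothetical entry point (not the libstdc++ overload): hand exponents the
// compiler can prove constant to __builtin_pow, use the loop otherwise.
inline double pow_int(double x, int n) {
    if (__builtin_constant_p(n))
        return __builtin_pow(x, static_cast<double>(n));
    return pow_loop(x, n);
}
```

With this shape, a call such as pow_int(x, 27) can take the builtin path
once it is inlined with optimization enabled, while calls with a run-time
exponent keep using the loop and need not be inlined.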

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

References:
- Re: [PATCH] PR11706, optimize std::pow(T, int)
  - From: Gabriel Dos Reis
