This is the mail archive of the
`gcc@gcc.gnu.org`
mailing list for the GCC project.


*From*: "Timothy J. Wood" <tjw at omnigroup dot com>
*To*: Andreas Svrcek-Seiler <svrci at tbi dot univie dot ac dot at>
*Cc*: <gcc at gcc dot gnu dot org>
*Date*: Mon, 19 Nov 2001 10:23:08 -0800
*Subject*: Re: GCC and fast eveluaton of polynomials

This formulation of the series approximation to log() completely serializes the execution of your code: each step cannot be computed until the previous one has finished, due to the dependence on 'v'. Instead, you can compute two 'halves' of the sum in parallel.

First, rewrite your expression as the sum of one polynomial with only even powers and one with only odd powers. Then factor one power of x out of the odd half (since its lowest power is 1), leaving you with two independent expressions involving only even powers. Finally, start your two partial sums, one with the zero-power coefficient and one with what was the coefficient of the first power before the factoring, and accumulate the rest of the terms in parallel.

An example might help here. Say we have:

    y = a7*x^7 + ... + a1*x + a0

Rewrite this as:

    y = yEven + yOdd
    yEven = a6*x^6 + a4*x^4 + ... + a0
    yOdd  = a7*x^7 + a5*x^5 + ... + a1*x

Factor yOdd, leaving:

    y = yEven + x * yOdd
    yEven = a6*x^6 + a4*x^4 + ... + a0
    yOdd  = a7*x^6 + a5*x^4 + ... + a1

Then compute yEven and yOdd using Horner's scheme in x^2. Once you have the two halves, the final sum is just another multiply-accumulate operation.

The advantage of this approach is that it frees the compiler to overlap the two independent instruction streams (and of course we are hoping the compiler is smart enough to do this on CPUs where multiple instructions can be in flight). The only point of synchronization is the final sum at the very end.

This approach could be extended to an x^{3,4,...} scheme if you had a lot of terms and a CPU that would still have FPU stalls with the x^2 version above. At that point, though, you should probably be using some sort of vector unit :)

Bringing this back to something relevant to this list: it would be nice for the compiler to be able to do this in the -ffast-math case, but I'm not 100% familiar with all the precision-loss issues, so please don't flame me (duck!)
-tim

On Monday, November 19, 2001, at 04:48 AM, Andreas Svrcek-Seiler wrote:

> Dear developers,
>
> I've got an inner loop
> inside which I evaluate a polynomial.
> If I do it naively according to Horner's scheme
> (replacing a log() call, which is even more time-consuming):
>
>     u2 = u*u;
>     v = (((((B5*u2+B4)*u2+B3)*u2+B2)*u2+B1)*u2+B0)*u;
>
> the compiled executable is significantly slower than
> when the source reads
>
>     v = B5*u2 + B4;
>     v = v*u2 + B3;
>     v = v*u2 + B2;
>     v = v*u2 + B1;
>     v = v*u2 + B0;
>     v *= u;
>
> This behavior is independent of how many optimizations I turn on.
> By the way, the newly released Intel compiler produces even slower
> code ;-)
>
> ... maybe there's room for easy improvement?
>
> best regards
> andreas
> --
> Wolfgang Andreas Svrcek-Seiler (godzilla)
> svrci@tbi.univie.ac.at
> Tel.: 01-4277-52733

**References**: *GCC and fast eveluaton of polynomials*, *From:* Andreas Svrcek-Seiler
