This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.


Re: Polyhedron performance regression


On 11/13/06, Richard Guenther <richard.guenther@gmail.com> wrote:
> On 11/13/06, Paul Richard Thomas <paul.richard.thomas@gmail.com> wrote:
> > Richard,
> >
> > > I bet we are missing an inline x86_64 variant.  But it should be not too
> > > hard to implement.
> >
> > 'tis what I thought.
> >
> > My view is that Steve is right about right and slow; however it is a
> > very visible right and slow.  Which is the right button to press to
> > have the inline version crafted for us?
>
> I can do expanders for fmod and remainder.

I sort of take that back ;)  But at least for the Fortran MODULO you should
be able to emit A - FLOOR (A / P) * P, as in the definition.  Currently
it seems that at least fmod is called in the expansion.  With the above
definition used for the expansion you'd get

        cvtss2sd        %xmm0, %xmm3
        movsd   (%rsi), %xmm2
        movapd  %xmm3, %xmm0
        divsd   %xmm2, %xmm0
        cvttsd2si       %xmm0, %eax
        cvtsi2sd        %eax, %xmm1
        ucomisd %xmm1, %xmm0
        jae     .L2
        subl    $1, %eax
        cvtsi2sd        %eax, %xmm1
.L2:
        mulsd   %xmm2, %xmm1
        movapd  %xmm3, %xmm0
        subsd   %xmm1, %xmm0

Exact IEC 60559 semantics for remainder will probably not be suitable
for inlining.  But I can try...

A quick shot at remainder that does not honor signed zeros would be


remainder (x, y) = x - lrint (x / y) * y

Precision is of course an issue, but with cvtsd2siq available we can cover
more than the 52 bits of double precision in the integer part, so we only
have intermediate rounding issues left.  So an expansion under
-funsafe-math-optimizations would yield

       movapd  %xmm0, %xmm2
       divsd   %xmm1, %xmm2
       cvtsd2siq       %xmm2, %rax
       cvtsi2sdq       %rax, %xmm2
       mulsd   %xmm1, %xmm2
       subsd   %xmm2, %xmm0

modulo overflow handling (which is a comparison against 0x8000000000000000,
the value cvtsd2siq returns on overflow, a jump and loading zero as result).

Does anyone spot a serious flaw here?

Richard.

