[PATCH] implement fmod() as built-in x87 intrinsic

Uros Bizjak uros@kss-loka.si
Tue May 4 13:02:00 GMT 2004


Attached to this message is a patch that implements fmod() as a 
built-in x87 intrinsic.

The patch was tested on i686-pc-linux-gnu:
- bootstrapped gcc
- compilation with attached builtins-40.c test
- compared output of fmod(), fmodf(), fmodl() with -O2, with and without 
- built almabench (about 1 second faster: user 0m14.328s)

2004-05-04  Uros Bizjak  <uros@kss-loka.si>

    * optabs.h (enum optab_index): Add new OTI_fmod.
    (fmod_optab): Define corresponding macro.
    * optabs.c (init_optabs): Initialize fmod_optab.
    * genopinit.c (optabs): Implement fmod_optab using fmod?f3.
    * builtins.c (expand_builtin_mathfn_2): Handle BUILT_IN_FMOD{,F,L}
    using fmod_optab.
    (expand_builtin): Expand BUILT_IN_FMOD{,F,L} using
    expand_builtin_mathfn_2 if flag_unsafe_math_optimizations is set.

    * reg-stack.c (subst_stack_regs_pat): Handle UNSPEC_FPREM.

    * config/i386/i386.md (UNSPEC_FPREM): New unspec to represent x87's
    fprem insn.
    (fpremxf_1): New pattern to implement fprem x87 instruction.
    (fmodsf3, fmoddf3, fmodxf3): New expanders to implement fmodf, fmod
    and fmodl built-ins as inline x87 intrinsics.


    * testsuite/gcc.dg/builtins-40.c: New test.

There appear to be some problems with gcc's reg-stack pass and loop 
optimization. The RTL that the patch generates is OK, but the asm code 
produced by gcc is not optimal, because gcc does not know that two 
fxchs cancel each other. Ideally, the fxch on the input operands could 
be "implemented" by changing the operand loading order. This effect can 
be observed in almabench, around the fprem instruction in the asm dumps 
[look at the ???.c.35.stack dump for further analysis]. Since my patch 
only exposes an existing weakness in gcc, I think it is still OK to 
commit it to mainline CVS.

        fldl 4(%esp)
        fldl 12(%esp)
        jmp  .L9
        .p2align 4,,7
        fxch %st(1)    <- this should be moved out of loop
        fxch %st(1)    <- this should be moved out of loop
        fnstsw  %ax
        jp   .L13
        fstp %st(1)

BTW: drem() could be implemented the same way, with fprem replaced by 
fprem1. This will come in a follow-up patch.


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fmod.diff
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20040504/b6a8e0b0/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: builtins-40.c
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20040504/b6a8e0b0/attachment.c>
