This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: SH optimized software floating point routines


Joern Rennecke wrote:
Quoting Christian Bruel <christian.bruel@st.com>:

Using the ieee-sf.S + this patch
OK

Is this only a proof-of-concept, because you only change the ne[sd]f2 implementation?

I changed also the unordered comparison patterns. (cmpunsf_i1, cmpundf_i1). But yes, the other functions that would need the same kind of check would be unordsf2, and all the comparisons (gtsf2, gesf2f...) for floats and doubles.
But I will only consider those after/if we all agree that this needs to be done instead of keeping the current QNaN only restrictions.


And you go out of your way to only accept a restricted
set of values.

This hold for the original optimized implementation as well, for example I don't think that 0x7f800001 was caught. In fact implementing correctly the isnan check without restricted set of value makes the original discussion pointless, since the Q/S bits are a subpart of all possible codings, with any fractional part != 0.


Plus, the overuse of the arithmetic unit hurts SH4-100 /
SH4-200 instruction pairing.
>
AFAICT you need only one cycle penalty, in the check_nan path:

GLOBAL(nesf2):
         /* If the raw values are unequal, the result is unequal, unless
            both values are +-zero.
            If the raw values are equal, the result is equal, unless
            the values are NaN.  */
         cmp/eq  r4,r5
         mov.l   LOCAL(inf2),r1
         bt/s     LOCAL(check_nan)
         mov     r4,r0
         or      r5,r0
         rts
         add     r0,r0
LOCAL(check_nan):
         add     r0,r0
         cmp/hi  r1,r0
         rts
         movt    r0
         .balign 4
LOCAL(inf2):
         .long 0xff000000

You could even save four bytes by putting the check_nan label into the
delay slot, but I'm not sure if that'll discomfit any branch prediction mechanism.

Thanks a lot of this one, It should fix the original problem on the restricted set of values as well. The cmpund patterns fix should probably have a similar checks.



Disclaimer: I've not tested this code.


For the DFmode case, what about NaNs denoted by the low word, e.g.
0x7ff00000 000000001 ?

If so, the DFmode code could become something like this:

GLOBAL(nedf2):
         cmp/eq  DBL0L,DBL1L
         mov.l   LOCAL(inf2),r1
         bf LOCAL(ne)
         cmp/eq  DBL0H,DBL1H
         bt/s    LOCAL(check_nan)
         mov     DBL0H,r0
         or      DBL1H,r0

         add     r0,r0
         rts
         or      DBL0L,r0
LOCAL(check_nan):
         tst     DBL0L,DBL0L
         add     r0,r0
         subc    r1,r0
         mov     #-1,r0
         rts
         negc    r0,r0
LOCAL(ne):
         rts
         mov #1,r0
         .balign 4
LOCAL(inf2):
         .long 0xffe00000

> For an actual patch, you need to use the SL* macros from > config/sh/lib1funcs.h because the SH1 does not have delayed branches.

OK, thanks


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]