This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] implement fxtract x87 instruction and logb, ilogb builtins


Roger Sayle wrote:

* config/i386/i386.md (*fxtractdf3, *fxtractsf3, *fxtractxf3): New
patterns to implement fxtract x87 instruction.
(logbdf2, logbsf2, logbxf2, ilogbsi2): New expanders to implement
logb, logbf, logbl, ilogb, ilogbf and ilogbl built-ins as inline x87
intrinsics.
(UNSPEC_XTRACT_FRACT, UNSPEC_XTRACT_EXP): New unspecs to represent
x87's fxtract insn.


The logb and ilogb functions are fairly rare.  It might make sense to
prioritize the more commonly used math functions.  For example, fmod
and asin are the two remaining libm functions that aren't implemented
as inline x87 intrinsics in the "almabench" benchmark program... :>



But these functions don't use fancy "new" x87 instructions... ;)

Regarding asin and acos: I suggest implementing them as inline functions in mathinline.h, derived from the atan2 and sqrt builtins:

--cut here--
inline double asin(double x)
{
 return atan2(x, sqrt(1.0 - x * x));
}

inline double acos(double x)
{
 return atan2(sqrt(1.0 - x * x), x);
}
--cut here--


Regarding fmod/drem: these functions are implemented with fprem and fprem1, but a single execution of either instruction can reduce the exponent by no more than 63. In mathinline.h, fmod is therefore defined as a loop:
__asm __volatile__ \
("1: fprem\n\t" \
"fnstsw %%ax\n\t" \
"sahf\n\t" \
"jp 1b" \
: "=t" (__value) : "0" (__x), "u" (__y) : "ax", "cc"); \


I don't know how to model a loop in an inline intrinsic... FSTCW, FNSTSW and SAHF intrinsics are already implemented, and FPREM{,1} should be no problem.

BTW:
I would suggest using builtins as shortcuts to hardware-accelerated functions (such as fsin, fcos, atan2, fsqrt, ...) where we could gain something; otherwise we may end up implementing the whole math library in *.md. For functions that are not directly implemented in hardware, gcc should provide an optimized, system-dependent __FAST_MATH__mathinline.h header for libc, with derived high-speed functions. Regarding the speed of inline functions: the resulting code should be as well optimized (or even better) as it would be if these functions were implemented as x87 intrinsics. It is true that we can't perform optimizations at the builtins level (e.g. sin(asin(x)) => x), but IMHO these optimizations could be implemented in other places.


IMHO only a few FP intrinsics remain for x87 hardware:
(1) rint(), lrint() and llrint() with the frndint instruction [frndint is already implemented, but no builtin is linked to it yet]
(2) fmod(), drem() with the fprem{,1} instructions
(3) ldexp() with the fscale instruction [fscale is already implemented, but no builtin is linked to it yet]
(4) frexp() with the just-implemented fxtract instruction [there is a problem with the (int *exp) pointer, as with sincos]
(5) expm1(), log1p()


Uros.

