This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



RFC: Creating a more efficient sincos interface


Hi,

The existing sincos functions use two pointers to return the sine and cosine results. In
most cases four memory accesses are necessary per call: the callee stores both results and
the caller loads them back. This is inefficient and often significantly slower than
returning values in registers. I ran a few experiments on the new optimized sincosf
implementation in GLIBC using the following interface:

__complex__ float sincosf2 (float);

This has 50% higher throughput and a 25% reduction in latency on Cortex-A72 for
random inputs in the range +-PI/4. Larger inputs take longer and thus see smaller
gains, but there is still a 5% gain on the (rarely used) path with full range reduction.
Given that sincos is used in various HPC applications, this can give a worthwhile speedup.

LLVM already supports something similar for OSX using a struct of two floats.
Using complex float is better, since some targets may not support returning structures in
floating point registers, and GCC generates very inefficient code on targets that do
(PR86145).

What do people think? Ideally I'd like to support this in a generic way so all targets can
benefit, but it's also feasible to enable it on a per-target basis. Also, since not all
libraries will support the new interface, there would have to be a flag or configure option
to switch it off where it is not supported (maybe automatically based on the math.h header).

Wilco
