[PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction
Soumya AR
soumyaa@nvidia.com
Mon Sep 30 16:26:12 GMT 2024
This patch uses the FSCALE instruction provided by SVE to implement the
standard ldexp family of functions.
Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
following code:
float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
  return __builtin_ldexp (x, i);
}
GCC Output:
test_ldexpf:
        b       ldexpf

test_ldexp:
        b       ldexp
Since SVE provides an FSCALE instruction, we can use it to process scalar
floats by moving the operands into vector registers and issuing a single
FSCALE, similar to how LLVM lowers the ldexp builtin.
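
For reference, the contract that both the libcall and the FSCALE sequence
have to meet is x * 2^i. A minimal sketch of that contract in portable C is
below. ldexpf_model is a hypothetical helper used only for illustration; it is
not part of the patch, and it is only a faithful model while the result stays
in the normal range, since repeated halving into the subnormal range can round
more than once.

```c
/* Illustrative model only, not part of the patch: compute x * 2^i by
   repeated doubling or halving.  */
float
ldexpf_model (float x, int i)
{
  /* float exponents span fewer than 300 binades, so clamping keeps the
     loops short: anything scaled this far has already saturated to
     infinity or zero.  */
  if (i > 300)
    i = 300;
  if (i < -300)
    i = -300;

  for (; i > 0; i--)
    x *= 2.0f;
  for (; i < 0; i++)
    x *= 0.5f;
  return x;
}
```
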
New Output:
test_ldexpf:
        fmov    s31, w0
        ptrue   p7.b, all
        fscale  z0.s, p7/m, z0.s, z31.s
        ret

test_ldexp:
        sxtw    x0, w0
        ptrue   p7.b, all
        fmov    d31, x0
        fscale  z0.d, p7/m, z0.d, z31.d
        ret
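
A comparable sequence can be written by hand with the SVE ACLE intrinsics,
which may be useful when checking the output above. This is a sketch under
assumptions, not the code the patch generates: scale_f32 and scale_f64 are
hypothetical names, the SVE path needs something like -march=armv8-a+sve, and
targets without SVE fall back to the libm calls. The widening of the int
exponent in scale_f64 corresponds to the sxtw in test_ldexp.

```c
#include <math.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: scale a float by 2^i using FSCALE when SVE is
   available, otherwise via ldexpf.  */
float
scale_f32 (float x, int i)
{
#if defined(__ARM_FEATURE_SVE)
  svbool_t pg = svptrue_b32 ();
  svfloat32_t r = svscale_f32_x (pg, svdup_n_f32 (x), svdup_n_s32 (i));
  /* With only lane 0 active, the last active element is lane 0.  */
  return svlastb_f32 (svptrue_pat_b32 (SV_VL1), r);
#else
  return ldexpf (x, i);
#endif
}

/* Hypothetical helper: double variant.  FSCALE on .d lanes takes a
   64-bit exponent, so the int is sign-extended first.  */
double
scale_f64 (double x, int i)
{
#if defined(__ARM_FEATURE_SVE)
  svbool_t pg = svptrue_b64 ();
  svfloat64_t r = svscale_f64_x (pg, svdup_n_f64 (x),
                                 svdup_n_s64 ((int64_t) i));
  return svlastb_f64 (svptrue_pat_b64 (SV_VL1), r);
#else
  return ldexp (x, i);
#endif
}
```
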
The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.
OK for mainline?
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md
(ldexp<mode>3): Add a new pattern to match ldexp calls with scalar
floating modes and expand to the existing pattern for FSCALE.
(@aarch64_pred_<optab><mode>): Extend the pattern to accept SVE
operands as well as scalar floating modes.
* config/aarch64/iterators.md
(SVE_FULL_F_SCALAR): Add an iterator to match all FP SVE modes as well
as SF and DF.
(VPRED): Extend the attribute to handle GPF modes.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/fscale.c: New test.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch
Type: application/octet-stream
Size: 5918 bytes
Desc: 0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20240930/bfe8d161/attachment.obj>