[PATCH] Optimise the fpclassify builtin to perform integer operations when possible
Tamar Christina
Tamar.Christina@arm.com
Mon Sep 12 16:21:00 GMT 2016
Hi All,
This patch adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.
The goal is to make it faster by:
1. Testing for the most common case first
(e.g. the float is a Normal number) and then the
rest. The amount of code generated at -O2 is
about the same, +/- 1 instruction, but the code
is much better.
2. Using integer operations in the optimized path.
At a high level, the optimized path uses integer operations
to perform the following:
  if (exponent bits aren't all set or unset)
     return Normal;
  else if (no bits are set on the number after masking out
           the sign bit)
     return Zero;
  else if (exponent has no bits set)
     return Subnormal;
  else if (mantissa has no bits set)
     return Infinite;
  else
     return NaN;
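As a rough illustration, the steps above could be written in C for an
IEEE-754 binary64 double as shown here. This is only a sketch, not the code
in the patch, which builds the checks inside fold_builtin_fpclassify. The
function name classify_double_bits and the mask constants are made up for
this example, and it assumes double is a 64-bit IEEE-754 type.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: classify a binary64 value using only integer
   operations, following the order of checks described above.  */
static int
classify_double_bits (double x)
{
  const uint64_t sign_mask = UINT64_C (0x8000000000000000);
  const uint64_t exp_mask  = UINT64_C (0x7ff0000000000000);
  const uint64_t mant_mask = UINT64_C (0x000fffffffffffff);
  uint64_t bits;

  /* Reinterpret the double as an integer without aliasing problems.  */
  memcpy (&bits, &x, sizeof bits);

  uint64_t exp  = bits & exp_mask;
  uint64_t mant = bits & mant_mask;

  /* Most common case first: exponent neither all zeros nor all ones.  */
  if (exp != 0 && exp != exp_mask)
    return FP_NORMAL;
  /* Nothing set apart from the sign bit.  */
  else if ((bits & ~sign_mask) == 0)
    return FP_ZERO;
  /* Exponent all zeros, mantissa non-zero.  */
  else if (exp == 0)
    return FP_SUBNORMAL;
  /* Exponent all ones, mantissa zero.  */
  else if (mant == 0)
    return FP_INFINITE;
  else
    return FP_NAN;
}
```

For example, classify_double_bits (1.0) gives FP_NORMAL and
classify_double_bits (-0.0) gives FP_ZERO.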
In case the optimization can't be applied the old
implementation is used as a fall-back.
A limitation of this new approach is that the exponent
of the floating point format has to fit in 31 bits and the
format has to be IEEE-like, including its encodings of NaN and INF
(e.g. for NaN and INF all bits of the exponent must be set).
To record this IEEE likeness a new boolean, is_ieee_compatible, was added
to real_format.
Regression tests were run on aarch64-none-linux and arm-none-linux-gnueabi
with no regressions. x86 uses its own implementation rather than
the fpclassify builtin.
As an example, AArch64 now generates for classification of doubles:
f:
        fmov x1, d0
        mov w0, 7
        sbfx x2, x1, 52, 11
        add w3, w2, 1
        tst w3, 0x07FE
        bne .L1
        mov w0, 13
        tst x1, 0x7fffffffffffffff
        beq .L1
        mov w0, 11
        tbz x2, 0, .L1
        tst x1, 0xfffffffffffff
        mov w0, 3
        mov w1, 5
        csel w0, w0, w1, ne
.L1:
        ret
No new tests are added, as existing tests already cover the functionality.
glibc benchmarks run against the builtin show a 31.3%
performance gain.
Ok for trunk?
Thanks,
Tamar
PS. I don't have commit rights, so if OK can someone apply the patch for me?
gcc/
2016-08-25 Tamar Christina <tamar.christina@arm.com>
Wilco Dijkstra <wilco.dijkstra@arm.com>
* gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
* gcc/real.h (real_format): Added is_ieee_compatible field.
* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
(mips_single_format): Likewise.
(motorola_single_format): Likewise.
(spu_single_format): Likewise.
(ieee_double_format): Likewise.
(mips_double_format): Likewise.
(motorola_double_format): Likewise.
(ieee_extended_motorola_format): Likewise.
(ieee_extended_intel_128_format): Likewise.
(ieee_extended_intel_96_round_53_format): Likewise.
(ibm_extended_format): Likewise.
(mips_extended_format): Likewise.
(ieee_quad_format): Likewise.
(mips_quad_format): Likewise.
(vax_f_format): Likewise.
(vax_d_format): Likewise.
(vax_g_format): Likewise.
(decimal_single_format): Likewise.
(decimal_quad_format): Likewise.
(ieee_half_format): Likewise.
(mips_single_format): Likewise.
(arm_half_format): Likewise.
(real_internal_format): Likewise.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcc-public.patch
Type: text/x-patch
Size: 11013 bytes
Desc: gcc-public.patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20160912/8a3ca2a6/attachment.bin>