[PATCH] Optimise the fpclassify builtin to perform integer operations when possible

Tamar Christina Tamar.Christina@arm.com
Mon Sep 12 16:21:00 GMT 2016


Hi All,

This patch adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.

The goal is to make it faster by:
1. Trying to determine the most common case first
   (e.g. the float is a Normal number) and then the
   rest. The amount of code generated at -O2 is
   about the same +/- 1 instruction, but the code
   is much better.
2. Using integer operations in the optimized path.

At a high level, the optimized path uses integer operations
to perform the following:

  if (exponent bits aren't all set or unset)
     return Normal;
  else if (no bits are set in the number after masking out
	   the sign bit)
     return Zero;
  else if (exponent has no bits set)
     return Subnormal;
  else if (mantissa has no bits set)
     return Infinite;
  else
     return NaN;
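
For illustration only, the checks above can be sketched in plain C for
an IEEE-754 binary64 double. This is not the code in the patch: the
helper name classify_double_bits and the mask constants are
hypothetical, and the sketch assumes that double is binary64 and that
the FP_* macros from <math.h> are the desired return values.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: classify a binary64
   value using integer operations only.  Assumes double is IEEE-754
   binary64.  */
static int
classify_double_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);	/* Reinterpret the bit pattern.  */

  const uint64_t sign_mask = UINT64_C (0x8000000000000000);
  const uint64_t exp_mask = UINT64_C (0x7ff0000000000000);
  const uint64_t mant_mask = UINT64_C (0x000fffffffffffff);
  uint64_t exp = bits & exp_mask;

  /* Most common case first: exponent neither all zeros nor all ones.  */
  if (exp != 0 && exp != exp_mask)
    return FP_NORMAL;
  /* Nothing set once the sign bit is masked out.  */
  if ((bits & ~sign_mask) == 0)
    return FP_ZERO;
  /* Exponent all zeros, mantissa non-zero.  */
  if (exp == 0)
    return FP_SUBNORMAL;
  /* Exponent all ones: the mantissa separates infinity from NaN.  */
  if ((bits & mant_mask) == 0)
    return FP_INFINITE;
  return FP_NAN;
}
```

The order of the tests follows the pseudocode, so a normal number
returns after a single mask-and-compare step.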

In case the optimization can't be applied the old
implementation is used as a fall-back.

A limitation of this new approach is that the exponent
of the floating point number has to fit in 31 bits and the
format has to be IEEE-like, including its values for NaN and INF
(e.g. for NaN and INF all bits of the exponent must be set).

To determine this IEEE likeness a new boolean field, is_ieee_compatible,
was added to real_format.

Regression tests were run on aarch64-none-linux and arm-none-linux-gnueabi
with no regressions. x86 uses its own implementation rather than
the fpclassify builtin.

As an example, AArch64 now generates the following for classification of doubles:

f:
	fmov	x1, d0
	mov	w0, 7
	sbfx	x2, x1, 52, 11
	add	w3, w2, 1
	tst	w3, 0x07FE
	bne	.L1
	mov	w0, 13
	tst	x1, 0x7fffffffffffffff
	beq	.L1
	mov	w0, 11
	tbz	x2, 0, .L1
	tst	x1, 0xfffffffffffff
	mov	w0, 3
	mov	w1, 5
	csel	w0, w0, w1, ne

.L1:
	ret

No new tests are added, as existing tests already cover the functionality.
The glibc benchmarks were run against the builtin and show a 31.3%
performance gain.

Ok for trunk?

Thanks,
Tamar

PS. I don't have commit rights, so if OK can someone apply the patch for me?

gcc/
2016-08-25  Tamar Christina  <tamar.christina@arm.com>
	    Wilco Dijkstra  <wilco.dijkstra@arm.com>

	* builtins.c (fold_builtin_fpclassify): Added optimized version.
	* real.h (real_format): Added is_ieee_compatible field.
	* real.c (ieee_single_format): Set is_ieee_compatible flag.
	(mips_single_format): Likewise.
	(motorola_single_format): Likewise.
	(spu_single_format): Likewise.
	(ieee_double_format): Likewise.
	(mips_double_format): Likewise.
	(motorola_double_format): Likewise.
	(ieee_extended_motorola_format): Likewise.
	(ieee_extended_intel_128_format): Likewise.
	(ieee_extended_intel_96_round_53_format): Likewise.
	(ibm_extended_format): Likewise.
	(mips_extended_format): Likewise.
	(ieee_quad_format): Likewise.
	(mips_quad_format): Likewise.
	(vax_f_format): Likewise.
	(vax_d_format): Likewise.
	(vax_g_format): Likewise.
	(decimal_single_format): Likewise.
	(decimal_quad_format): Likewise.
	(ieee_half_format): Likewise.
	(mips_single_format): Likewise.
	(arm_half_format): Likewise.
	(real_internal_format): Likewise.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcc-public.patch
Type: text/x-patch
Size: 11013 bytes
Desc: gcc-public.patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20160912/8a3ca2a6/attachment.bin>

