[PATCHv2][GCC] Optimise the fpclassify builtin to perform integer operations when possible

Richard Biener rguenther@suse.de
Fri Oct 21 08:05:00 GMT 2016


On Thu, 20 Oct 2016, Jeff Law wrote:

> On 09/30/2016 07:22 AM, Tamar Christina wrote:
> > Hi All,
> > 
> > This is v2 of the patch which adds an optimized route to the fpclassify
> > builtin
> > for floating point numbers which are similar to IEEE-754 in format.
> > 
> > I have addressed most comments from everyone except for two things:
> > 
> > 1) Providing a back-end hook to override the functionality. While certainly
> >    possible the current fpclassify doesn't provide this either. So I'd like
> > to
> >    treat it as an enhancement rather than an issue.
> I think the concern here is PPC, particularly the newer ones which have
> significant hardware support for these kind of characterizations.
> 
> Based on the discussions though, I suspect we're going to need something
> nontrivial due to the way the API for __builtin_fpclassify works.  In the end
> I can easily see some target way to override the default code synthesis.
> 
> I think these issues should be left for the PPC folks to propose a solution
> when they're ready to exploit their new hardware.  I don't think this should
> block the patch.
> 
> 
> 
> > 
> > 2) Doing it in a lowering phase. If the general consensus is that this is
> > the
> >    path the patch must take then I'd be happy to reconsider. However at this
> >    this patch does not seem to produce worse code than what there was
> > before.
> I think that was a desire from Richi.   I'm a bit torn here.
> 
> The code looks more like lowering rather than folding.  But it's also
> generating non-gimple trees and relies on gimple_fold_builtin to re-gimplify
> the result AFAICT.
>
> Richi -- thoughts?

I'm not entirely happy with the patch but also not with the current
state of handling of fpclassify.  I do see the need to lower(!)
fpclassify early because we want to optimize it both depending on
the return value usage and the input value.

The lowering we currently apply open-codes isnormal (we have a
builtin for this) and isfinite (likewise).  I'd prefer if we can
apply the lowering in the gimplifier and somehow avoid the
early decision on whether to use FP or integer code to perform
the operation.  Sth like

 fpclassify(x) -> isnan(x) ? FP_NAN : isnormal(x) ? FP_NORMAL
: !isfinite(x) ? FP_INFINITE : x == 0 ? FP_ZERO : FP_SUBNORMAL

(leaves the comparison against zero in explicit FP math).  We do
have later foldings that expand isnormal and isfinite and
isinf to use compares -- those are the ones that we might want to
change to integer reps.

We are also missing optabs for most of the sub-classification
tasks which would make it possible to re-combine the whole
thing back to a single fpclassify asm op.

That said, the folding to integer ops obfuscates the real operation
and thus makes the job of a (not yet existing) pass optimizing
these kind of classifications via range analysis or the like hard.
Thus I'd rather apply those at or near to RTL expansion time.

Richard.
 
> --
> 
> I think its nontrivial to judge worse vs better since it's really a function
> of the target's micro-architecture and the context in which fpclassify is
> called -- particularly where the input value lives and whether or not its used
> in other ways nearby.
> 
> In the case where the input value is in memory or not used in floating point
> arithmetic nearby, your change should be a clear win (with the exception of
> the latest ppc hardware perhaps).
> 
> If the input value is not in memory and used nearby in FP ops, then it gets a
> lot trickier.  We run the risk of making the object addressable which means it
> won't be an SSA_NAME and thus not exposed to the high level optimizers.
> 
> Richi has indicated that in gimple an object need not be addressable just
> because we access random pieces of it, including the ability to avoid marking
> something as addressable even though we have MEM (&decl) style expressions.
> I'm not sure how all that works, but trust Richi implicitly.  Additionally
> you're using VIEW_CONVERT_EXPR now rather than ADDR_EXPR, so that may mitigate
> things as well.
> 
> Finally there's the issue of having to transfer the object between the FP and
> GP register files which can be highly expensive on some architectures.  Short
> of looking at the defs & immediate uses of the input argument and trying to
> guess at the cost of moving the object between the register files I don't see
> a good way to tackle this issue.
> 
> I'm inclined to not object on the performance questions.  But I would like to
> hear from Richi WRT lowering vs folding and whether or not he believes this
> belongs elsewhere.  If he does, I'd be inclined to suggest earlier rather than
> later since we do want to expose the generated code to the gimple optimizers.
> 
> Additional implementation comments follow inline.
> 
> 
> > 
> > gcc/
> > 2016-08-25  Tamar Christina  <tamar.christina@arm.com>
> > 	    Wilco Dijkstra  <wilco.dijkstra@arm.com>
> > 
> > 	* gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
> > 	* gcc/real.h (real_format): Added is_ieee_compatible field.
> > 	* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
> > 	(mips_single_format): Likewise.
> > 	(motorola_single_format): Likewise.
> > 	(spu_single_format): Likewise.
> > 	(ieee_double_format): Likewise.
> > 	(mips_double_format): Likewise.
> > 	(motorola_double_format): Likewise.
> > 	(ieee_extended_motorola_format): Likewise.
> > 	(ieee_extended_intel_128_format): Likewise.
> > 	(ieee_extended_intel_96_round_53_format): Likewise.
> > 	(ibm_extended_format): Likewise.
> > 	(mips_extended_format): Likewise.
> > 	(ieee_quad_format): Likewise.
> > 	(mips_quad_format): Likewise.
> > 	(vax_f_format): Likewise.
> > 	(vax_d_format): Likewise.
> > 	(vax_g_format): Likewise.
> > 	(decimal_single_format): Likewise.
> > 	(decimal_quad_format): Likewise.
> > 	(iee_half_format): Likewise.
> > 	(mips_single_format): Likewise.
> > 	(arm_half_format): Likewise.
> > 	(real_internal_format): Likewise.
> > 
> > 
> > gcc/testsuite/
> > 2016-09-27  Tamar Christina  <tamar.christina@arm.com>
> > 
> > 	* gcc.target/aarch64/builtin-fpclassify.c: New codegen test.
> > 
> > 
> > gcc-v2-fpclassify.patch
> > 
> > 
> > diff --git a/gcc/builtins.c b/gcc/builtins.c
> > index
> > 9a19a75cc8ed6edb5f543cd7bd26bcc0693e6ebb..1b4878c5ba098dcc0a4a506dbc7959d150cc9028
> > 100644
> > --- a/gcc/builtins.c
> > +++ b/gcc/builtins.c
> > @@ -7943,10 +7943,8 @@ static tree
> >  fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
> >  {
> >    tree fp_nan, fp_infinite, fp_normal, fp_subnormal, fp_zero,
> > -    arg, type, res, tmp;
> > +    arg, type, res;
> >    machine_mode mode;
> > -  REAL_VALUE_TYPE r;
> > -  char buf[128];
> > 
> >    /* Verify the required arguments in the original call.  */
> >    if (nargs != 6
> > @@ -7966,14 +7964,164 @@ fold_builtin_fpclassify (location_t loc, tree
> > *args, int nargs)
> >    arg = args[5];
> >    type = TREE_TYPE (arg);
> >    mode = TYPE_MODE (type);
> > -  arg = builtin_save_expr (fold_build1_loc (loc, ABS_EXPR, type, arg));
> > +  const real_format *format = REAL_MODE_FORMAT (mode);
> > +  const HOST_WIDE_INT type_width = TYPE_PRECISION (type);
> > +
> > +  /*
> > +  For IEEE 754 types:
> > +
> > +  fpclassify (x) ->
> > +       !((exp + 1) & (exp_mask & ~1)) // exponent bits not all set or unset
> > +	 ? (x & sign_mask == 0 ? FP_ZERO :
> > +	   (exp & exp_mask == exp_mask
> > +	      ? (mantisa == 0 ? FP_INFINITE : FP_NAN) :
> > +	      FP_SUBNORMAL)):
> > +       FP_NORMAL.
> > +
> > +  Otherwise
> > +
> > +  fpclassify (x) ->
> > +       isnan (x) ? FP_NAN :
> > +	(fabs (x) == Inf ? FP_INFINITE :
> > +	   (fabs (x) >= DBL_MIN ? FP_NORMAL :
> > +	     (x == 0 ? FP_ZERO : FP_SUBNORMAL))).
> > +  */
> > +
> > +  /* Check if the number that is being classified is close enough to IEEE
> > 754
> > +     format to be able to go in the early exit code.  */
> > +  if (format->is_binary_ieee_compatible
> > +      && FLOAT_WORDS_BIG_ENDIAN == WORDS_BIG_ENDIAN
> > +      && !(UNITS_PER_WORD == 4 && type_width == 128)
> Why the UNITS_PER_WORD && type width tests against constants?
> 
> > +
> > +      /* Extract exp bits from the float, where we expect the exponent to
> > be.
> > +	 We create a new type because BIT_FIELD_REF does not allow you to
> > +	 extract less bits than the precision of the storage variable.  */
> > +      exp_bitfield
> > +        = fold_build3_loc (loc, BIT_FIELD_REF,
> > +			   build_nonstandard_integer_type (exp_bits, 0),
> > +			   int_arg,
> > +			   build_int_cst (int_type, exp_bits),
> > +			   build_int_cst (int_type, format->p - 1));
> > +
> > +      /* Re-interpret the extracted exponent bits as a 32 bit int.
> > +	 This allows us to continue doing operations as int_type.  */
> > +      exp = fold_build1_loc (loc, NOP_EXPR, int_type, exp_bitfield);
> > +
> > +      /* Set up some often used constants.  */
> > +      const_arg0 = build_int_cst (int_arg_type, 0);
> > +      const_arg1 = build_int_cst (int_arg_type, 1);
> > +      const_arg2 = build_int_cst (int_arg_type, 2);
> > +      const0 = build_int_cst (int_type, 0);
> > +      const1 = build_int_cst (int_type, 1);
> Note we have integer_zero_node and integer_one_node.  You may be able to use
> them rather than generating new nodes.  If you need the node in a different
> type, you can use fold_convert_const to do that.
> 
> 
> 
> > +
> > +      /* b^(p-1) - 1 or 1 << (p - 2)
> > +	 This creates a mask to be used to check the mantissa value.  */
> > +      significant_bit = build_int_cst (int_arg_type, format->p - 2);
> > +      mantissa_mask
> > +        = fold_build2_loc (loc, MINUS_EXPR, int_arg_type,
> > +			   fold_build2_loc (loc, LSHIFT_EXPR, int_arg_type,
> > +					    const_arg2, significant_bit),
> > +			   const_arg1);
> Can you please double-check this?  In particular you seem to be generating 2
> << (p - 2), which doesn't seem to match the comment.
> 
> 
> 
> 
> > 
> > diff --git a/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
> > b/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
> > new file mode 100644
> > index
> > 0000000000000000000000000000000000000000..84a73a6483780dac2347e72fa7d139545d2087eb
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
> > @@ -0,0 +1,22 @@
> > +/* This file checks the code generation for the new __builtin_fpclassify.
> > +   because checking the exact assembly isn't very useful, we'll just be
> > checking
> > +   for the presence of certain instructions and the omition of others. */
> s/omition/omission/
> 
> 
> ISTM you'd want to test this for float, double and long double.
> 
> 
> jeff
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)



More information about the Gcc-patches mailing list