This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Inline across -ffast-math boundary
- From: Richard Biener <rguenther at suse dot de>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 22 Apr 2016 10:00:49 +0200 (CEST)
- Subject: Re: Inline across -ffast-math boundary
- Authentication-results: sourceware.org; auth=none
- References: <20160421114548 dot GA22377 at kam dot mff dot cuni dot cz>
On Thu, 21 Apr 2016, Jan Hubicka wrote:
> Hi,
> this patch implements the long-promised logic to inline across the -ffast-math
> boundary when either the caller or the callee has no FP operations in it.  This
> is needed to resolve a code quality regression on Firefox with LTO, where
> -O3/-O2/-Ofast flags are combined and we otherwise fail to inline a lot of
> comdats.
>
> Bootstrapped/regtested on x86_64-linux.  Richard, I would like to know your
> opinion on the fp_expression_p predicate - it is a bit ugly but I do not know
> how to implement it better.
>
> We still won't inline -O1 code into -O2+ because flag_strict_overflow differs.
> I will implement similar logic for overflows incrementally.  Similarly,
> flag_errno_math could be handled better, but I am not sure it matters - I think
> the vast majority of the time users set flag_errno_math in sync with the other
> -ffast-math implied flags.
Note that for the reasons PR70586 shows (const functions having a possible
trapping side-effect because of FP math or division) we'd like to have
something like "uses FP math", "uses possibly trapping integer math" and
"uses integer math with undefined overflow" on a per-function level,
propagated alongside the pure/const/nothrow state.
So maybe you can fit that into a more suitable place than just the
inliner (which of course is interested in "uses FP math locally",
not in the transitive answer we need for PR70586).
More comments below.
> Honza
>
>
> * ipa-inline-analysis.c (reset_inline_summary): Clear fp_expressions
> (dump_inline_summary): Dump it.
> (fp_expression_p): New predicate.
> (estimate_function_body_sizes): Use it.
> (inline_merge_summary): Merge fp_expressions.
> (inline_read_section): Read fp_expressions.
> (inline_write_summary): Write fp_expressions.
> * ipa-inline.c (can_inline_edge_p): Permit inlining across fp math
> codegen boundary if either caller or callee is !fp_expressions.
> * ipa-inline.h (inline_summary): Add fp_expressions.
> * ipa-inline-transform.c (inline_call): When inlining !fp_expressions
> to fp_expressions be sure the fp generation flags are updated.
>
> * gcc.dg/ipa/inline-8.c: New testcase.
> Index: ipa-inline-analysis.c
> ===================================================================
> --- ipa-inline-analysis.c (revision 235312)
> +++ ipa-inline-analysis.c (working copy)
> @@ -1069,6 +1069,7 @@ reset_inline_summary (struct cgraph_node
> reset_inline_edge_summary (e);
> for (e = node->indirect_calls; e; e = e->next_callee)
> reset_inline_edge_summary (e);
> + info->fp_expressions = false;
> }
>
> /* Hook that is called by cgraph.c when a node is removed. */
> @@ -1423,6 +1424,8 @@ dump_inline_summary (FILE *f, struct cgr
> fprintf (f, " inlinable");
> if (s->contains_cilk_spawn)
> fprintf (f, " contains_cilk_spawn");
> + if (s->fp_expressions)
> + fprintf (f, " fp_expression");
> fprintf (f, "\n self time: %i\n", s->self_time);
> fprintf (f, " global time: %i\n", s->time);
> fprintf (f, " self size: %i\n", s->self_size);
> @@ -2459,6 +2462,42 @@ clobber_only_eh_bb_p (basic_block bb, bo
> return true;
> }
>
> +/* Return true if STMT computes a floating point expression that may be affected
> + by -ffast-math and similar flags. */
> +
> +static bool
> +fp_expression_p (gimple *stmt)
> +{
> + tree fndecl;
> +
> + if (gimple_code (stmt) == GIMPLE_ASSIGN
> + /* Even conversion to and from float are FP expressions. */
> + && (FLOAT_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
> + || FLOAT_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
> + /* Plain moves are safe. */
> + && (IS_EXPR_CODE_CLASS (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)))
> + || TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison))
> + return true;
> +
> + /* Comparisons may be optimized with the assumption that the value is not NaN. */
> + if (gimple_code (stmt) == GIMPLE_COND
> + && (FLOAT_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)))
> + || FLOAT_TYPE_P (TREE_TYPE (gimple_cond_rhs (stmt)))))
> + return true;
> +
> + /* Builtins may be optimized depending on the math mode. We don't really have
> + a list of these, so just check that there are no FP arguments. */
> + if (gimple_code (stmt) == GIMPLE_CALL
> + && (fndecl = gimple_call_fndecl (stmt)) != NULL_TREE
> + && DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN)
> + {
> + for (unsigned int i=0; i < gimple_call_num_args (stmt); i++)
> + if (FLOAT_TYPE_P (TREE_TYPE (gimple_call_arg (stmt, i))))
> + return true;
> + }
I'd say a simpler implementation with the same effect would be
FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF|SSA_OP_USE)
if (FLOAT_TYPE_P (TREE_TYPE (op)))
return true;
relying on the fact that no unfolded fully constant expressions
appear in the IL (the above doesn't visit constants).
Or if you want even those then
for (int i = 0; i < gimple_num_ops (stmt); ++i)
  if (gimple_op (stmt, i)
      && FLOAT_TYPE_P (TREE_TYPE (gimple_op (stmt, i))))
    return true;
any cases we'd want to exclude should be explicitly handled before
this general handling so we're conservatively correct.  So before
the above
if (is_gimple_call (stmt)
&& ! gimple_call_builtin_p (stmt))
return false;
I'm not sure about loads/stores, as I believe some archs may
trap on them with SNaNs or invalid representations (like the
IA-64 NaT).  This is why we don't use FP modes for memcpy
inline expansion in middle-end code.
That said, I wonder if we want to have a uses_fp_locally
flag in struct function, conservatively initialized and
updated by inlining and re-computed by local-pure-const?
Richard.
> + return false;
> +}
> +
> /* Compute function body size parameters for NODE.
> When EARLY is true, we compute only simple summaries without
> non-trivial predicates to drive the early inliner. */
> @@ -2733,6 +2772,13 @@ estimate_function_body_sizes (struct cgr
> this_time * (2 - prob), &p);
> }
>
> + if (!info->fp_expressions && fp_expression_p (stmt))
> + {
> + info->fp_expressions = true;
> + if (dump_file)
> + fprintf (dump_file, " fp_expression set\n");
> + }
> +
> gcc_assert (time >= 0);
> gcc_assert (size >= 0);
> }
> @@ -3577,6 +3623,8 @@ inline_merge_summary (struct cgraph_edge
> else
> toplev_predicate = true_predicate ();
>
> + info->fp_expressions |= callee_info->fp_expressions;
> +
> if (callee_info->conds)
> evaluate_properties_for_edge (edge, true, &clause, NULL, NULL, NULL);
> if (ipa_node_params_sum && callee_info->conds)
> @@ -4229,6 +4277,7 @@ inline_read_section (struct lto_file_dec
> bp = streamer_read_bitpack (&ib);
> info->inlinable = bp_unpack_value (&bp, 1);
> info->contains_cilk_spawn = bp_unpack_value (&bp, 1);
> + info->fp_expressions = bp_unpack_value (&bp, 1);
>
> count2 = streamer_read_uhwi (&ib);
> gcc_assert (!info->conds);
> @@ -4395,6 +4444,7 @@ inline_write_summary (void)
> bp = bitpack_create (ob->main_stream);
> bp_pack_value (&bp, info->inlinable, 1);
> bp_pack_value (&bp, info->contains_cilk_spawn, 1);
> + bp_pack_value (&bp, info->fp_expressions, 1);
> streamer_write_bitpack (&bp);
> streamer_write_uhwi (ob, vec_safe_length (info->conds));
> for (i = 0; vec_safe_iterate (info->conds, i, &c); i++)
> Index: ipa-inline.c
> ===================================================================
> --- ipa-inline.c (revision 235319)
> +++ ipa-inline.c (working copy)
> @@ -396,6 +396,8 @@ can_inline_edge_p (struct cgraph_edge *e
> (DECL_DISREGARD_INLINE_LIMITS (callee->decl)
> && lookup_attribute ("always_inline",
> DECL_ATTRIBUTES (callee->decl)));
> + inline_summary *caller_info = inline_summaries->get (caller);
> + inline_summary *callee_info = inline_summaries->get (callee);
>
> /* Until GCC 4.9 we did not check the semantics alterning flags
> bellow and inline across optimization boundry.
> @@ -417,16 +419,21 @@ can_inline_edge_p (struct cgraph_edge *e
> == !opt_for_fn (callee->decl, optimize) || !always_inline))
> || check_match (flag_wrapv)
> || check_match (flag_trapv)
> - /* Strictly speaking only when the callee uses FP math. */
> - || check_maybe_up (flag_rounding_math)
> - || check_maybe_up (flag_trapping_math)
> - || check_maybe_down (flag_unsafe_math_optimizations)
> - || check_maybe_down (flag_finite_math_only)
> - || check_maybe_up (flag_signaling_nans)
> - || check_maybe_down (flag_cx_limited_range)
> - || check_maybe_up (flag_signed_zeros)
> - || check_maybe_down (flag_associative_math)
> - || check_maybe_down (flag_reciprocal_math)
> + /* When both caller and callee do FP math, be sure the FP codegen
> + flags are compatible. */
> + || ((caller_info->fp_expressions && callee_info->fp_expressions)
> + && (check_maybe_up (flag_rounding_math)
> + || check_maybe_up (flag_trapping_math)
> + || check_maybe_down (flag_unsafe_math_optimizations)
> + || check_maybe_down (flag_finite_math_only)
> + || check_maybe_up (flag_signaling_nans)
> + || check_maybe_down (flag_cx_limited_range)
> + || check_maybe_up (flag_signed_zeros)
> + || check_maybe_down (flag_associative_math)
> + || check_maybe_down (flag_reciprocal_math)
> + /* Strictly speaking only when the callee contains function
> + calls that may end up setting errno. */
> + || check_maybe_up (flag_errno_math)))
> /* We do not want to make code compiled with exceptions to be
> brought into a non-EH function unless we know that the callee
> does not throw.
> @@ -435,9 +442,6 @@ can_inline_edge_p (struct cgraph_edge *e
> && DECL_FUNCTION_PERSONALITY (callee->decl))
> || (check_maybe_up (flag_exceptions)
> && DECL_FUNCTION_PERSONALITY (callee->decl))
> - /* Strictly speaking only when the callee contains function
> - calls that may end up setting errno. */
> - || check_maybe_up (flag_errno_math)
> /* When devirtualization is diabled for callee, it is not safe
> to inline it as we possibly mangled the type info.
> Allow early inlining of always inlines. */
> Index: ipa-inline.h
> ===================================================================
> --- ipa-inline.h (revision 235312)
> +++ ipa-inline.h (working copy)
> @@ -132,6 +132,8 @@ struct GTY(()) inline_summary
> /* True wen there is only one caller of the function before small function
> inlining. */
> unsigned int single_caller : 1;
> + /* True if function contains any floating point expressions. */
> + unsigned int fp_expressions : 1;
>
> /* Information about function that will result after applying all the
> inline decisions present in the callgraph. Generally kept up to
> Index: ipa-inline-transform.c
> ===================================================================
> --- ipa-inline-transform.c (revision 235312)
> +++ ipa-inline-transform.c (working copy)
> @@ -338,6 +338,63 @@ inline_call (struct cgraph_edge *e, bool
> DECL_FUNCTION_SPECIFIC_OPTIMIZATION (to->decl)
> = build_optimization_node (&opts);
> }
> + inline_summary *caller_info = inline_summaries->get (to);
> + inline_summary *callee_info = inline_summaries->get (callee);
> + if (!caller_info->fp_expressions && callee_info->fp_expressions)
> + {
> + caller_info->fp_expressions = true;
> + if (opt_for_fn (callee->decl, flag_rounding_math)
> + != opt_for_fn (to->decl, flag_rounding_math)
> + || opt_for_fn (callee->decl, flag_trapping_math)
> + != opt_for_fn (to->decl, flag_trapping_math)
> + || opt_for_fn (callee->decl, flag_unsafe_math_optimizations)
> + != opt_for_fn (to->decl, flag_unsafe_math_optimizations)
> + || opt_for_fn (callee->decl, flag_finite_math_only)
> + != opt_for_fn (to->decl, flag_finite_math_only)
> + || opt_for_fn (callee->decl, flag_signaling_nans)
> + != opt_for_fn (to->decl, flag_signaling_nans)
> + || opt_for_fn (callee->decl, flag_cx_limited_range)
> + != opt_for_fn (to->decl, flag_cx_limited_range)
> + || opt_for_fn (callee->decl, flag_signed_zeros)
> + != opt_for_fn (to->decl, flag_signed_zeros)
> + || opt_for_fn (callee->decl, flag_associative_math)
> + != opt_for_fn (to->decl, flag_associative_math)
> + || opt_for_fn (callee->decl, flag_reciprocal_math)
> + != opt_for_fn (to->decl, flag_reciprocal_math)
> + || opt_for_fn (callee->decl, flag_errno_math)
> + != opt_for_fn (to->decl, flag_errno_math))
> + {
> + struct gcc_options opts = global_options;
> +
> + cl_optimization_restore (&opts, opts_for_fn (to->decl));
> + opts.x_flag_rounding_math
> + = opt_for_fn (callee->decl, flag_rounding_math);
> + opts.x_flag_trapping_math
> + = opt_for_fn (callee->decl, flag_trapping_math);
> + opts.x_flag_unsafe_math_optimizations
> + = opt_for_fn (callee->decl, flag_unsafe_math_optimizations);
> + opts.x_flag_finite_math_only
> + = opt_for_fn (callee->decl, flag_finite_math_only);
> + opts.x_flag_signaling_nans
> + = opt_for_fn (callee->decl, flag_signaling_nans);
> + opts.x_flag_cx_limited_range
> + = opt_for_fn (callee->decl, flag_cx_limited_range);
> + opts.x_flag_signed_zeros
> + = opt_for_fn (callee->decl, flag_signed_zeros);
> + opts.x_flag_associative_math
> + = opt_for_fn (callee->decl, flag_associative_math);
> + opts.x_flag_reciprocal_math
> + = opt_for_fn (callee->decl, flag_reciprocal_math);
> + opts.x_flag_errno_math
> + = opt_for_fn (callee->decl, flag_errno_math);
> + if (dump_file)
> + fprintf (dump_file, "Copying FP flags from %s:%i to %s:%i\n",
> + callee->name (), callee->order, to->name (), to->order);
> + build_optimization_node (&opts);
> + DECL_FUNCTION_SPECIFIC_OPTIMIZATION (to->decl)
> + = build_optimization_node (&opts);
> + }
> + }
>
> /* If aliases are involved, redirect edge to the actual destination and
> possibly remove the aliases. */
> Index: testsuite/gcc.dg/ipa/inline-8.c
> ===================================================================
> --- testsuite/gcc.dg/ipa/inline-8.c (revision 0)
> +++ testsuite/gcc.dg/ipa/inline-8.c (working copy)
> @@ -0,0 +1,36 @@
> +/* Verify that we do not inline the isnanf test into -ffast-math code but that
> + we do inline trivial functions across the -Ofast boundary. */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +#include <math.h>
> +/* Can't be inlined because isnanf will be optimized out. */
> +int
> +cmp (float a)
> +{
> + return isnanf (a);
> +}
> +/* Can be inlined. */
> +float
> +move (float a)
> +{
> + return a;
> +}
> +float a;
> +void
> +set ()
> +{
> + a=nan("");
> +}
> +float b;
> +__attribute__ ((optimize("Ofast")))
> +int
> +main()
> +{
> + b++;
> + if (cmp(a))
> + __builtin_abort ();
> + float a = move (1);
> + if (!__builtin_constant_p (a))
> + __builtin_abort ();
> + return 0;
> +}
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)