This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][tree-complex.c] PR tree-optimization/70291: Inline floating-point complex multiplication more aggressively
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 2 May 2018 12:15:40 +0200
- Subject: Re: [PATCH][tree-complex.c] PR tree-optimization/70291: Inline floating-point complex multiplication more aggressively
- References: <5AE75555.6060303@foss.arm.com>
On Mon, Apr 30, 2018 at 7:41 PM, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
> Hi all,
>
> We can improve the performance of complex floating-point multiplications by
> inlining the expansion a bit more aggressively.
> We can inline complex x = a * b as:
> x = (ar*br - ai*bi) + i(ar*bi + br*ai);
> if (isunordered (__real__ x, __imag__ x))
> x = __muldc3 (a, b); //Or __mulsc3 for single-precision
>
> That way, in the common case where no NaNs are produced we avoid the
> libgcc call entirely, and fall back to the
> NaN-handling code in libgcc only if either component of the expansion is NaN.
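>
> In plain C the fast-path-plus-fallback shape above can be sketched as
> follows (a minimal sketch; the compiler's own complex multiply stands in
> for the __muldc3/__mulsc3 fallback):

```c
#include <complex.h>
#include <math.h>

/* Sketch of the proposed expansion for x = a * b: compute the naive
   product first, and only when either component compares unordered
   (i.e. is NaN) take the slow path.  Here the ordinary complex multiply
   stands in for libgcc's __muldc3 NaN-recovery routine.  */
static double _Complex
complex_mul_sketch (double _Complex a, double _Complex b)
{
  double ar = creal (a), ai = cimag (a);
  double br = creal (b), bi = cimag (b);
  double xr = ar * br - ai * bi;       /* __real__ x */
  double xi = ar * bi + br * ai;       /* __imag__ x */
  if (isunordered (xr, xi))            /* NaN in either component?  */
    return a * b;                      /* slow NaN-recovery path */
  return xr + xi * I;
}
```

> With finite inputs the branch is never taken, so the libcall overhead
> disappears from the common case.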
>
> The implementation is done in expand_complex_multiplication in
> tree-complex.c and the above expansion
> is used at -O1 and higher when not optimising for
> size.
> At -O0 and -Os a single call to libgcc is emitted.
>
> For the code:
> __complex double
> foo (__complex double a, __complex double b)
> {
> return a * b;
> }
>
> We will now emit at -O2 for aarch64:
> foo:
> fmul d16, d1, d3
> fmul d6, d1, d2
> fnmsub d5, d0, d2, d16
> fmadd d4, d0, d3, d6
> fcmp d5, d4
> bvs .L8
> fmov d1, d4
> fmov d0, d5
> ret
> .L8:
> stp x29, x30, [sp, -16]!
> mov x29, sp
> bl __muldc3
> ldp x29, x30, [sp], 16
> ret
>
> Instead of just a branch to __muldc3.
>
> Bootstrapped and tested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> x86_64-unknown-linux-gnu.
>
> Ok for trunk? (GCC 9)
+ /* If optimizing for size or not at all just do a libcall. */
+ if (optimize == 0 || optimize_function_for_size_p (cfun))
+ {
+ expand_complex_libcall (gsi, ar, ai, br, bi, MULT_EXPR);
+ return;
+ }
use optimize_bb_for_size_p instead please (get BB from the mult stmt).
+ /* Expand a complex multiplication or division to a libcall to the c99
+ compliant routines. Unlike expand_complex_libcall create and insert
+ the call, assign it to an output variable and return that rather than
+ modifying existing statements in place. */
+
+static tree
+insert_complex_mult_libcall (gimple_stmt_iterator *gsi, tree type, tree ar,
+ tree ai, tree br, tree bi)
+{
can you please try merging both functions instead?
Also it shows a possible issue: with -fnon-call-exceptions the original
multiplication may have EH edges. I think you want to side-step that
by doing the libcall-only way in that case as well (stmt_can_throw_internal).
+ tree isunordered_decl = builtin_decl_explicit (BUILT_IN_ISUNORDERED);
+ tree isunordered_res = create_tmp_var (integer_type_node);
+ gimple *tmpr_unord_check
+ = gimple_build_call (isunordered_decl, 2, tmpr, tmpi);
+ gimple_call_set_lhs (tmpr_unord_check, isunordered_res);
+
+ gsi_insert_before (gsi, tmpr_unord_check, GSI_SAME_STMT);
+ gimple *check
+ = gimple_build_cond (NE_EXPR, isunordered_res, integer_zero_node,
+ NULL_TREE, NULL_TREE);
why use BUILT_IN_ISUNORDERED but not a GIMPLE_COND with
UNORDERED_EXPR? Note again that might trap/throw with -fsignalling-nans
so better avoid this transform for flag_signalling_nans as well...
+ /* We have a conditional block with some assignments in cond_bb.
+ Wire up the PHIs to wrap up. */
+ if (gimple_in_ssa_p (cfun))
+ {
we are always in SSA form(?) (probably tree-complex.c can use some TLC here).
+ /* If we are not worrying about NaNs expand to
+ (ar*br - ai*bi) + i(ar*bi + br*ai) directly. */
+ expand_complex_multiplication_limited_range (gsi, inner_type, ar, ai,
+ br, bi, &rr, &ri);
I think the function is badly worded - this isn't about limited
ranges, no? Which
also means that we can dispatch to this simple variant not only for
flag_complex_method != 2 but for !HONOR_NANS && !HONOR_INFINITIES?
Maybe that should be done as followup.
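The simple variant being discussed is just the four-multiply/two-add
form; a hypothetical standalone version of what
expand_complex_multiplication_limited_range emits:

```c
/* Standalone sketch of the unchecked expansion: the textbook
   (ar*br - ai*bi) + i(ar*bi + br*ai) with no NaN fallback, valid
   when NaNs and infinities need not be honoured.  */
static void
complex_mul_unchecked (double ar, double ai, double br, double bi,
                       double *rr, double *ri)
{
  *rr = ar * br - ai * bi;
  *ri = ar * bi + br * ai;
}
```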
Richard.
> Thanks,
> Kyrill
>
> 2018-04-30 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> PR tree-optimization/70291
> * tree-complex.c (insert_complex_mult_libcall): New function.
> (expand_complex_multiplication_limited_range): Likewise.
> (expand_complex_multiplication): Expand floating-point complex
> multiplication using the above.
>
> 2018-04-30 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> PR tree-optimization/70291
> * gcc.dg/complex-6.c: New test.
> * gcc.dg/complex-7.c: Likewise.