[PATCH] Attempt harder to emit a cmove in emit_conditional_move (PR tree-optimization/79390)
Richard Biener
rguenther@suse.de
Wed Apr 12 17:57:00 GMT 2017
On April 12, 2017 6:33:19 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>Hi!
>
>As mentioned in the PR, for LU benchmark we generate worse code with
>-Ofast
>compared to -O3, because in the former we don't use a conditional move.
>
>The problem is during emit_conditional_move, while in both cases
>swap_commutative_operands_p (op2, op3) tells us it might be better to
>swap
>them, the difference between -Ofast and -O3 is that
>reversed_comparison_code_parts
>returns in -Ofast case LE, but in -O3 UNKNOWN. For -O3 this means we
>don't
>swap o2 and o3, emit a GT and successfully emit a conditional move.
>For -Ofast, we swap the arguments and attempt to emit a LE, but the
>predicate on the cbranchdf4 expander fails in that case, because "LE"
>is not
>considered a trivial comparison operator, is more expensive, thus the
>only
>attempt we try fails and we don't emit a cmov at all.
>
>The following patch will do the same thing as previously, but if we
>were to
>return NULL_RTX, it will try to swap op2/op3 (either back, if we
>swapped
>them first, effectively returning to the original comparison, or if we
>haven't swapped them first, trying to reverse the comparison) and see
>if
>that leads to a usable sequence.
>
>The patch caused a FAIL in the pr70465-2.c testcase, where previously
>in ce1
>we weren't able to emit any cmov and now are, which tiny bit changes
>the
>IL that gets through regstack and thus the testcase is not testing
>anymore
>what it wanted - the %st(1) and %st comparison arguments are swapped
>and
>the fcmov operands too and thus a fxchg is seen as needed (before
>regstack
>it is impossible to find out which ordering is more beneficial
>actually).
>We could teach regstack to attempt to reverse the comparisons and swap
>corresponding fcmov arguments, but that does look like a GCC8 material
>to
>me.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.
Richard.
>2017-04-12 Jakub Jelinek <jakub@redhat.com>
>
> PR tree-optimization/79390
> * optabs.c (emit_conditional_move): If the preferred op2/op3 operand
> order does not result in usable sequence, retry with reversed operand
> order.
>
> * gcc.target/i386/pr70465-2.c: Xfail the scan-assembler-not test.
>
>--- gcc/optabs.c.jj 2017-02-02 08:44:12.000000000 +0100
>+++ gcc/optabs.c 2017-04-12 14:30:14.905433771 +0200
>@@ -4258,12 +4258,15 @@ emit_conditional_move (rtx target, enum
> if (cmode == VOIDmode)
> cmode = GET_MODE (op0);
>
>+ enum rtx_code orig_code = code;
>+ bool swapped = false;
> if (swap_commutative_operands_p (op2, op3)
> && ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
> != UNKNOWN))
> {
> std::swap (op2, op3);
> code = reversed;
>+ swapped = true;
> }
>
> if (mode == VOIDmode)
>@@ -4272,45 +4275,62 @@ emit_conditional_move (rtx target, enum
> icode = direct_optab_handler (movcc_optab, mode);
>
> if (icode == CODE_FOR_nothing)
>- return 0;
>+ return NULL_RTX;
>
> if (!target)
> target = gen_reg_rtx (mode);
>
>- code = unsignedp ? unsigned_condition (code) : code;
>- comparison = simplify_gen_relational (code, VOIDmode, cmode, op0,
>op1);
>-
>- /* We can get const0_rtx or const_true_rtx in some circumstances.
>Just
>- return NULL and let the caller figure out how best to deal with
>this
>- situation. */
>- if (!COMPARISON_P (comparison))
>- return NULL_RTX;
>-
>- saved_pending_stack_adjust save;
>- save_pending_stack_adjust (&save);
>- last = get_last_insn ();
>- do_pending_stack_adjust ();
>- prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
>- GET_CODE (comparison), NULL_RTX, unsignedp, OPTAB_WIDEN,
>- &comparison, &cmode);
>- if (comparison)
>+ for (int pass = 0; ; pass++)
> {
>- struct expand_operand ops[4];
>+ code = unsignedp ? unsigned_condition (code) : code;
>+ comparison = simplify_gen_relational (code, VOIDmode, cmode,
>op0, op1);
>
>- create_output_operand (&ops[0], target, mode);
>- create_fixed_operand (&ops[1], comparison);
>- create_input_operand (&ops[2], op2, mode);
>- create_input_operand (&ops[3], op3, mode);
>- if (maybe_expand_insn (icode, 4, ops))
>+ /* We can get const0_rtx or const_true_rtx in some
>circumstances. Just
>+ punt and let the caller figure out how best to deal with this
>+ situation. */
>+ if (COMPARISON_P (comparison))
> {
>- if (ops[0].value != target)
>- convert_move (target, ops[0].value, false);
>- return target;
>+ saved_pending_stack_adjust save;
>+ save_pending_stack_adjust (&save);
>+ last = get_last_insn ();
>+ do_pending_stack_adjust ();
>+ prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
>+ GET_CODE (comparison), NULL_RTX, unsignedp,
>+ OPTAB_WIDEN, &comparison, &cmode);
>+ if (comparison)
>+ {
>+ struct expand_operand ops[4];
>+
>+ create_output_operand (&ops[0], target, mode);
>+ create_fixed_operand (&ops[1], comparison);
>+ create_input_operand (&ops[2], op2, mode);
>+ create_input_operand (&ops[3], op3, mode);
>+ if (maybe_expand_insn (icode, 4, ops))
>+ {
>+ if (ops[0].value != target)
>+ convert_move (target, ops[0].value, false);
>+ return target;
>+ }
>+ }
>+ delete_insns_since (last);
>+ restore_pending_stack_adjust (&save);
> }
>+
>+ if (pass == 1)
>+ return NULL_RTX;
>+
>+ /* If the preferred op2/op3 order is not usable, retry with
>other
>+ operand order, perhaps it will expand successfully. */
>+ if (swapped)
>+ code = orig_code;
>+ else if ((reversed = reversed_comparison_code_parts (orig_code,
>op0, op1,
>+ NULL))
>+ != UNKNOWN)
>+ code = reversed;
>+ else
>+ return NULL_RTX;
>+ std::swap (op2, op3);
> }
>- delete_insns_since (last);
>- restore_pending_stack_adjust (&save);
>- return NULL_RTX;
> }
>
>
>--- gcc/testsuite/gcc.target/i386/pr70465-2.c.jj 2017-02-22
>18:15:13.000000000 +0100
>+++ gcc/testsuite/gcc.target/i386/pr70465-2.c 2017-04-12
>18:07:00.784761399 +0200
>@@ -1,7 +1,7 @@
> /* PR target/70465 */
> /* { dg-do compile } */
> /* { dg-options "-Ofast -mfpmath=387 -fomit-frame-pointer" } */
>-/* { dg-final { scan-assembler-not "fxch\t%st.1" } } */
>+/* { dg-final { scan-assembler-not "fxch\t%st.1" { xfail *-*-* } } }
>*/
>
> extern float d[1024];
>
>
> Jakub
More information about the Gcc-patches
mailing list