Consider code: int f2(int a, int b, int c, int d) { a = -a; b += a * c; c += a * d; return b ^ c; } On aarch64 this could be written with to msub instructions, RTL code: (set (reg/i:SI 0 x0) (minus:SI (reg:SI 1 x1 [ b ]) (mult:SI (reg:SI 0 x0 [ a ]) (reg:SI 2 x2 [ c ])))) However, combine doesn't merge the neg and the multiply-adds and generates: f2: neg w4, w0 madd w0, w4, w2, w1 madd w3, w4, w3, w2 eor w0, w0, w3 ret If we modify the code to only do a single multiply-accumulate: int f2(int a, int b, int c, int d) { a = -a; b += a * c; return b; } Then they the expected single msub instruction is generated. I think this is due to combine being scared of the negated 'a' being used in two places.
This was the reason why the pass on the tree level to fused multiply add was added. Maybe we can do the same. Do you know how often this shows up?
(In reply to Andrew Pinski from comment #1) > This was the reason why the pass on the tree level to fused multiply add was > added. Maybe we can do the same. > > Do you know how often this shows up? My understanding is that it occurs in SPEC2006 and leslie3d in particular, but with FP code there (just substitute int for double in the testcase and pick something other than eor). Don't have any numbers on how often or how much of a bottleneck it is to performance...
For floating point, tree-ssa-math-opts.c handles the FMA and then expand is supposed to handle the neg inside the FMA_EXPR but for some reason on this testcase it is not. def0 = get_def_for_expr (treeop0, NEGATE_EXPR); /* The multiplication is commutative - look at its 2nd operand if the first isn't fed by a negate. */ if (!def0) { def0 = get_def_for_expr (treeop1, NEGATE_EXPR); /* Swap operands if the 2nd operand is fed by a negate. */ if (def0) { tree tem = treeop0; treeop0 = treeop1; treeop1 = tem; } } def2 = get_def_for_expr (treeop2, NEGATE_EXPR); op0 = op2 = NULL; if (def0 && def2 && optab_handler (fnms_optab, mode) != CODE_FOR_nothing) { opt = fnms_optab; op0 = expand_normal (gimple_assign_rhs1 (def0)); op2 = expand_normal (gimple_assign_rhs1 (def2)); } else if (def0 && optab_handler (fnma_optab, mode) != CODE_FOR_nothing) { opt = fnma_optab; op0 = expand_normal (gimple_assign_rhs1 (def0)); } else if (def2 && optab_handler (fms_optab, mode) != CODE_FOR_nothing) { opt = fms_optab; op2 = expand_normal (gimple_assign_rhs1 (def2)); } So you are going to need to debug why expr.c is not handling this case for floating point. Again for integer, it is a different story. And I doubt for integer it shows up that often and if it did, the neg could most likely schedule with some other instruction.
Combine does not do this because it would need to combine 2 insns into 2. This *can* be safely done (carefully). As a result combine can do more work (which is good), but lifetime of registers can get longer too (but also shorter in some cases). I don't yet know if it is a win on average. diff --git a/gcc/combine.c b/gcc/combine.c index 9861291..599d383 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -3992,8 +3992,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, && XVECLEN (newpat, 0) == 2 && GET_CODE (XVECEXP (newpat, 0, 0)) == SET && GET_CODE (XVECEXP (newpat, 0, 1)) == SET - && (i1 || set_noop_p (XVECEXP (newpat, 0, 0)) - || set_noop_p (XVECEXP (newpat, 0, 1))) +// && (i1 || set_noop_p (XVECEXP (newpat, 0, 0)) +// || set_noop_p (XVECEXP (newpat, 0, 1))) && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
Confirmed.
Fixed in GCC 9 with 2->2 combing being done now (r9-2064-gc4c5ad1d6d1e1e1fe7a): Trying 11 -> 12: 11: r96:SI=-r115:SI REG_DEAD r115:SI 12: r108:SI=r96:SI*r103:SI Failed to match this instruction: (parallel [ (set (reg:SI 108) (mult:SI (neg:SI (reg:SI 115)) (reg/v:SI 103 [ cD.4384 ]))) (set (reg/v:SI 96 [ aD.4382 ]) (neg:SI (reg:SI 115))) ]) Failed to match this instruction: (parallel [ (set (reg:SI 108) (mult:SI (neg:SI (reg:SI 115)) (reg/v:SI 103 [ cD.4384 ]))) (set (reg/v:SI 96 [ aD.4382 ]) (neg:SI (reg:SI 115))) ]) Successfully matched this instruction: (set (reg/v:SI 96 [ aD.4382 ]) (neg:SI (reg:SI 115))) Successfully matched this instruction: (set (reg:SI 108) (mult:SI (neg:SI (reg:SI 115)) (reg/v:SI 103 [ cD.4384 ]))) allowing combination of insns 11 and 12 original costs 4 + 12 = 16 replacement costs 4 + 12 = 16 modifying insn i2 11: r96:SI=-r115:SI deferring rescan insn with uid = 11. modifying insn i3 12: r108:SI=-r115:SI*r103:SI REG_DEAD r115:SI deferring rescan insn with uid = 12.