Compile the attached source code with options -Os -march=armv7-a -mthumb, gcc generates: foo4: push {r3, r4, r5, lr} bl bar cmp r0, #0 // A mov r5, r0 // B blt .L3 ... Instructions AB can be merged into one instruction "sub r5, r0, 0" The rtx for instructions AB are: (insn:TI 9 7 8 src/ta.c:8 (set (reg:CC 24 cc) (compare:CC (reg:SI 0 r0 [orig:134 f ] [134]) (const_int 0 [0x0]))) 220 {*arm_cmpsi_insn} (expr_list:REG_DEAD (reg:SI 0 r0 [orig:134 f ] [134]) (nil))) (insn 8 9 10 src/ta.c:7 (set (reg/v:SI 5 r5 [orig:134 f ] [134]) (reg:SI 0 r0)) 658 {*thumb2_movsi_insn} (nil)) We have another insn pattern "*movsi_compare0" that can be used to represent the combined operations. Should insns AB be combined automatically?
Created attachment 20262 [details] test case
In .final, the insns look like this: @(insn:TI 9 7 8 t.c:8 (set (reg:CC 24 cc) @ (compare:CC (reg:SI 0 r0 [orig:134 f ] [134]) @ (const_int 0 [0x0]))) 220 {*arm_cmpsi_insn} (expr_list:REG_DEAD (reg:SI 0 r0 [orig:134 f ] [134]) @ (nil))) cmp r0, #0 @ 9 *arm_cmpsi_insn/1 [length = 4] @(insn 8 9 10 t.c:7 (set (reg/v:SI 5 r5 [orig:134 f ] [134]) @ (reg:SI 0 r0)) 658 {*thumb2_movsi_insn} (nil)) mov r5, r0 @ 8 *thumb2_movsi_insn/1 [length = 4] This, AFAICT, would match with the pattern and constraints of movsi_compare0: ;; If copying one reg to another we can set the condition codes according to ;; its value. Such a move is common after a return from subroutine and the ;; result is being tested against zero. (define_insn "*movsi_compare0" [(set (reg:CC CC_REGNUM) (compare:CC (match_operand:SI 1 "s_register_operand" "0,r") (const_int 0))) (set (match_operand:SI 0 "s_register_operand" "=r,r") (match_dup 1))] "TARGET_32BIT" "@ cmp%?\\t%0, #0 sub%.\\t%0, %1, #0" [(set_attr "conds" "set")] ) But this insns can probably only be formed in the .combine pass, since (to the best of my knowledge) we don't construct multiple-set insns in any other pass. In the .combine pass, the two insns look like this (obviously reversed, as you could already suspect from the insns numbers 9 and 8 above): (insn 8 7 9 2 t.c:7 (set (reg/v:SI 134 [ f ]) (reg:SI 0 r0)) 658 {*thumb2_movsi_insn} (expr_list:REG_DEAD (reg:SI 0 r0) (nil))) (insn 9 8 10 2 t.c:8 (set (reg:CC 24 cc) (compare:CC (reg/v:SI 134 [ f ]) (const_int 0 [0x0]))) 220 {*arm_cmpsi_insn} (nil)) But combine does not even try to combine these two insns: ------------ ... insn_cost 24: 0 Trying 9 -> 10: Failed to match this instruction: (set (pc) (if_then_else (lt (reg/v:SI 134 [ f ]) (const_int 0 [0x0])) (label_ref:SI 29) (pc))) foo4 ... ------------
Well, according to GDB, combine does try to combine in try_combine: Breakpoint 8, try_combine (i3=0x20000000004fc8b8, i2=0x20000000004fc828, i1=0x0, new_direct_jump_p=0x60000fffffd4ae28) at ../../trunk/gcc/combine.c:2389 2389 rtx newpat, newi2pat = 0; 3: debug_rtx(i3) = (insn 9 8 10 2 t.c:8 (set (reg:CC 24 cc) (compare:CC (reg/v:SI 134 [ fD.1273 ]) (const_int 0 [0x0]))) 220 {*arm_cmpsi_insn} (nil)) void 2: debug_rtx(i2) = (insn 8 7 9 2 t.c:7 (set (reg/v:SI 134 [ fD.1273 ]) (reg:SI 0 r0)) 658 {*thumb2_movsi_insn} (expr_list:REG_DEAD (reg:SI 0 r0) (nil))) But then we hit cant_combine_insn_p(i2) because r0 is LO_REGS, we're compiling for TARGET_THUMB, and so CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (src))) is true for src=r0 in this reg-reg move: if (REG_P (src) && REG_P (dest) && ((REGNO (src) < FIRST_PSEUDO_REGISTER && ! fixed_regs[REGNO (src)] && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (src)))) || (REGNO (dest) < FIRST_PSEUDO_REGISTER && ! fixed_regs[REGNO (dest)] && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (dest)))))) return 1;
It's not very hard to add a define_peephole2 pattern for this case also, although it's a bit of a hack. I'm not even sure if it would handle the case cmp+mov and mov+cmp case -- does peephole2 care about the order of the insns? If so, then you'd really need even 2 new define_peephole2's. You'd almost think we need a post-reload combine, yuck... :-) OTOH there are already a few define_peephole2's in arm.md to catch this kind of simple optimization that combine fails to handle, so it's apparently acceptable. And there is no easy way around this otherwise. I can't think of a way to teach combine when it maybe could be should be to apply combine transformations to likely-spilled regs.
...remove accidental CC-list additions...
I can't see how it would hurt to allow combine to always merge insns that are known to be consecutive (ie to ignore CLASS_LIKELY_SPILLED_P if prev_nonenote_insn(consumer) == producer).
(In reply to comment #6) > I can't see how it would hurt to allow combine to always merge insns that are > known to be consecutive (ie to ignore CLASS_LIKELY_SPILLED_P if > prev_nonenote_insn(consumer) == producer). > I implemented the following patch Index: combine.c =================================================================== --- combine.c (revision 158539) +++ combine.c (working copy) @@ -384,6 +384,7 @@ static void do_SUBST_INT (int *, int); static void init_reg_last (void); static void setup_incoming_promotions (rtx); static void set_nonzero_bits_and_sign_copies (rtx, const_rtx, void *); +static bool reg_likely_spilled_p (rtx, rtx, bool); static int cant_combine_insn_p (rtx); static int can_combine_p (rtx, rtx, rtx, rtx, rtx *, rtx *); static int combinable_i3pat (rtx, rtx *, rtx, rtx, int, rtx *); @@ -1987,6 +1988,22 @@ contains_muldiv (rtx x) } } + +static bool +reg_likely_spilled_p (rtx insn, rtx reg, bool reg_set) +{ + unsigned regno = REGNO (reg); + if (!reg_set) + { + rtx prev_insn = prev_nonnote_insn_bb (insn); + if ((prev_insn != NULL_RTX) && df_reg_defined (prev_insn, reg)) + return false; + } + + return ((regno < FIRST_PSEUDO_REGISTER) && !fixed_regs[regno] + && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (regno))); +} + /* Determine whether INSN can be used in a combination. Return nonzero if not. This is used in try_combine to detect early some cases where we can't perform combinations. */ @@ -2020,12 +2037,8 @@ cant_combine_insn_p (rtx insn) if (GET_CODE (dest) == SUBREG) dest = SUBREG_REG (dest); if (REG_P (src) && REG_P (dest) - && ((REGNO (src) < FIRST_PSEUDO_REGISTER - && ! fixed_regs[REGNO (src)] - && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (src)))) - || (REGNO (dest) < FIRST_PSEUDO_REGISTER - && ! fixed_regs[REGNO (dest)] - && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (dest)))))) + && (reg_likely_spilled_p (insn, src, false) + || reg_likely_spilled_p (insn, dest, true))) return 1; return 0; It can indeed combine the mov and cmp instructions. But it causes an x86 failure. Compile the following code with options -march=x86-64 -O2 extern long xxx (long); int gen_type (long x, long y) { int size = (xxx (x) / xxx (y)); return size; } GCC generates: c-aux-info.i:7:1: error: unable to find a register to spill in class 'AREG' c-aux-info.i:7:1: error: this is the insn: (insn 13 12 19 2 c-aux-info.i:5 (parallel [ (set (reg:DI 0 ax [67]) (div:DI (reg:DI 3 bx [orig:58 D.2722 ] [58]) (reg:DI 0 ax))) (set (reg:DI 1 dx [68]) (mod:DI (reg:DI 3 bx [orig:58 D.2722 ] [58]) (reg:DI 0 ax))) (clobber (reg:CC 17 flags)) ]) 353 {*divmoddi4} (expr_list:REG_DEAD (reg:DI 3 bx [orig:58 D.2722 ] [58]) (expr_list:REG_DEAD (reg:DI 0 ax) (expr_list:REG_UNUSED (reg:DI 1 dx [68]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))))) c-aux-info.i:7:1: internal compiler error: in spill_failure, at reload1.c:2158 The root cause is with this patch, the original code sequence (insn 12 11 13 2 c-aux-info.i:5 (set (reg:DI 59 [ D.2723 ]) (reg:DI 0 ax)) 89 {*movdi_1_rex64} (expr_list:REG_DEAD (reg:DI 0 ax) (nil))) (insn 13 12 19 2 c-aux-info.i:5 (parallel [ (set (reg:DI 67) (div:DI (reg:DI 58 [ D.2722 ]) (reg:DI 59 [ D.2723 ]))) (set (reg:DI 68) (mod:DI (reg:DI 58 [ D.2722 ]) (reg:DI 59 [ D.2723 ]))) (clobber (reg:CC 17 flags)) ]) 353 {*divmoddi4} (expr_list:REG_DEAD (reg:DI 59 [ D.2723 ]) (expr_list:REG_DEAD (reg:DI 58 [ D.2722 ]) (expr_list:REG_UNUSED (reg:DI 68) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))))) is merged into (insn 13 12 19 2 c-aux-info.i:5 (parallel [ (set (reg:DI 67) (div:DI (reg:DI 58 [ D.2722 ]) (reg:DI 0 ax))) (set (reg:DI 68) (mod:DI (reg:DI 58 [ D.2722 ]) (reg:DI 0 ax))) (clobber (reg:CC 17 flags)) ]) 353 {*divmoddi4} (expr_list:REG_DEAD (reg:DI 0 ax) (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_UNUSED (reg:DI 68) (expr_list:REG_DEAD (reg:DI 58 [ D.2722 ]) (nil)))))) and it failed in register allocation.
Author: vries Date: Wed Aug 17 11:39:06 2011 New Revision: 177827 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=177827 Log: 2011-08-17 Tom de Vries <tom@codesourcery.com> PR target/43597 * gcc.target/arm/pr43597.c: New test. Added: trunk/gcc/testsuite/gcc.target/arm/pr43597.c Modified: trunk/gcc/testsuite/ChangeLog
PR43597 was fixed by http://gcc.gnu.org/viewcvs?view=revision&revision=172032. Testcase added. Closing bug.