Bug 43597 - Move and compare with 0 can be combined
Summary: Move and compare with 0 can be combined
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.5.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: 47562
  Show dependency treegraph
 
Reported: 2010-03-31 07:50 UTC by Carrot
Modified: 2011-08-17 11:44 UTC (History)
3 users (show)

See Also:
Host: i686-linux
Target: arm-eabi
Build: i686-linux
Known to work:
Known to fail: 4.5.0
Last reconfirmed: 2010-03-31 09:01:30


Attachments
test case (132 bytes, text/x-csrc)
2010-03-31 07:50 UTC, Carrot
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Carrot 2010-03-31 07:50:03 UTC
Compile the attached source code with options -Os -march=armv7-a -mthumb, gcc generates:

foo4:
        push    {r3, r4, r5, lr}
        bl      bar
        cmp     r0, #0     // A
        mov     r5, r0     // B
        blt     .L3
        ...

Instructions AB can be merged into one instruction   "sub r5, r0, 0"

The rtx for instructions AB are:

(insn:TI 9 7 8 src/ta.c:8 (set (reg:CC 24 cc)
        (compare:CC (reg:SI 0 r0 [orig:134 f ] [134])
            (const_int 0 [0x0]))) 220 {*arm_cmpsi_insn} (expr_list:REG_DEAD (reg:SI 0 r0 [orig:134 f ] [134])
        (nil)))

(insn 8 9 10 src/ta.c:7 (set (reg/v:SI 5 r5 [orig:134 f ] [134])
        (reg:SI 0 r0)) 658 {*thumb2_movsi_insn} (nil))

We have another insn pattern "*movsi_compare0" that can be used to represent the combined operations. Should insns AB be combined automatically?
Comment 1 Carrot 2010-03-31 07:50:48 UTC
Created attachment 20262 [details]
test case
Comment 2 Steven Bosscher 2010-03-31 14:59:10 UTC
In .final, the insns look like this:
@(insn:TI 9 7 8 t.c:8 (set (reg:CC 24 cc)
@        (compare:CC (reg:SI 0 r0 [orig:134 f ] [134])
@            (const_int 0 [0x0]))) 220 {*arm_cmpsi_insn} (expr_list:REG_DEAD (reg:SI 0 r0 [orig:134 f ] [134])
@        (nil)))
        cmp     r0, #0  @ 9     *arm_cmpsi_insn/1       [length = 4]
@(insn 8 9 10 t.c:7 (set (reg/v:SI 5 r5 [orig:134 f ] [134])
@        (reg:SI 0 r0)) 658 {*thumb2_movsi_insn} (nil))
        mov     r5, r0  @ 8     *thumb2_movsi_insn/1    [length = 4]



This, AFAICT, would match with the pattern and constraints of movsi_compare0:

;; If copying one reg to another we can set the condition codes according to
;; its value.  Such a move is common after a return from subroutine and the
;; result is being tested against zero.

(define_insn "*movsi_compare0"
  [(set (reg:CC CC_REGNUM)
        (compare:CC (match_operand:SI 1 "s_register_operand" "0,r")
                    (const_int 0)))
   (set (match_operand:SI 0 "s_register_operand" "=r,r")
        (match_dup 1))]
  "TARGET_32BIT"
  "@
   cmp%?\\t%0, #0
   sub%.\\t%0, %1, #0"
  [(set_attr "conds" "set")]
)


But this insns can probably only be formed in the .combine pass, since (to the best of my knowledge) we don't construct multiple-set insns in any other pass.

In the .combine pass, the two insns look like this (obviously reversed, as you could already suspect from the insns numbers 9 and 8 above):

(insn 8 7 9 2 t.c:7 (set (reg/v:SI 134 [ f ])
        (reg:SI 0 r0)) 658 {*thumb2_movsi_insn} (expr_list:REG_DEAD (reg:SI 0 r0)
        (nil)))

(insn 9 8 10 2 t.c:8 (set (reg:CC 24 cc)
        (compare:CC (reg/v:SI 134 [ f ])
            (const_int 0 [0x0]))) 220 {*arm_cmpsi_insn} (nil))

But combine does not even try to combine these two insns:
------------
...
insn_cost 24: 0

Trying 9 -> 10:
Failed to match this instruction:
(set (pc)
    (if_then_else (lt (reg/v:SI 134 [ f ])
            (const_int 0 [0x0]))
        (label_ref:SI 29)
        (pc)))


foo4
...
------------

Comment 3 Steven Bosscher 2010-03-31 15:15:50 UTC
Well, according to GDB, combine does try to combine in try_combine:
Breakpoint 8, try_combine (i3=0x20000000004fc8b8, i2=0x20000000004fc828, i1=0x0, new_direct_jump_p=0x60000fffffd4ae28) at ../../trunk/gcc/combine.c:2389
2389      rtx newpat, newi2pat = 0;
3: debug_rtx(i3) = (insn 9 8 10 2 t.c:8 (set (reg:CC 24 cc)
        (compare:CC (reg/v:SI 134 [ fD.1273 ])
            (const_int 0 [0x0]))) 220 {*arm_cmpsi_insn} (nil))
void
2: debug_rtx(i2) = (insn 8 7 9 2 t.c:7 (set (reg/v:SI 134 [ fD.1273 ])
        (reg:SI 0 r0)) 658 {*thumb2_movsi_insn} (expr_list:REG_DEAD (reg:SI 0 r0)
        (nil)))

But then we hit cant_combine_insn_p(i2) because r0 is LO_REGS, we're compiling for TARGET_THUMB, and so CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (src))) is true for src=r0 in this reg-reg move:

  if (REG_P (src) && REG_P (dest)
      && ((REGNO (src) < FIRST_PSEUDO_REGISTER
           && ! fixed_regs[REGNO (src)]
           && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (src))))
          || (REGNO (dest) < FIRST_PSEUDO_REGISTER
              && ! fixed_regs[REGNO (dest)]
              && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (dest))))))
    return 1;


Comment 4 Steven Bosscher 2010-03-31 22:49:54 UTC
It's not very hard to add a define_peephole2 pattern for this case also, although it's a bit of a hack.

I'm not even sure if it would handle the case cmp+mov and mov+cmp case -- does peephole2 care about the order of the insns? If so, then you'd really need even 2 new define_peephole2's. You'd almost think we need a post-reload combine, yuck... :-)

OTOH there are already a few define_peephole2's in arm.md to catch this kind of simple optimization that combine fails to handle, so it's apparently acceptable. And there is no easy way around this otherwise. I can't think of a way to teach combine when it maybe could be should be to apply combine transformations to likely-spilled regs. 
Comment 5 Steven Bosscher 2010-03-31 22:51:15 UTC
...remove accidental CC-list additions...
Comment 6 Richard Earnshaw 2010-04-17 17:15:19 UTC
I can't see how it would hurt to allow combine to always merge insns that are known to be consecutive (ie to ignore CLASS_LIKELY_SPILLED_P if prev_nonenote_insn(consumer) == producer).
Comment 7 Carrot 2010-04-22 12:26:20 UTC
(In reply to comment #6)
> I can't see how it would hurt to allow combine to always merge insns that are
> known to be consecutive (ie to ignore CLASS_LIKELY_SPILLED_P if
> prev_nonenote_insn(consumer) == producer).
> 

I implemented the following patch

Index: combine.c
===================================================================
--- combine.c   (revision 158539)
+++ combine.c   (working copy)
@@ -384,6 +384,7 @@ static void do_SUBST_INT (int *, int);
 static void init_reg_last (void);
 static void setup_incoming_promotions (rtx);
 static void set_nonzero_bits_and_sign_copies (rtx, const_rtx, void *);
+static bool reg_likely_spilled_p (rtx, rtx, bool);
 static int cant_combine_insn_p (rtx);
 static int can_combine_p (rtx, rtx, rtx, rtx, rtx *, rtx *);
 static int combinable_i3pat (rtx, rtx *, rtx, rtx, int, rtx *);
@@ -1987,6 +1988,22 @@ contains_muldiv (rtx x)
     }
 }

+
+static bool
+reg_likely_spilled_p (rtx insn, rtx reg, bool reg_set)
+{
+  unsigned regno = REGNO (reg);
+  if (!reg_set)
+    {
+      rtx prev_insn = prev_nonnote_insn_bb (insn);
+      if ((prev_insn != NULL_RTX) && df_reg_defined (prev_insn, reg))
+        return false;
+    }
+
+  return ((regno < FIRST_PSEUDO_REGISTER) && !fixed_regs[regno]
+          && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (regno)));
+}
+
 /* Determine whether INSN can be used in a combination.  Return nonzero if
    not.  This is used in try_combine to detect early some cases where we
    can't perform combinations.  */
@@ -2020,12 +2037,8 @@ cant_combine_insn_p (rtx insn)
   if (GET_CODE (dest) == SUBREG)
     dest = SUBREG_REG (dest);
   if (REG_P (src) && REG_P (dest)
-      && ((REGNO (src) < FIRST_PSEUDO_REGISTER
-          && ! fixed_regs[REGNO (src)]
-          && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (src))))
-         || (REGNO (dest) < FIRST_PSEUDO_REGISTER
-             && ! fixed_regs[REGNO (dest)]
-             && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (dest))))))
+      && (reg_likely_spilled_p (insn, src, false)
+         || reg_likely_spilled_p (insn, dest, true)))
     return 1;

   return 0;

It can indeed combine the mov and cmp instructions. But it causes an x86 failure. Compile the following code with options -march=x86-64 -O2

extern long xxx (long);
int
gen_type (long x, long y)
{
  int size = (xxx (x) / xxx (y));
  return size;
}

GCC generates:

c-aux-info.i:7:1: error: unable to find a register to spill in class 'AREG'
c-aux-info.i:7:1: error: this is the insn:
(insn 13 12 19 2 c-aux-info.i:5 (parallel [
            (set (reg:DI 0 ax [67])
                (div:DI (reg:DI 3 bx [orig:58 D.2722 ] [58])
                    (reg:DI 0 ax)))
            (set (reg:DI 1 dx [68])
                (mod:DI (reg:DI 3 bx [orig:58 D.2722 ] [58])
                    (reg:DI 0 ax)))
            (clobber (reg:CC 17 flags))
        ]) 353 {*divmoddi4} (expr_list:REG_DEAD (reg:DI 3 bx [orig:58 D.2722 ] [58])
        (expr_list:REG_DEAD (reg:DI 0 ax)
            (expr_list:REG_UNUSED (reg:DI 1 dx [68])
                (expr_list:REG_UNUSED (reg:CC 17 flags)
                    (nil))))))
c-aux-info.i:7:1: internal compiler error: in spill_failure, at reload1.c:2158

The root cause is with this patch, the original code sequence

(insn 12 11 13 2 c-aux-info.i:5 (set (reg:DI 59 [ D.2723 ])
        (reg:DI 0 ax)) 89 {*movdi_1_rex64} (expr_list:REG_DEAD (reg:DI 0 ax)
        (nil)))

(insn 13 12 19 2 c-aux-info.i:5 (parallel [
            (set (reg:DI 67)
                (div:DI (reg:DI 58 [ D.2722 ])
                    (reg:DI 59 [ D.2723 ])))
            (set (reg:DI 68)
                (mod:DI (reg:DI 58 [ D.2722 ])
                    (reg:DI 59 [ D.2723 ])))
            (clobber (reg:CC 17 flags))
        ]) 353 {*divmoddi4} (expr_list:REG_DEAD (reg:DI 59 [ D.2723 ])
        (expr_list:REG_DEAD (reg:DI 58 [ D.2722 ])
            (expr_list:REG_UNUSED (reg:DI 68)
                (expr_list:REG_UNUSED (reg:CC 17 flags)
                    (nil))))))

is merged into

(insn 13 12 19 2 c-aux-info.i:5 (parallel [
            (set (reg:DI 67)
                (div:DI (reg:DI 58 [ D.2722 ])
                    (reg:DI 0 ax)))
            (set (reg:DI 68)
                (mod:DI (reg:DI 58 [ D.2722 ])
                    (reg:DI 0 ax)))
            (clobber (reg:CC 17 flags))
        ]) 353 {*divmoddi4} (expr_list:REG_DEAD (reg:DI 0 ax)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (expr_list:REG_UNUSED (reg:DI 68)
                (expr_list:REG_DEAD (reg:DI 58 [ D.2722 ])
                    (nil))))))

and it failed in register allocation.
Comment 8 Tom de Vries 2011-08-17 11:39:10 UTC
Author: vries
Date: Wed Aug 17 11:39:06 2011
New Revision: 177827

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=177827
Log:
2011-08-17  Tom de Vries  <tom@codesourcery.com>

	PR target/43597
	* gcc.target/arm/pr43597.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/arm/pr43597.c
Modified:
    trunk/gcc/testsuite/ChangeLog
Comment 9 Tom de Vries 2011-08-17 11:44:45 UTC
PR43597 was fixed by http://gcc.gnu.org/viewcvs?view=revision&revision=172032.

Testcase added. Closing bug.