This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

gcc/1532: Redundant compare instructions on i386


I've been looking at PR gcc/1532.  This PR is about some poor code
generation for the i386.  It is a regression from 2.95.3.

For a fairly simple test case, found in the PR, gcc generates code
which looks like this:

	cmpl	$1, %eax
	je	.L1
	cmpl	$1, %eax
	jle	.L15

The second cmpl instruction is useless.  The condition flags will not
have changed since the first one.

The i386 backend used to use cc0.  It now generates compares which are
assigned explicit to the flags register, and jumps which explicitly
look at the flags register.  It uses several different comparison
modes, to permit various optimizations.

The RTL for the above instructions looks like this in the 01.rtl file:

(insn 93 90 94 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/v:SI 66 [ <anonymous> ])
            (const_int 1 [0x1]))) -1 (nil)
    (nil))

(jump_insn 94 93 95 (set (pc)
        (if_then_else (eq (reg:CCZ 17 flags)
                (const_int 0 [0x0]))
            (label_ref 24)
            (pc))) -1 (nil)
    (nil))

(insn 95 94 96 (set (reg:CCGC 17 flags)
        (compare:CCGC (reg/v:SI 66 [ <anonymous> ])
            (const_int 1 [0x1]))) -1 (nil)
    (nil))

(jump_insn 96 95 97 (set (pc)
        (if_then_else (gt (reg:CCGC 17 flags)
                (const_int 0 [0x0]))
            (label_ref 101)
            (pc))) -1 (nil)
    (nil))

Note that the comparisons are done in different modes.

My first thought was to fix CSE so that, when it sees comparisons
which are equivalent except for being in different modes, it converts
them both to the superset mode (this is machine dependent, and
typically results in CCmode), and then applies CSE.  I implemented
this in the CSE pass, but it didn't help because the above compare
instructions were in different basic blocks (obvious in retrospect).

So I implemented the same idea in GCSE.  There it solved this test
case nicely, but failed on more complex cases because it led gcc to
want to copy CCmode values around.  That can be done on the i386, but
(as far as I know) there is no simple way to do it--it has to go
through memory.  In fact, it is typically more efficient to redo the
comparison than it is to save and restore the condition codes--
certainly more efficient in a case like the above, which compares a
single register with a constant.  I didn't get this case fully
working--I would have had to implement secondary reloads and so
forth--but it didn't seem like it was going to help.  (Also, I had to
hack the GCSE pass even more than expected, because it didn't want to
work on hard registers like register 17 above.)

So I punted and wrote a quick pass in machine dependent reorg to
eliminate obvious duplicate comparison cases.  It works fine on this
test case--the resulting code is the same except that the redudant
comparison is eliminated.  But it kind of sucks because it doesn't do
any real dataflow analysis, so it will miss all but the most obvious
cases.  I append that patch below, for reference.

Since the i386 is a reasonably popular platform, I think it would be
nice if gcc avoided generating obviously lousy code.  At this point,
though, I'd like to hear any suggestions for a reasonable approach to
this problem.  The key points as I see them are:

1) Comparisons are represented with different modes for purposes of
   permitting future optimization, but are otherwise implemented by
   the same underlying instruction.

2) All comparisons set the hard register 17 (the EFLAGS register).

3) We want the compiler to eliminate obvious duplication in setting
   the comparison flags.  That is, the compiler should recognize when
   one compare instruction will produce the same result as a previous
   compare instruction.
   a) It should do this despite any mode differences.
   b) It should do this even though the result goes to a hard
      register.
   c) It should not do this in the usual GCSE way, which involves
      copying the value to a pseudo-register, because if that
      pseudo-register is used after eflags has been clobbered the
      resulting code will almost certainly be worse.

There is some other awful code in this test case which I haven't even
looked at yet, a useless jump to the next instruction:

	je	.L1
.L2:
.L1:

I assume that is a separate problem, probably related to inlining.

Ian

This is the patch I wrote which solves the problem in a poor manner.
This is not yet a serious patch submission, just an example.

Index: i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.635
diff -p -u -r1.635 i386.c
--- i386.c	16 Jan 2004 18:53:44 -0000	1.635
+++ i386.c	20 Jan 2004 21:19:08 -0000
@@ -15689,6 +15689,69 @@ k8_avoid_jump_misspredicts (void)
     }
 }
 
+/* Remove duplicate setting of the condition codes.  */
+
+static void
+ix86_remove_dup_cc (void)
+{
+  rtx cc_reg;
+  rtx last_cc_insn;
+  rtx last_cc_src;
+  rtx insn;
+  rtx next;
+
+  cc_reg = gen_rtx_REG (CCmode, FLAGS_REG);
+  last_cc_insn = NULL_RTX;
+  last_cc_src = NULL_RTX;
+
+  for (insn = get_insns (); insn; insn = next)
+    {
+      rtx set;
+
+      next = NEXT_INSN (insn);
+
+      if (GET_CODE (insn) == CODE_LABEL
+	  || GET_CODE (insn) == CALL_INSN)
+	{
+	  last_cc_insn = NULL_RTX;
+	  continue;
+	}
+
+      if (! INSN_P (insn))
+	continue;
+
+      set = single_set (insn);
+      if (set
+	  && GET_CODE (SET_DEST (set)) == REG
+	  && REGNO (SET_DEST (set)) == FLAGS_REG)
+	{
+	  if (last_cc_insn
+	      && (rtx_equal_p (last_cc_src, SET_SRC (set))
+		  || (GET_CODE (SET_SRC (set)) == COMPARE
+		      && GET_CODE (last_cc_src) == COMPARE
+		      && GET_MODE (SET_SRC (set)) != GET_MODE (last_cc_src)
+		      && GET_MODE_CLASS (GET_MODE (SET_SRC (set))) == MODE_CC
+		      && GET_MODE_CLASS (GET_MODE (last_cc_src)) == MODE_CC
+		      && rtx_equal_p (XEXP (SET_SRC (set), 0),
+				      XEXP (last_cc_src, 0))
+		      && rtx_equal_p (XEXP (SET_SRC (set), 1),
+				      XEXP (last_cc_src, 1))))
+	      && ! modified_between_p (SET_SRC (set), last_cc_insn, insn))
+	    delete_insn (insn);
+	  else
+	    {
+	      last_cc_insn = insn;
+	      last_cc_src = SET_SRC (set);
+	    }
+	}
+      else
+	{
+	  if (reg_set_p (cc_reg, insn))
+	    last_cc_insn = NULL_RTX;
+	}
+    }
+}
+
 /* Implement machine specific optimizations.
    At the moment we implement single transformation: AMD Athlon works faster
    when RET is not destination of conditional jump or directly preceded
@@ -15698,6 +15761,9 @@ static void
 ix86_reorg (void)
 {
   edge e;
+
+  if (optimize)
+    ix86_remove_dup_cc ();
 
   if (!TARGET_ATHLON_K8 || !optimize || optimize_size)
     return;


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]