This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[RFC PATCH, i386]: Fuse compare and branch macro-ops for Core2


Hello!

This experimental patch is based on Chapter 7.5, "Macro-op fusion", of [1], which states that when a number of conditions are satisfied, Core2 fuses a compare instruction and the branch instruction that follows it into a single macro-op. This functionality is limited to 32-bit executables.

The attached patch increases fusion opportunities by keeping unsigned compares together with their follow-up branch instruction, while trying to align the combined sequence so that the branch insn does not cross a 16-byte boundary.

The patch bootstraps on x86_64 (please note that in the current RFC revision, fusing is always enabled), but unfortunately it fails _one_ gfortran testcase, which exposes a problem in the gcc infrastructure w.r.t. label references [2] (although this problem should be extremely rare and very hard to hit).

The alignment numbers are not fine-tuned yet; the maximum number of inserted nops still has to be tuned against the average performance cost of these nops in the instruction stream.
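
To make the effect of the ".p2align 4,,7" directive used in the patch more concrete, here is a minimal C sketch of the padding rule as documented for GNU as: pad to the next 16-byte boundary, but only if that takes at most 7 bytes. The helper names p2align_padding and crosses_16 are made up for illustration and are not part of the patch or of GCC.

```c
#include <assert.h>

/* Bytes of padding that ".p2align p2,,max_skip" inserts at address ADDR:
   pad up to the next 2^p2 boundary, unless that would need more than
   MAX_SKIP bytes, in which case the directive inserts nothing.
   (Hypothetical helper, for illustration only.)  */
static unsigned
p2align_padding (unsigned long addr, unsigned p2, unsigned max_skip)
{
  assert (p2 < 32);
  unsigned long align = 1UL << p2;
  unsigned pad = (unsigned) ((align - (addr & (align - 1))) & (align - 1));
  return pad <= max_skip ? pad : 0;
}

/* Nonzero if an instruction of LEN bytes that starts at ADDR spans a
   16-byte boundary.  (Hypothetical helper, for illustration only.)  */
static int
crosses_16 (unsigned long addr, unsigned len)
{
  assert (len > 0);
  return (addr >> 4) != ((addr + len - 1) >> 4);
}
```

With these helpers, p2align_padding (9, 4, 7) is 7 and p2align_padding (8, 4, 7) is 0, so after the directive the compare starts at offset 0 to 8 within a 16-byte block. A 3-byte compare at offset 8 followed by a 2-byte jump ends at offset 12, but longer encodings can still push the branch across the boundary, which is one reason the numbers need tuning.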

In the hope that the speedup will be above the noise floor, would somebody be interested in checking the effect of this patch on SPEC?

2008-03-16 Uros Bizjak <ubizjak@gmail.com>

       * config/i386/i386.md ("*jcc_fused_1"): New insn pattern.
       ("*jcc_fused_2"): Ditto.
       * config/i386/i386.c (print_operand): Handle "E" and "e" codes to
       print opcode suffix for fused jump insns.
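
As a rough illustration of what the new "E" and "e" codes are meant to print, the following C sketch maps an unsigned comparison to the x86 jcc suffix, optionally reversing the condition first (as "e" does for the pattern whose branch arms are swapped). It is a standalone approximation, not GCC's put_condition_code; the enum, the function names, and the assumption that the operator predicate accepts exactly these six codes are mine.

```c
#include <assert.h>

/* Unsigned/equality comparison codes assumed to be accepted by the
   operator predicate (hypothetical enum, not GCC's rtx codes).  */
enum ucmp { U_EQ, U_NE, U_GTU, U_GEU, U_LTU, U_LEU };

/* Logical negation of a comparison: a > b becomes a <= b, and so on.  */
static enum ucmp
reverse_ucmp (enum ucmp c)
{
  switch (c)
    {
    case U_EQ:  return U_NE;
    case U_NE:  return U_EQ;
    case U_GTU: return U_LEU;
    case U_GEU: return U_LTU;
    case U_LTU: return U_GEU;
    default:    return U_GTU;	/* U_LEU */
    }
}

/* Suffix appended to "j" for comparison C; REVERSE selects the
   negated condition, as for a branch whose arms are swapped.  */
static const char *
jcc_suffix (enum ucmp c, int reverse)
{
  assert (reverse == 0 || reverse == 1);
  if (reverse)
    c = reverse_ucmp (c);
  switch (c)
    {
    case U_EQ:  return "e";
    case U_NE:  return "ne";
    case U_GTU: return "a";
    case U_GEU: return "ae";
    case U_LTU: return "b";
    default:    return "be";	/* U_LEU */
    }
}
```

For example, jcc_suffix (U_GTU, 0) yields "a" (ja), while jcc_suffix (U_GTU, 1) yields "be" (jbe), matching the swapped-arm pattern.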

[1] http://www.agner.org/optimize/microarchitecture.pdf
[2] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35604

Uros.
Index: i386/i386.md
===================================================================
--- i386/i386.md	(revision 133266)
+++ i386/i386.md	(working copy)
@@ -13955,6 +13955,56 @@
 	     (const_int 2)
 	     (const_int 6)))])
 
+(define_insn "*jcc_fused_1"
+  [(set (pc)
+	(if_then_else (match_operator 1 "ix86_comparison_uns_operator"
+			[(match_operand:SI 2 "nonimmediate_operand" "rm,r")
+			 (match_operand:SI 3 "general_operand" "ri,mr")])
+	 (label_ref (match_operand 0 "" ""))
+	 (pc)))]
+  "!(TARGET_64BIT || optimize_size)
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+{
+#ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
+  fprintf (asm_out_file, "\t.p2align 4,,7\n");
+#endif
+
+  if (REG_P (operands[2])
+      && operands[3] == CONST0_RTX (GET_MODE (operands[3])))
+    output_asm_insn ("test{l}\t%2, %2", operands);
+  else
+    output_asm_insn ("cmp{l}\t{%3, %2|%2, %3}", operands);
+
+  return "%+j%E1\t%l0\t# fused";
+}
+  [(set_attr "type" "multi")
+   (set_attr "mode" "SI")])
+
+(define_insn "*jcc_fused_2"
+  [(set (pc)
+	(if_then_else (match_operator 1 "ix86_comparison_uns_operator"
+			[(match_operand:SI 2 "nonimmediate_operand" "rm,r")
+			 (match_operand:SI 3 "general_operand" "ri,mr")])
+	 (pc)
+	 (label_ref (match_operand 0 "" ""))))]
+  "!(TARGET_64BIT || optimize_size)
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+{
+#ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
+  fprintf (asm_out_file, "\t.p2align 4,,7\n");
+#endif
+
+  if (REG_P (operands[2])
+      && operands[3] == CONST0_RTX (GET_MODE (operands[3])))
+    output_asm_insn ("test{l}\t%2, %2", operands);
+  else
+    output_asm_insn ("cmp{l}\t{%3, %2|%2, %3}", operands);
+
+  return "%+j%e1\t%l0\t# fused";
+}
+  [(set_attr "type" "multi")
+   (set_attr "mode" "SI")])
+
 ;; In general it is not safe to assume too much about CCmode registers,
 ;; so simplify-rtx stops when it sees a second one.  Under certain
 ;; conditions this is safe on x86, so help combine not create
Index: i386/i386.c
===================================================================
--- i386/i386.c	(revision 133266)
+++ i386/i386.c	(working copy)
@@ -9042,6 +9042,14 @@ print_operand (FILE *file, rtx x, int co
 	  put_condition_code (GET_CODE (x), GET_MODE (XEXP (x, 0)), 1, 1, file);
 	  return;
 
+	case 'E':
+	  put_condition_code (GET_CODE (x), CCmode, 0, 0, file);
+	  return;
+
+	case 'e':
+	  put_condition_code (GET_CODE (x), CCmode, 1, 0, file);
+	  return;
+
 	case 'H':
 	  /* It doesn't actually matter what mode we use here, as we're
 	     only going to use this for printing.  */
