This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[RFC PATCH, i386]: Fuse compare and branch macro-ops for Core2
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Sun, 16 Mar 2008 13:26:31 +0100
- Subject: [RFC PATCH, i386]: Fuse compare and branch macro-ops for Core2
Hello!
This experimental patch is based on Chapter 7.5, "Macro-op fusion",
of [1], which states that when a set of conditions is satisfied,
Core2 fuses a compare and a branch instruction into one macro-uop. This
functionality is limited to 32-bit executables.
The attached patch increases fusion opportunities by keeping unsigned
compares together with their follow-up branch instructions, while trying
to align the combined sequence so that the branch insn does not cross a
16-byte boundary.
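To illustrate the alignment idea, here is a minimal sketch in C of how a
".p2align 4,,7" directive (the one emitted by the patch) behaves: it pads to
the next 16-byte boundary only if that takes at most 7 bytes, and is ignored
otherwise. The function names and the instruction lengths passed in are
hypothetical and chosen for illustration only; this is not code from the
patch.

```c
#include <stdint.h>

/* Padding bytes inserted by ".p2align P2,,MAX_SKIP" at address ADDR:
   the distance to the next 2^P2 boundary, or 0 when that distance
   exceeds MAX_SKIP (the assembler then skips the alignment).  */
unsigned
p2align_padding (uint32_t addr, unsigned p2, unsigned max_skip)
{
  uint32_t align = (uint32_t) 1 << p2;
  unsigned pad = (unsigned) ((align - (addr & (align - 1))) & (align - 1));
  return pad <= max_skip ? pad : 0;
}

/* Nonzero if an instruction of LEN bytes starting at ADDR crosses
   a 16-byte boundary.  */
int
crosses_16 (uint32_t addr, unsigned len)
{
  return len > 0 && (addr >> 4) != ((addr + len - 1) >> 4);
}

/* Nonzero if the branch of a compare+branch pair still crosses a
   16-byte boundary after ".p2align 4,,7" is placed before the compare.
   CMP_LEN and JCC_LEN are the (assumed) encoded lengths in bytes.  */
int
branch_crosses_after_align (uint32_t addr, unsigned cmp_len, unsigned jcc_len)
{
  uint32_t start = addr + p2align_padding (addr, 4, 7);
  return crosses_16 (start + cmp_len, jcc_len);
}
```

For example, a 2-byte compare at 0x100d followed by a 2-byte branch would put
the branch across the boundary at 0x1010; with the directive, 3 bytes of
padding move the pair to 0x1010 and the branch no longer crosses. When more
than 7 bytes of padding would be needed, nothing is inserted, so the
alignment is best-effort only.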
The patch bootstraps on x86_64 (please note that in the current RFC
revision, fusing is always enabled), but unfortunately it fails _one_
gfortran testcase, which exposes a problem in the gcc infrastructure
w.r.t. label references [2] (although this problem should be extremely
rare and very hard to hit).
The alignment numbers are not fine-tuned yet; the maximum number of
inserted nops should be tuned w.r.t. the average performance cost of
these nops in the instruction stream.
In the hope that the speedup will be above the noise floor, perhaps
somebody would be interested in checking the effect of this patch on SPEC?
2008-03-16 Uros Bizjak <ubizjak@gmail.com>
* config/i386/i386.md ("*jcc_fused_1"): New insn pattern.
("*jcc_fused_2"): Ditto.
* config/i386/i386.c (print_operand): Handle "E" and "e" codes to
print the opcode suffix for a fused jump insn.
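
As a rough illustration of what the new "E" and "e" operand codes are meant
to print, the following self-contained C sketch maps an unsigned or equality
comparison to its jump suffix, optionally reversing the condition first (as
"e" does for the pattern whose label sits in the else arm). The enum and
function names are hypothetical stand-ins, not GCC's own, and the set of
comparison codes is assumed to be the one accepted by
ix86_comparison_uns_operator.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the RTL comparison codes.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_GTU, CMP_GEU, CMP_LTU, CMP_LEU };

/* Return the comparison that is true exactly when CODE is false.  */
enum cmp_code
reverse_cmp (enum cmp_code code)
{
  switch (code)
    {
    case CMP_EQ:  return CMP_NE;
    case CMP_NE:  return CMP_EQ;
    case CMP_GTU: return CMP_LEU;
    case CMP_GEU: return CMP_LTU;
    case CMP_LTU: return CMP_GEU;
    case CMP_LEU: return CMP_GTU;
    }
  return code;
}

/* Return the jcc suffix for CODE; REVERSE selects the inverted
   condition ("e" code), otherwise the condition as written ("E").  */
const char *
jcc_suffix (enum cmp_code code, int reverse)
{
  if (reverse)
    code = reverse_cmp (code);

  switch (code)
    {
    case CMP_EQ:  return "e";
    case CMP_NE:  return "ne";
    case CMP_GTU: return "a";
    case CMP_GEU: return "ae";
    case CMP_LTU: return "b";
    case CMP_LEU: return "be";
    }
  return NULL;
}
```

With this mapping, a "greater than, unsigned" comparison would print "ja" in
the first pattern and "jbe" in the second, where the branch is taken when the
comparison is false.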
[1] http://www.agner.org/optimize/microarchitecture.pdf
[2] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35604
Uros.
Index: i386/i386.md
===================================================================
--- i386/i386.md (revision 133266)
+++ i386/i386.md (working copy)
@@ -13955,6 +13955,56 @@
(const_int 2)
(const_int 6)))])
+(define_insn "*jcc_fused_1"
+ [(set (pc)
+ (if_then_else (match_operator 1 "ix86_comparison_uns_operator"
+ [(match_operand:SI 2 "nonimmediate_operand" "rm,r")
+ (match_operand:SI 3 "general_operand" "ri,mr")])
+ (label_ref (match_operand 0 "" ""))
+ (pc)))]
+ "!(TARGET_64BIT || optimize_size)
+ && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+{
+#ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
+ fprintf (asm_out_file, "\t.p2align 4,,7\n");
+#endif
+
+ if (REG_P (operands[2])
+ && operands[3] == CONST0_RTX (GET_MODE (operands[3])))
+ output_asm_insn ("test{l}\t%2, %2", operands);
+ else
+ output_asm_insn ("cmp{l}\t{%3, %2|%2, %3}", operands);
+
+ return "%+j%E1\t%l0\t# fused";
+}
+ [(set_attr "type" "multi")
+ (set_attr "mode" "SI")])
+
+(define_insn "*jcc_fused_2"
+ [(set (pc)
+ (if_then_else (match_operator 1 "ix86_comparison_uns_operator"
+ [(match_operand:SI 2 "nonimmediate_operand" "rm,r")
+ (match_operand:SI 3 "general_operand" "ri,mr")])
+ (pc)
+ (label_ref (match_operand 0 "" ""))))]
+ "!(TARGET_64BIT || optimize_size)
+ && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+{
+#ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
+ fprintf (asm_out_file, "\t.p2align 4,,7\n");
+#endif
+
+ if (REG_P (operands[2])
+ && operands[3] == CONST0_RTX (GET_MODE (operands[3])))
+ output_asm_insn ("test{l}\t%2, %2", operands);
+ else
+ output_asm_insn ("cmp{l}\t{%3, %2|%2, %3}", operands);
+
+ return "%+j%e1\t%l0\t# fused";
+}
+ [(set_attr "type" "multi")
+ (set_attr "mode" "SI")])
+
;; In general it is not safe to assume too much about CCmode registers,
;; so simplify-rtx stops when it sees a second one. Under certain
;; conditions this is safe on x86, so help combine not create
Index: i386/i386.c
===================================================================
--- i386/i386.c (revision 133266)
+++ i386/i386.c (working copy)
@@ -9042,6 +9042,14 @@ print_operand (FILE *file, rtx x, int co
put_condition_code (GET_CODE (x), GET_MODE (XEXP (x, 0)), 1, 1, file);
return;
+ case 'E':
+ put_condition_code (GET_CODE (x), CCmode, 0, 0, file);
+ return;
+
+ case 'e':
+ put_condition_code (GET_CODE (x), CCmode, 1, 0, file);
+ return;
+
case 'H':
/* It doesn't actually matter what mode we use here, as we're
only going to use this for printing. */