This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH]: Add CSE pass directly after expand / Add register equal notes for divmod4 for AVR
- From: Björn Haase <bjoern dot m dot haase at web dot de>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Denis Chertykov <denisc at overta dot ru>
- Date: Thu, 16 Jun 2005 02:47:21 +0200
- Subject: [PATCH]: Add CSE pass directly after expand / Add register equal notes for divmod4 for AVR
Hello,
This patch addresses two issues:
1.) The most immediate benefit is that the AVR port no longer
generates two calls to the divmod**4 library function (** == QI, HI, SI) if
both results (DIV and MOD) are used.
2.) It prepares condition code re-use for targets that A) use
CCmode-style condition codes and B) need to expand HI/SI/DI mode compare
operations into a sequence of native-word-size instructions that are
supported by the architecture.
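To make issue 1.) concrete, here is a minimal C sketch of the kind of source that needs both results of one division. The names qr and split_qr are made up for illustration. On AVR, long is 32 bits wide, so this should go through the divmodsi4 expander; that is an assumption and has not been checked against generated code for this exact function.

```c
/* Hypothetical example, not part of the patch: quotient and remainder
   of the same operands are both live at the end of the function.  */
struct qr
{
  long quot;
  long rem;
};

static struct qr
split_qr (long n, long d)
{
  struct qr r;
  r.quot = n / d;   /* DIV result */
  r.rem = n % d;    /* MOD result of the same two operands */
  return r;
}
```

Calling split_qr (1234567L, 1000L) yields quot == 1234 and rem == 567; the point is only that a single library call could deliver both values.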
Presently the generic expanders in optabs (and possibly define_expand
patterns in machine descriptions) frequently use the following method:
for each result of the sequence they append a final (set
(result_register) (result_register)) to the expanded instruction sequence.
This final "copy to myself" then carries a register note describing its
contents.
Some complex expanded sequences produce two or more results. E.g.
"subDI3" frequently calculates both "MINUS" and "COMPARE", and "divmodsi4"
generates DIV and MOD.
For those "multi-result" sequences it appears useful to add two "copy to
myself with attached register note" insns to the expand patterns, one for
each result operand. Usually GCC will use only one of the results after
expand. E.g. no pass on the tree level knows that on some targets a MINUS
operation calculates a compare condition code as a useful by-product.
Presently, 1.) register notes are thrown away completely before any pass could
do anything useful with this information. Merely omitting the
"remove_unnecessary_notes ()" call does not help either, since 2.) all of the
"set to myself with attached note" instructions announcing the unexpected but
possibly useful "by-product" results are optimized away too early. Already
during the first jump optimization these "set to myself" instructions are
deleted as useless, i.e. they are eliminated before any of
the CSE passes has a chance to look at them.
To enable re-use of such by-products of expanded sequences, at least
within basic blocks, this patch adds an additional call to CSE
immediately after expand and before the removal of "unnecessary" notes.
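The intended effect within one basic block can be modelled with a toy table in plain C. This is not GCC code: toy_table, toy_record, toy_lookup and toy_request are hypothetical names, registers are plain integers, and the "notes" flag merely stands in for whether both results of a call are announced to CSE.

```c
#include <stddef.h>

/* Toy model only, not GCC code.  An entry says "register REG holds
   OP (A, B)", i.e. what a REG_EQUAL note on a copy insn tells CSE.  */
enum toy_op { TOY_DIV, TOY_MOD };

struct toy_equiv { enum toy_op op; int a, b, reg; };

struct toy_table
{
  struct toy_equiv e[16];
  size_t n;
  int notes;   /* nonzero: both results of a call carry a note */
  int calls;   /* number of simulated library calls */
};

/* Remember that REG holds OP (A, B); silently drop it if the table is full.  */
static void
toy_record (struct toy_table *t, enum toy_op op, int a, int b, int reg)
{
  if (t->n < sizeof t->e / sizeof t->e[0])
    {
      struct toy_equiv eq = { op, a, b, reg };
      t->e[t->n++] = eq;
    }
}

/* Return the register known to hold OP (A, B), or -1 if there is none.  */
static int
toy_lookup (const struct toy_table *t, enum toy_op op, int a, int b)
{
  size_t i;
  for (i = 0; i < t->n; i++)
    if (t->e[i].op == op && t->e[i].a == a && t->e[i].b == b)
      return t->e[i].reg;
  return -1;
}

/* Ask for OP (A, B).  On a miss, simulate one divmod call that produces
   both quotient and remainder in two fresh registers.  */
static int
toy_request (struct toy_table *t, enum toy_op op, int a, int b, int *next_reg)
{
  int q, r, reg = toy_lookup (t, op, a, b);
  if (reg >= 0)
    return reg;
  t->calls++;
  q = (*next_reg)++;
  r = (*next_reg)++;
  if (t->notes)
    {
      toy_record (t, TOY_DIV, a, b, q);
      toy_record (t, TOY_MOD, a, b, r);
    }
  return op == TOY_DIV ? q : r;
}
```

With notes set, a DIV request followed by a MOD request on the same operands counts one simulated call; with notes clear it counts two, which mirrors the missed optimization described here.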
With this first part of the patch in place, the second part benefits from
the additional CSE: the new "copy result to itself with attached register
note" instructions now actually help. With their support gcc is able to
make use of the by-products of the divmod4 library calls. So
far gcc generates two calls to divmodsi if both the modulo and the division
results are needed (a missed optimization in the present state). There used
to be no REG_EQUAL notes for AVR's define_expands; the second half of this
patch adds them to all of the divmod expand patterns. Since the RTL is now
generated "by hand" instead of from the RTL templates, it was helpful to give
the "divmod**4_call" instructions names so that the automatically generated
gen_divmod... functions could be used.
Test results for the C language for "x86_64-unknown-linux-gnu":
> Tests that now fail, but worked before:
>
> gcc.c-torture/execute/20020412-1.c execution, -O1
> gcc.c-torture/execute/20020412-1.c execution, -O2
> gcc.c-torture/execute/20020412-1.c execution, -O3 -fomit-frame-pointer
> gcc.c-torture/execute/20020412-1.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> gcc.c-torture/execute/20020412-1.c execution, -O3 -fomit-frame-pointer -funroll-loops
> gcc.c-torture/execute/20020412-1.c execution, -O3 -g
> gcc.c-torture/execute/20020412-1.c execution, -Os
>
> Tests that now work, but didn't before:
>
> gcc.c-torture/compile/20001226-1.c -O3 -g (test for excess errors)
> gcc.c-torture/compile/20001226-1.c -Os (test for excess errors)
and "avr-*-*" (without differences).
Yours,
Björn
2005-06-16  Bjoern Haase  <bjoern.m.haase@web.de>

	* passes.c (rest_of_compilation): Add CSE pass directly after expand.
	* config/avr/avr.md (divmodqi4): Add REG_EQUAL register notes to
	expanded RTL.
	(udivmodqi4, divmodhi4, udivmodhi4, divmodsi4, udivmodsi4): Ditto.
	(*divmodqi4_call): Rename to divmodqi4_call.
	(*udivmodqi4_call): Rename to udivmodqi4_call.
	(*divmodhi4_call, *udivmodhi4_call): Ditto.
	(*divmodsi4_call, *udivmodsi4_call): Ditto.

Index: passes.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/passes.c,v
retrieving revision 2.92
diff -U9 -r2.92 passes.c
--- passes.c 9 Jun 2005 16:21:35 -0000 2.92
+++ passes.c 15 Jun 2005 21:31:51 -0000
@@ -1515,18 +1515,25 @@
TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (parent)) = 1;
}
/* We are now committed to emitting code for this function. Do any
preparation, such as emitting abstract debug info for the inline
before it gets mangled by optimization. */
if (cgraph_function_possibly_inlined_p (current_function_decl))
(*debug_hooks->outlining_inline_function) (current_function_decl);
+ /* RTL expanders often generate register notes in order to give hints to
+ CSE. All of them will soon be removed completely by
+ remove_unnecessary_notes (). Let's give CSE a single chance to optimize
+ away the most obvious common subexpressions. */
+ if (optimize > 0)
+ rest_of_handle_cse ();
+
/* Remove any notes we don't need. That will make iterating
over the instruction sequence faster, and allow the garbage
collector to reclaim the memory used by the notes. */
remove_unnecessary_notes ();
/* Initialize some variables used by the optimizers. */
init_function_for_compilation ();
TREE_ASM_WRITTEN (current_function_decl) = 1;
Index: config/avr/avr.md
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/avr/avr.md,v
retrieving revision 1.51
diff -U9 -r1.51 avr.md
--- config/avr/avr.md 13 Mar 2005 10:09:53 -0000 1.51
+++ config/avr/avr.md 15 Jun 2005 21:32:03 -0000
@@ -824,63 +824,120 @@
[(set (reg:QI 24) (match_operand:QI 1 "register_operand" ""))
(set (reg:QI 22) (match_operand:QI 2 "register_operand" ""))
(parallel [(set (reg:QI 24) (div:QI (reg:QI 24) (reg:QI 22)))
(set (reg:QI 25) (mod:QI (reg:QI 24) (reg:QI 22)))
(clobber (reg:QI 22))
(clobber (reg:QI 23))])
(set (match_operand:QI 0 "register_operand" "") (reg:QI 24))
(set (match_operand:QI 3 "register_operand" "") (reg:QI 25))]
""
- "")
+ "/* Generate RTL identical to above template with the only difference
+ * that we add register notes to the last two set insn describing the
+ * contents of the registers. This way CSE is able to use both results. */
+ rtx insn;
+ emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (QImode,24), operands[1]));
+ emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (QImode,22), operands[2]));
+ emit_insn (gen_divmodqi4_call () );
+ insn = emit_insn (gen_rtx_SET (VOIDmode,
+ operands[0], gen_rtx_REG (QImode,24)));
+ set_unique_reg_note (insn, REG_EQUAL,
+ gen_rtx_fmt_ee (DIV, QImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2])));
+ insn = emit_insn (gen_rtx_SET (VOIDmode,
+ operands[3], gen_rtx_REG (QImode,25)));
+ set_unique_reg_note (insn, REG_EQUAL,
+ gen_rtx_fmt_ee (MOD, QImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2])));
+ DONE;")
-(define_insn "*divmodqi4_call"
+(define_insn "divmodqi4_call"
[(set (reg:QI 24) (div:QI (reg:QI 24) (reg:QI 22)))
(set (reg:QI 25) (mod:QI (reg:QI 24) (reg:QI 22)))
(clobber (reg:QI 22))
(clobber (reg:QI 23))]
""
"%~call __divmodqi4"
[(set_attr "type" "xcall")
(set_attr "cc" "clobber")])
(define_expand "udivmodqi4"
[(set (reg:QI 24) (match_operand:QI 1 "register_operand" ""))
(set (reg:QI 22) (match_operand:QI 2 "register_operand" ""))
(parallel [(set (reg:QI 24) (udiv:QI (reg:QI 24) (reg:QI 22)))
(set (reg:QI 25) (umod:QI (reg:QI 24) (reg:QI 22)))
(clobber (reg:QI 23))])
(set (match_operand:QI 0 "register_operand" "") (reg:QI 24))
(set (match_operand:QI 3 "register_operand" "") (reg:QI 25))]
""
- "")
+ "/* Generate RTL identical to above template with the only difference
+ * that we add register notes to the last two set insn describing the
+ * contents of the registers. This way CSE is able to use both results. */
+ rtx insn;
+ emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (QImode,24), operands[1]));
+ emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (QImode,22), operands[2]));
+ emit_insn (gen_udivmodqi4_call () );
+ insn = emit_insn (gen_rtx_SET (VOIDmode,
+ operands[0], gen_rtx_REG (QImode,24)));
+ set_unique_reg_note (insn, REG_EQUAL,
+ gen_rtx_fmt_ee (UDIV, QImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2])));
+ insn = emit_insn (gen_rtx_SET (VOIDmode,
+ operands[3], gen_rtx_REG (QImode,25)));
+ set_unique_reg_note (insn, REG_EQUAL,
+ gen_rtx_fmt_ee (UMOD, QImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2])));
+ DONE;")
-(define_insn "*udivmodqi4_call"
+(define_insn "udivmodqi4_call"
[(set (reg:QI 24) (udiv:QI (reg:QI 24) (reg:QI 22)))
(set (reg:QI 25) (umod:QI (reg:QI 24) (reg:QI 22)))
(clobber (reg:QI 23))]
""
"%~call __udivmodqi4"
[(set_attr "type" "xcall")
(set_attr "cc" "clobber")])
(define_expand "divmodhi4"
[(set (reg:HI 24) (match_operand:HI 1 "register_operand" ""))
(set (reg:HI 22) (match_operand:HI 2 "register_operand" ""))
(parallel [(set (reg:HI 22) (div:HI (reg:HI 24) (reg:HI 22)))
(set (reg:HI 24) (mod:HI (reg:HI 24) (reg:HI 22)))
(clobber (reg:HI 26))
(clobber (reg:QI 21))])
(set (match_operand:HI 0 "register_operand" "") (reg:HI 22))
(set (match_operand:HI 3 "register_operand" "") (reg:HI 24))]
""
- "")
+ "/* Generate RTL identical to above template with the only difference
+ * that we add register notes to the last two set insn describing the
+ * contents of the registers. This way CSE is able to use both results. */
+ rtx insn;
+ emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (HImode,24), operands[1]));
+ emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (HImode,22), operands[2]));
+ emit_insn (gen_divmodhi4_call () );
+ insn = emit_insn (gen_rtx_SET (VOIDmode,
+ operands[0], gen_rtx_REG (HImode,22)));
+ set_unique_reg_note (insn, REG_EQUAL,
+ gen_rtx_fmt_ee (DIV, HImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2])));
+ insn = emit_insn (gen_rtx_SET (VOIDmode,
+ operands[3], gen_rtx_REG (HImode,24)));
+ set_unique_reg_note (insn, REG_EQUAL,
+ gen_rtx_fmt_ee (MOD, HImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2])));
+ DONE;")
-(define_insn "*divmodhi4_call"
+(define_insn "divmodhi4_call"
[(set (reg:HI 22) (div:HI (reg:HI 24) (reg:HI 22)))
(set (reg:HI 24) (mod:HI (reg:HI 24) (reg:HI 22)))
(clobber (reg:HI 26))
(clobber (reg:QI 21))]
""
"%~call __divmodhi4"
[(set_attr "type" "xcall")
(set_attr "cc" "clobber")])
@@ -888,21 +945,40 @@
[(set (reg:HI 24) (match_operand:HI 1 "register_operand" ""))
(set (reg:HI 22) (match_operand:HI 2 "register_operand" ""))
(parallel [(set (reg:HI 22) (udiv:HI (reg:HI 24) (reg:HI 22)))
(set (reg:HI 24) (umod:HI (reg:HI 24) (reg:HI 22)))
(clobber (reg:HI 26))
(clobber (reg:QI 21))])
(set (match_operand:HI 0 "register_operand" "") (reg:HI 22))
(set (match_operand:HI 3 "register_operand" "") (reg:HI 24))]
""
- "")
+ "/* Generate RTL identical to above template with the only difference
+ * that we add register notes to the last two set insn describing the
+ * contents of the registers. This way CSE is able to use both results. */
+ rtx insn;
+ emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (HImode,24), operands[1]));
+ emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (HImode,22), operands[2]));
+ emit_insn (gen_udivmodhi4_call () );
+ insn = emit_insn (gen_rtx_SET (VOIDmode,
+ operands[0], gen_rtx_REG (HImode,22)));
+ set_unique_reg_note (insn, REG_EQUAL,
+ gen_rtx_fmt_ee (UDIV, HImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2])));
+ insn = emit_insn (gen_rtx_SET (VOIDmode,
+ operands[3], gen_rtx_REG (HImode,24)));
+ set_unique_reg_note (insn, REG_EQUAL,
+ gen_rtx_fmt_ee (UMOD, HImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2])));
+ DONE;")
-(define_insn "*udivmodhi4_call"
+(define_insn "udivmodhi4_call"
[(set (reg:HI 22) (udiv:HI (reg:HI 24) (reg:HI 22)))
(set (reg:HI 24) (umod:HI (reg:HI 24) (reg:HI 22)))
(clobber (reg:HI 26))
(clobber (reg:QI 21))]
""
"%~call __udivmodhi4"
[(set_attr "type" "xcall")
(set_attr "cc" "clobber")])
@@ -910,21 +986,40 @@
[(set (reg:SI 22) (match_operand:SI 1 "register_operand" ""))
(set (reg:SI 18) (match_operand:SI 2 "register_operand" ""))
(parallel [(set (reg:SI 18) (div:SI (reg:SI 22) (reg:SI 18)))
(set (reg:SI 22) (mod:SI (reg:SI 22) (reg:SI 18)))
(clobber (reg:HI 26))
(clobber (reg:HI 30))])
(set (match_operand:SI 0 "register_operand" "") (reg:SI 18))
(set (match_operand:SI 3 "register_operand" "") (reg:SI 22))]
""
- "")
+ "/* Generate RTL identical to above template with the only difference
+ * that we add register notes to the last two set insn describing the
+ * contents of the registers. This way CSE is able to use both results. */
+ rtx insn;
+ emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode,22), operands[1]));
+ emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode,18), operands[2]));
+ emit_insn (gen_divmodsi4_call () );
+ insn = emit_insn (gen_rtx_SET (VOIDmode,
+ operands[0], gen_rtx_REG (SImode,18)));
+ set_unique_reg_note (insn, REG_EQUAL,
+ gen_rtx_fmt_ee (DIV, SImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2])));
+ insn = emit_insn (gen_rtx_SET (VOIDmode,
+ operands[3], gen_rtx_REG (SImode,22)));
+ set_unique_reg_note (insn, REG_EQUAL,
+ gen_rtx_fmt_ee (MOD, SImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2])));
+ DONE;")
-(define_insn "*divmodsi4_call"
+(define_insn "divmodsi4_call"
[(set (reg:SI 18) (div:SI (reg:SI 22) (reg:SI 18)))
(set (reg:SI 22) (mod:SI (reg:SI 22) (reg:SI 18)))
(clobber (reg:HI 26))
(clobber (reg:HI 30))]
""
"%~call __divmodsi4"
[(set_attr "type" "xcall")
(set_attr "cc" "clobber")])
@@ -932,21 +1027,40 @@
[(set (reg:SI 22) (match_operand:SI 1 "register_operand" ""))
(set (reg:SI 18) (match_operand:SI 2 "register_operand" ""))
(parallel [(set (reg:SI 18) (udiv:SI (reg:SI 22) (reg:SI 18)))
(set (reg:SI 22) (umod:SI (reg:SI 22) (reg:SI 18)))
(clobber (reg:HI 26))
(clobber (reg:HI 30))])
(set (match_operand:SI 0 "register_operand" "") (reg:SI 18))
(set (match_operand:SI 3 "register_operand" "") (reg:SI 22))]
""
- "")
+ "/* Generate RTL identical to above template with the only difference
+ * that we add register notes to the last two set insn describing the
+ * contents of the registers. This way CSE is able to use both results. */
+ rtx insn;
+ emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode,22), operands[1]));
+ emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode,18), operands[2]));
+ emit_insn (gen_udivmodsi4_call () );
+ insn = emit_insn (gen_rtx_SET (VOIDmode,
+ operands[0], gen_rtx_REG (SImode,18)));
+ set_unique_reg_note (insn, REG_EQUAL,
+ gen_rtx_fmt_ee (UDIV, SImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2])));
+ insn = emit_insn (gen_rtx_SET (VOIDmode,
+ operands[3], gen_rtx_REG (SImode,22)));
+ set_unique_reg_note (insn, REG_EQUAL,
+ gen_rtx_fmt_ee (UMOD, SImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2])));
+ DONE;")
-(define_insn "*udivmodsi4_call"
+(define_insn "udivmodsi4_call"
[(set (reg:SI 18) (udiv:SI (reg:SI 22) (reg:SI 18)))
(set (reg:SI 22) (umod:SI (reg:SI 22) (reg:SI 18)))
(clobber (reg:HI 26))
(clobber (reg:HI 30))]
""
"%~call __udivmodsi4"
[(set_attr "type" "xcall")
(set_attr "cc" "clobber")])