This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH][rtl-optimization] Fix PR23726
- From: Björn Haase <bjoern dot m dot haase at web dot de>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Denis Chertykov <denisc at overta dot ru>
- Date: Tue, 6 Sep 2005 16:28:55 +0200
- Subject: [PATCH][rtl-optimization] Fix PR23726
Hi,
this patch addresses an issue that shows up when dealing with RTL expander
sequences that yield two or more useful results.
IMO the most important class of such expander sequences are arithmetic
operations that 1.) perform calculations in modes larger than the biggest
mode supported by the CPU (e.g. DImode operations on a 32-bit machine) and
2.) calculate a condition code at the same time. Another example of such
two-result expanders is the divmod4 pattern.
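To make the first class concrete, here is a minimal C sketch of a 64-bit
addition built from 32-bit halves. It is only an illustration, not GCC code:
the helper name add64_via_32 and the low-word-first layout are assumptions
made for this example. The point is that the carry, a condition-code-like
value, falls out of the same sequence that produces the sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GCC.  Adds two 64-bit values that
   are stored as two 32-bit words each (low word first).  Writes the
   sum to SUM and returns the carry out of the high word, i.e. the
   second useful result of the same computation.  */
int add64_via_32(const uint32_t a[2], const uint32_t b[2], uint32_t sum[2])
{
    assert(a != NULL && b != NULL && sum != NULL);

    uint32_t lo = a[0] + b[0];
    uint32_t carry_lo = lo < a[0];      /* unsigned wrap-around => carry */

    uint32_t hi = a[1] + b[1];
    uint32_t carry_hi = hi < a[1];

    uint32_t hi_with_carry = hi + carry_lo;
    carry_hi |= hi_with_carry < hi;     /* adding the carry may wrap too */

    sum[0] = lo;
    sum[1] = hi_with_carry;
    return (int)carry_hi;
}
```

A caller that needs only the sum ignores the return value; a caller that
later tests for overflow would like to reuse it instead of comparing again.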
The expander is typically run because we know from the tree representation
that we need one of the two results. At the tree level we cannot know at
present that a second useful result is possibly calculated at the same time.
We need to do such optimizations at the RTL level, by CSE.
Presently we run into a problem if we try to benefit from such a useful
by-product of an expanded sequence. A more detailed discussion of how, e.g.,
to recycle the condition code for CCmode targets is at
http://gcc.gnu.org/PR23726 .
The basic idea is to make the expander insert a single-set instruction into
the sequence for each of the results: a single-set insn that carries a
REG_EQUAL note similar to the ones used in libcall notes. The single-set insn
would be inserted in the hope that CSE might find the value useful, but with
the expectation that it will very probably be deleted.
If CSE sees these notes, it can avoid unnecessary compare insns or avoid
unnecessary re-calculations. The example of the divmod4 patterns for avr
shows this.
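The effect of such notes can be sketched with a toy model in plain C. This
is not GCC's cse.c: the types toy_insn and toy_op and the function toy_cse
are made up for this illustration, and the model assumes that every pseudo
register is set only once. An insn either computes a value or is a cheap
copy that is known, through a REG_EQUAL-like note, to hold that value.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; none of these names exist in GCC.  */
enum toy_op { TOY_NONE, TOY_DIV, TOY_MOD };

struct toy_insn {
    int dest;        /* pseudo register written by this insn */
    enum toy_op op;  /* value known to be in DEST: op (a, b) */
    int a, b;        /* operand registers of that value */
    int from_note;   /* 1: cheap copy, value known only via the note */
    int reuse_reg;   /* set by toy_cse: register to copy from, or -1 */
};

static int same_value(const struct toy_insn *x, const struct toy_insn *y)
{
    return x->op != TOY_NONE && x->op == y->op
           && x->a == y->a && x->b == y->b;
}

/* For every insn that really computes a value, look for an earlier
   insn whose destination is known to hold the same value.  Returns
   the number of computations that could become plain copies.  */
int toy_cse(struct toy_insn *insns, size_t n)
{
    int replaced = 0;

    assert(insns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        insns[i].reuse_reg = -1;
        if (insns[i].from_note)
            continue;           /* already a cheap copy */
        for (size_t j = 0; j < i; j++) {
            if (same_value(&insns[j], &insns[i])) {
                insns[i].reuse_reg = insns[j].dest;
                replaced++;
                break;
            }
        }
    }
    return replaced;
}
```

With a sequence in which the expander has noted both "div (1, 2)" and
"mod (1, 2)", a later explicit "mod (1, 2)" is found to be available in the
register of the note-carrying copy, so no second division is needed.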
The remaining problem is that one needs to make sure that CSE gets at least
one chance to see these note-carrying instructions. In the present optimizer
setup this is not the case: the single sets for the by-products are removed
in the jump2 pass that immediately precedes the first CSE run, because at
that time they are trivially dead.
In order to solve this problem, this patch proposes removing the call to
"delete_trivially_dead_insns ()" from jump2. This should not be a serious
performance issue, since the same call occurs in the cse pass that
immediately follows.
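Why the order matters can be shown with another small toy model in C. Again
this is only a sketch under stated assumptions, not GCC code: pass_insn,
pass_delete_dead and pass_cse are hypothetical stand-ins, values are reduced
to integer ids, and liveness is a precomputed flag.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; none of these names exist in GCC.  */
struct pass_insn {
    int dest;      /* pseudo register written */
    int value_id;  /* id of the value known to be in DEST, 0 if unknown */
    int is_note;   /* 1: cheap copy whose value is known via a note */
    int live;      /* 1: some later insn reads DEST */
};

/* Stand-in for dead insn deletion: drop insns whose result is unused.
   Compacts the array in place and returns the new length.  */
size_t pass_delete_dead(struct pass_insn *insns, size_t n)
{
    size_t kept = 0;

    assert(insns != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (insns[i].live)
            insns[kept++] = insns[i];
    return kept;
}

/* Stand-in for CSE: count real computations whose value is already
   available in an earlier insn, and mark that earlier insn as used.  */
int pass_cse(struct pass_insn *insns, size_t n)
{
    int hits = 0;

    assert(insns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (insns[i].is_note || insns[i].value_id == 0)
            continue;
        for (size_t j = 0; j < i; j++) {
            if (insns[j].value_id == insns[i].value_id) {
                insns[j].live = 1;  /* the by-product is now needed */
                hits++;
                break;
            }
        }
    }
    return hits;
}

/* Present order: the dead by-product is gone before CSE can see it.  */
int hits_delete_then_cse(struct pass_insn *insns, size_t n)
{
    n = pass_delete_dead(insns, n);
    return pass_cse(insns, n);
}

/* Proposed order: CSE sees the by-product first, cleanup runs after.  */
int hits_cse_then_delete(struct pass_insn *insns, size_t n)
{
    int hits = pass_cse(insns, n);
    pass_delete_dead(insns, n);
    return hits;
}
```

For a sequence made of a used quotient copy, an unused remainder copy and a
later recomputation of the remainder, the first order finds nothing to reuse
while the second order finds the remainder.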
IMO, with the use of such note-carrying single-set insns and with the tiny
change in jump2, Richard Henderson's suggested subreg-lowering pass could
finally help to generate better code for DImode operations on 32-bit Intel
architectures and much better code for HI, SI and DI mode operations on avr.
IMO the lack of condition-code re-use could be the reason why Richard
Henderson has so far not found an improvement when comparing subreg lowering
for DImode at expand with lowering by splitters after reload.
Regards,
Bjoern
2005-09-06 Bjoern Haase <bjoern.m.haase@web.de>
* cfgcleanup.c (rest_of_handle_jump2): Don't call
"delete_trivially_dead_insns ()" until first cse pass.
2005-09-06 Bjoern Haase <bjoern.m.haase@web.de>
* config/avr/avr.md: Add REG_EQUAL notes to the divmod4 expanders.
Index: avr.md
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/avr/avr.md,v
retrieving revision 1.53
diff -U12 -r1.53 avr.md
--- avr.md 28 Jun 2005 19:56:02 -0000 1.53
+++ avr.md 4 Sep 2005 22:01:33 -0000
@@ -825,133 +825,342 @@
;; - we get both the quotient and the remainder at no extra cost
(define_expand "divmodqi4"
[(set (reg:QI 24) (match_operand:QI 1 "register_operand" ""))
(set (reg:QI 22) (match_operand:QI 2 "register_operand" ""))
(parallel [(set (reg:QI 24) (div:QI (reg:QI 24) (reg:QI 22)))
(set (reg:QI 25) (mod:QI (reg:QI 24) (reg:QI 22)))
(clobber (reg:QI 22))
(clobber (reg:QI 23))])
(set (match_operand:QI 0 "register_operand" "") (reg:QI 24))
(set (match_operand:QI 3 "register_operand" "") (reg:QI 25))]
""
- "")
+ "
+ /* The following lines correspond *exactly* to what would have been
+ * expanded by use of above template. Only difference is that below
+ * register notes are added that help to implement optimizations. */
+ rtx annotate_me;
+ rtx note;
+
+ emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,24),operands[1]));
+ emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,22),operands[2]));
+ emit (gen_rtx_PARALLEL (VOIDmode,
+ gen_rtvec (4,
+ gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,24),
+ gen_rtx_DIV (QImode,gen_rtx_REG (QImode,24),
+ gen_rtx_REG (QImode,22))),
+ gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,25),
+ gen_rtx_MOD (QImode,gen_rtx_REG (QImode,24),
+ gen_rtx_REG (QImode,22))),
+ gen_hard_reg_clobber (QImode, 22),
+ gen_hard_reg_clobber (QImode, 23))));
+
+ annotate_me =
+ emit_insn (gen_rtx_SET (VOIDmode,operands[0],gen_rtx_REG (QImode,24)));
+ note = gen_rtx_fmt_ee (DIV,QImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2]));
+ set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+ annotate_me =
+ emit_insn (gen_rtx_SET (VOIDmode,operands[3],gen_rtx_REG (QImode,25)));
+ note = gen_rtx_fmt_ee (MOD,QImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2]));
+ set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+ DONE;
+ ")
(define_insn "*divmodqi4_call"
[(set (reg:QI 24) (div:QI (reg:QI 24) (reg:QI 22)))
(set (reg:QI 25) (mod:QI (reg:QI 24) (reg:QI 22)))
(clobber (reg:QI 22))
(clobber (reg:QI 23))]
""
"%~call __divmodqi4"
[(set_attr "type" "xcall")
(set_attr "cc" "clobber")])
(define_expand "udivmodqi4"
[(set (reg:QI 24) (match_operand:QI 1 "register_operand" ""))
(set (reg:QI 22) (match_operand:QI 2 "register_operand" ""))
(parallel [(set (reg:QI 24) (udiv:QI (reg:QI 24) (reg:QI 22)))
(set (reg:QI 25) (umod:QI (reg:QI 24) (reg:QI 22)))
(clobber (reg:QI 23))])
(set (match_operand:QI 0 "register_operand" "") (reg:QI 24))
(set (match_operand:QI 3 "register_operand" "") (reg:QI 25))]
""
- "")
+ "
+ /* The following lines correspond *exactly* to what would have been
+ * expanded by use of above template. Only difference is that below
+ * register notes are added that help to implement optimizations. */
+ rtx annotate_me;
+ rtx note;
+
+ emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,24),operands[1]));
+ emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,22),operands[2]));
+ emit (gen_rtx_PARALLEL (VOIDmode,
+ gen_rtvec (3,
+ gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,24),
+ gen_rtx_UDIV (QImode,gen_rtx_REG (QImode,24),
+ gen_rtx_REG (QImode,22))),
+ gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,25),
+ gen_rtx_UMOD (QImode,gen_rtx_REG (QImode,24),
+ gen_rtx_REG (QImode,22))),
+ gen_hard_reg_clobber (QImode, 23))));
+
+ annotate_me =
+ emit_insn (gen_rtx_SET (VOIDmode,operands[0],gen_rtx_REG (QImode,24)));
+ note = gen_rtx_fmt_ee (UDIV,QImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2]));
+ set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+ annotate_me =
+ emit_insn (gen_rtx_SET (VOIDmode,operands[3],gen_rtx_REG (QImode,25)));
+ note = gen_rtx_fmt_ee (UMOD,QImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2]));
+ set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+ DONE;
+")
(define_insn "*udivmodqi4_call"
[(set (reg:QI 24) (udiv:QI (reg:QI 24) (reg:QI 22)))
(set (reg:QI 25) (umod:QI (reg:QI 24) (reg:QI 22)))
(clobber (reg:QI 23))]
""
"%~call __udivmodqi4"
[(set_attr "type" "xcall")
(set_attr "cc" "clobber")])
(define_expand "divmodhi4"
[(set (reg:HI 24) (match_operand:HI 1 "register_operand" ""))
(set (reg:HI 22) (match_operand:HI 2 "register_operand" ""))
(parallel [(set (reg:HI 22) (div:HI (reg:HI 24) (reg:HI 22)))
(set (reg:HI 24) (mod:HI (reg:HI 24) (reg:HI 22)))
(clobber (reg:HI 26))
(clobber (reg:QI 21))])
(set (match_operand:HI 0 "register_operand" "") (reg:HI 22))
(set (match_operand:HI 3 "register_operand" "") (reg:HI 24))]
""
- "")
+ "
+ /* The following lines correspond *exactly* to what would have been
+ * expanded by use of above template. Only difference is that below
+ * register notes are added that help to implement optimizations. */
+ rtx annotate_me;
+ rtx note;
+
+ emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,24),operands[1]));
+ emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,22),operands[2]));
+ emit (gen_rtx_PARALLEL (VOIDmode,
+ gen_rtvec (4,
+ gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,22),
+ gen_rtx_DIV (HImode,gen_rtx_REG (HImode,24),
+ gen_rtx_REG (HImode,22))),
+ gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,24),
+ gen_rtx_MOD (HImode,gen_rtx_REG (HImode,24),
+ gen_rtx_REG (HImode,22))),
+ gen_hard_reg_clobber (HImode, 26),
+ gen_hard_reg_clobber (QImode, 21))));
+
+ annotate_me =
+ emit_insn (gen_rtx_SET (VOIDmode,operands[0],gen_rtx_REG (HImode,22)));
+ note = gen_rtx_fmt_ee (DIV,HImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2]));
+ set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+ annotate_me =
+ emit_insn (gen_rtx_SET (VOIDmode,operands[3],gen_rtx_REG (HImode,24)));
+ note = gen_rtx_fmt_ee (MOD,HImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2]));
+ set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+ DONE;
+ ")
(define_insn "*divmodhi4_call"
[(set (reg:HI 22) (div:HI (reg:HI 24) (reg:HI 22)))
(set (reg:HI 24) (mod:HI (reg:HI 24) (reg:HI 22)))
(clobber (reg:HI 26))
(clobber (reg:QI 21))]
""
"%~call __divmodhi4"
[(set_attr "type" "xcall")
(set_attr "cc" "clobber")])
(define_expand "udivmodhi4"
[(set (reg:HI 24) (match_operand:HI 1 "register_operand" ""))
(set (reg:HI 22) (match_operand:HI 2 "register_operand" ""))
(parallel [(set (reg:HI 22) (udiv:HI (reg:HI 24) (reg:HI 22)))
(set (reg:HI 24) (umod:HI (reg:HI 24) (reg:HI 22)))
(clobber (reg:HI 26))
(clobber (reg:QI 21))])
(set (match_operand:HI 0 "register_operand" "") (reg:HI 22))
(set (match_operand:HI 3 "register_operand" "") (reg:HI 24))]
""
- "")
+ "
+ /* The following lines correspond *exactly* to what would have been
+ * expanded by use of above template. Only difference is that below
+ * register notes are added that help to implement optimizations. */
+ rtx annotate_me;
+ rtx note;
+
+ emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,24),operands[1]));
+ emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,22),operands[2]));
+ emit (gen_rtx_PARALLEL (VOIDmode,
+ gen_rtvec (4,
+ gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,22),
+ gen_rtx_UDIV (HImode,gen_rtx_REG (HImode,24),
+ gen_rtx_REG (HImode,22))),
+ gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,24),
+ gen_rtx_UMOD (HImode,gen_rtx_REG (HImode,24),
+ gen_rtx_REG (HImode,22))),
+ gen_hard_reg_clobber (HImode, 26),
+ gen_hard_reg_clobber (QImode, 21))));
+
+ annotate_me =
+ emit_insn (gen_rtx_SET (VOIDmode,operands[0],gen_rtx_REG (HImode,22)));
+ note = gen_rtx_fmt_ee (UDIV,HImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2]));
+ set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+ annotate_me =
+ emit_insn (gen_rtx_SET (VOIDmode,operands[3],gen_rtx_REG (HImode,24)));
+ note = gen_rtx_fmt_ee (UMOD,HImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2]));
+ set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+ DONE;
+ ")
(define_insn "*udivmodhi4_call"
[(set (reg:HI 22) (udiv:HI (reg:HI 24) (reg:HI 22)))
(set (reg:HI 24) (umod:HI (reg:HI 24) (reg:HI 22)))
(clobber (reg:HI 26))
(clobber (reg:QI 21))]
""
"%~call __udivmodhi4"
[(set_attr "type" "xcall")
(set_attr "cc" "clobber")])
(define_expand "divmodsi4"
[(set (reg:SI 22) (match_operand:SI 1 "register_operand" ""))
(set (reg:SI 18) (match_operand:SI 2 "register_operand" ""))
(parallel [(set (reg:SI 18) (div:SI (reg:SI 22) (reg:SI 18)))
(set (reg:SI 22) (mod:SI (reg:SI 22) (reg:SI 18)))
(clobber (reg:HI 26))
(clobber (reg:HI 30))])
(set (match_operand:SI 0 "register_operand" "") (reg:SI 18))
(set (match_operand:SI 3 "register_operand" "") (reg:SI 22))]
""
- "")
+ "
+ /* The following lines correspond *exactly* to what would have been
+ * expanded by use of above template. Only difference is that below
+ * register notes are added that help to implement optimizations. */
+ rtx annotate_me;
+ rtx note;
+
+ emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,22),operands[1]));
+ emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,18),operands[2]));
+ emit (gen_rtx_PARALLEL (VOIDmode,
+ gen_rtvec (4,
+ gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,18),
+ gen_rtx_DIV (SImode,gen_rtx_REG (SImode,22),
+ gen_rtx_REG (SImode,18))),
+ gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,22),
+ gen_rtx_MOD (SImode,gen_rtx_REG (SImode,22),
+ gen_rtx_REG (SImode,18))),
+ gen_hard_reg_clobber (HImode, 26),
+ gen_hard_reg_clobber (HImode, 30))));
+
+ annotate_me =
+ emit_insn (gen_rtx_SET (VOIDmode,operands[0],gen_rtx_REG (SImode,18)));
+ note = gen_rtx_fmt_ee (DIV,SImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2]));
+ set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+ annotate_me =
+ emit_insn (gen_rtx_SET (VOIDmode,operands[3],gen_rtx_REG (SImode,22)));
+ note = gen_rtx_fmt_ee (MOD,SImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2]));
+ set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+ DONE;
+ ")
(define_insn "*divmodsi4_call"
[(set (reg:SI 18) (div:SI (reg:SI 22) (reg:SI 18)))
(set (reg:SI 22) (mod:SI (reg:SI 22) (reg:SI 18)))
(clobber (reg:HI 26))
(clobber (reg:HI 30))]
""
"%~call __divmodsi4"
[(set_attr "type" "xcall")
(set_attr "cc" "clobber")])
(define_expand "udivmodsi4"
[(set (reg:SI 22) (match_operand:SI 1 "register_operand" ""))
(set (reg:SI 18) (match_operand:SI 2 "register_operand" ""))
(parallel [(set (reg:SI 18) (udiv:SI (reg:SI 22) (reg:SI 18)))
(set (reg:SI 22) (umod:SI (reg:SI 22) (reg:SI 18)))
(clobber (reg:HI 26))
(clobber (reg:HI 30))])
(set (match_operand:SI 0 "register_operand" "") (reg:SI 18))
(set (match_operand:SI 3 "register_operand" "") (reg:SI 22))]
""
- "")
+ "
+ /* The following lines correspond *exactly* to what would have been
+ * expanded by use of above template. Only difference is that below
+ * register notes are added that help to implement optimizations. */
+ rtx annotate_me;
+ rtx note;
+
+ emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,22),operands[1]));
+ emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,18),operands[2]));
+ emit (gen_rtx_PARALLEL (VOIDmode,
+ gen_rtvec (4,
+ gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,18),
+ gen_rtx_UDIV (SImode,gen_rtx_REG (SImode,22),
+ gen_rtx_REG (SImode,18))),
+ gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,22),
+ gen_rtx_UMOD (SImode,gen_rtx_REG (SImode,22),
+ gen_rtx_REG (SImode,18))),
+ gen_hard_reg_clobber (HImode, 26),
+ gen_hard_reg_clobber (HImode, 30))));
+
+ annotate_me =
+ emit_insn (gen_rtx_SET (VOIDmode,operands[0],gen_rtx_REG (SImode,18)));
+ note = gen_rtx_fmt_ee (UDIV,SImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2]));
+ set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+ annotate_me =
+ emit_insn (gen_rtx_SET (VOIDmode,operands[3],gen_rtx_REG (SImode,22)));
+ note = gen_rtx_fmt_ee (UMOD,SImode,
+ copy_rtx (operands[1]),
+ copy_rtx (operands[2]));
+ set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+ DONE;
+ ")
(define_insn "*udivmodsi4_call"
[(set (reg:SI 18) (udiv:SI (reg:SI 22) (reg:SI 18)))
(set (reg:SI 22) (umod:SI (reg:SI 22) (reg:SI 18)))
(clobber (reg:HI 26))
(clobber (reg:HI 30))]
""
"%~call __udivmodsi4"
[(set_attr "type" "xcall")
(set_attr "cc" "clobber")])
;&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
Index: cfgcleanup.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/cfgcleanup.c,v
retrieving revision 1.151
diff -U8 -r1.151 cfgcleanup.c
--- cfgcleanup.c 29 Jul 2005 00:45:57 -0000 1.151
+++ cfgcleanup.c 6 Sep 2005 13:48:05 -0000
@@ -2156,17 +2156,16 @@
static void
rest_of_handle_jump2 (void)
{
/* Turn NOTE_INSN_EXPECTED_VALUE into REG_BR_PROB. Do this
before jump optimization switches branch directions. */
if (flag_guess_branch_prob)
expected_value_to_br_prob ();
- delete_trivially_dead_insns (get_insns (), max_reg_num ());
reg_scan (get_insns (), max_reg_num ());
if (dump_file)
dump_flow_info (dump_file);
cleanup_cfg ((optimize ? CLEANUP_EXPENSIVE : 0) | CLEANUP_PRE_LOOP
| (flag_thread_jumps ? CLEANUP_THREADING : 0));
create_loop_notes ();