This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


[PATCH][rtl-optimization] Fix PR23726


Hi,

this patch addresses an issue that shows up when dealing with RTL expander 
sequences that yield two or more useful results.
IMO the most important class of such expander sequences are arithmetic 
operations that 1.) perform calculations in modes larger than the biggest 
mode supported by the CPU (e.g. DImode operations on a 32-bit machine) and 
2.) calculate a condition code at the same time. Another example of such 
two-result expanders is the divmod4 pattern.
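
As an illustration of the divmod4 case, here is a minimal C sketch of source 
code that wants both results of one division. The struct and function names 
are hypothetical and not part of the patch. On a target with a divmod4 
pattern, a single expanded sequence could in principle provide both values.

```c
#include <assert.h>

/* Hypothetical helper for illustration only; not part of the patch.  */
struct quot_rem
{
  int quot;
  int rem;
};

/* Both results come from the same pair of operands, so one divmod4
   sequence could provide them together.  */
struct quot_rem
div_and_mod (int a, int b)
{
  struct quot_rem r;

  assert (b != 0);
  r.quot = a / b;  /* the result the expander was asked for */
  r.rem = a % b;   /* the by-product of the same sequence */
  return r;
}
```

Whether the second operation is actually folded into the first depends on 
CSE recognizing the by-product, which is the subject of this patch.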

The expander is typically run because we know from the tree representation 
that we need one of the two results. At the tree level we currently cannot 
know that a second useful result may be calculated at the same time, so such 
optimizations have to be done at the RTL level by CSE.

At present we run into a problem when we try to benefit from such a useful 
by-product of an expanded sequence. A more detailed discussion, e.g. of how 
to recycle the condition code on CCmode targets, is at 
http://gcc.gnu.org/PR23726 .

The basic idea is to make the expander insert a single-set instruction into 
the sequence for each of the results: a single-set insn that carries a 
REG_EQUAL note similar to the one used in libcall notes. The single-set insn 
would be inserted in the hope that CSE might find the value useful, but with 
the expectation that it will very probably be deleted.
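
To picture how such a note could be used, here is a toy model in plain C. 
It is not GCC's CSE implementation: every name in it is hypothetical, and 
registers and operands are plain integers. Each recorded entry plays the 
role of a REG_EQUAL note saying "this register holds op (a, b)", and a later 
request for the same expression is answered from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical.  */
enum toy_op { TOY_DIV, TOY_MOD };

/* One REG_EQUAL-style fact: register REG holds OP (A, B).  */
struct toy_equiv
{
  enum toy_op op;
  int a;
  int b;
  int reg;
};

#define TOY_MAX 8

struct toy_table
{
  struct toy_equiv e[TOY_MAX];
  size_t n;
};

/* Record a fact; it is silently dropped when the table is full.  */
void
toy_record (struct toy_table *t, enum toy_op op, int a, int b, int reg)
{
  assert (t != NULL);
  if (t->n < TOY_MAX)
    {
      struct toy_equiv eq = { op, a, b, reg };
      t->e[t->n++] = eq;
    }
}

/* Return the register known to hold OP (A, B), or -1 if there is none.  */
int
toy_lookup (const struct toy_table *t, enum toy_op op, int a, int b)
{
  size_t i;

  assert (t != NULL);
  for (i = 0; i < t->n; i++)
    if (t->e[i].op == op && t->e[i].a == a && t->e[i].b == b)
      return t->e[i].reg;
  return -1;
}
```

If the remainder copy of a divmod sequence has been recorded, a later 
request for the same MOD expression finds its register, so the 
recalculation could be skipped.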

If CSE sees these notes, it can avoid unnecessary compare insns or 
unnecessary re-calculations. The divmod4 pattern for avr is an example. 
The remaining problem is that CSE must get at least one chance to see these 
note-carrying instructions. In the present optimizer setup this is not the 
case: the single-sets for the by-products are removed in the jump2 pass that 
immediately precedes the first CSE run, because at that point they are 
trivially dead.

To solve this problem, this patch removes the call to 
"delete_trivially_dead_insns ()" from jump2. This should not be a serious 
performance issue, since the same call occurs in the cse pass that 
immediately follows.
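
The ordering argument can be sketched with a toy model in plain C. This is 
not GCC code: the struct and the functions are hypothetical stand-ins, and 
the model assumes that the cleanup done by the cse pass happens only after 
CSE has looked at the insns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical.  */
struct toy_insn
{
  bool result_used;    /* is the destination read by a later insn? */
  bool has_reg_equal;  /* does the insn carry a REG_EQUAL note? */
  bool deleted;
};

/* Stand-in for delete_trivially_dead_insns: drop unused results.  */
static void
toy_delete_dead (struct toy_insn *insns, size_t n)
{
  size_t i;

  assert (insns != NULL);
  for (i = 0; i < n; i++)
    if (!insns[i].result_used)
      insns[i].deleted = true;
}

/* Count the note-carrying insns that are still present.  */
static size_t
toy_notes_seen_by_cse (const struct toy_insn *insns, size_t n)
{
  size_t i, seen = 0;

  for (i = 0; i < n; i++)
    if (!insns[i].deleted && insns[i].has_reg_equal)
      seen++;
  return seen;
}

/* Run the toy pipeline and return how many notes CSE got to see.  */
size_t
toy_run_passes (bool jump2_deletes_dead)
{
  struct toy_insn insns[2] = {
    { true, true, false },   /* quotient copy: its result is used */
    { false, true, false }   /* remainder copy: unused by-product */
  };
  size_t seen;

  if (jump2_deletes_dead)
    toy_delete_dead (insns, 2);  /* behaviour before the patch */
  seen = toy_notes_seen_by_cse (insns, 2);
  toy_delete_dead (insns, 2);    /* cleanup that still runs with cse */
  return seen;
}
```

With the deletion in jump2 only the quotient copy reaches CSE. Without it 
both note-carrying copies do, and the unused one is still removed 
afterwards. The toy does not model CSE turning the by-product into a used 
value.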

IMO, with such note-carrying single-set insns and with the tiny change in 
jump2, Richard Henderson's suggested subreg-lowering pass could finally help 
to generate better code for DImode operations on 32-bit Intel architectures 
and much better code for HI, SI and DI mode operations on avr. IMO the lack 
of condition-code re-use could be the reason why Richard Henderson has so 
far not found an improvement when comparing subreg lowering for DImode at 
expand with lowering by splitters after reload.

Regards,

Bjoern


2005-09-06  Bjoern Haase  <bjoern.m.haase@web.de>

	* cfgcleanup.c (rest_of_handle_jump2): Don't call
	delete_trivially_dead_insns; leave that to the first cse pass.

2005-09-06  Bjoern Haase  <bjoern.m.haase@web.de>

	* config/avr/avr.md (divmodqi4, udivmodqi4, divmodhi4, udivmodhi4,
	divmodsi4, udivmodsi4): Add REG_EQUAL notes to the expanders.

Index: avr.md
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/avr/avr.md,v
retrieving revision 1.53
diff -U12 -r1.53 avr.md
--- avr.md	28 Jun 2005 19:56:02 -0000	1.53
+++ avr.md	4 Sep 2005 22:01:33 -0000
@@ -825,133 +825,342 @@
 ;;  - we get both the quotient and the remainder at no extra cost
 
 (define_expand "divmodqi4"
   [(set (reg:QI 24) (match_operand:QI 1 "register_operand" ""))
    (set (reg:QI 22) (match_operand:QI 2 "register_operand" ""))
    (parallel [(set (reg:QI 24) (div:QI (reg:QI 24) (reg:QI 22)))
 	      (set (reg:QI 25) (mod:QI (reg:QI 24) (reg:QI 22)))
 	      (clobber (reg:QI 22))
 	      (clobber (reg:QI 23))])
    (set (match_operand:QI 0 "register_operand" "") (reg:QI 24))
    (set (match_operand:QI 3 "register_operand" "") (reg:QI 25))]
   ""
-  "")
+  "
+  /* The following lines correspond *exactly* to what would have been 
+   * expanded by use of above template. Only difference is that below 
+   * register notes are added that help to implement optimizations.  */
+  rtx annotate_me;
+  rtx note;
+
+  emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,24),operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,22),operands[2]));
+  emit (gen_rtx_PARALLEL (VOIDmode,
+   gen_rtvec (4,
+      gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,24),
+                            gen_rtx_DIV (QImode,gen_rtx_REG (QImode,24), 
+                                                gen_rtx_REG (QImode,22))), 
+      gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,25),
+                            gen_rtx_MOD (QImode,gen_rtx_REG (QImode,24),
+                                                gen_rtx_REG (QImode,22))),
+      gen_hard_reg_clobber (QImode, 22),
+      gen_hard_reg_clobber (QImode, 23))));
+
+  annotate_me = 
+      emit_insn (gen_rtx_SET (VOIDmode,operands[0],gen_rtx_REG (QImode,24)));
+  note = gen_rtx_fmt_ee (DIV,QImode,
+                         copy_rtx (operands[1]),
+                         copy_rtx (operands[2]));
+  set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+  annotate_me = 
+      emit_insn (gen_rtx_SET (VOIDmode,operands[3],gen_rtx_REG (QImode,25)));
+  note = gen_rtx_fmt_ee (MOD,QImode,
+                         copy_rtx (operands[1]),
+                         copy_rtx (operands[2]));
+  set_unique_reg_note (annotate_me, REG_EQUAL, note);
+        
+  DONE; 
+  ")
 
 (define_insn "*divmodqi4_call"
   [(set (reg:QI 24) (div:QI (reg:QI 24) (reg:QI 22)))
    (set (reg:QI 25) (mod:QI (reg:QI 24) (reg:QI 22)))
    (clobber (reg:QI 22))
    (clobber (reg:QI 23))]
   ""
   "%~call __divmodqi4"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
 (define_expand "udivmodqi4"
   [(set (reg:QI 24) (match_operand:QI 1 "register_operand" ""))
    (set (reg:QI 22) (match_operand:QI 2 "register_operand" ""))
    (parallel [(set (reg:QI 24) (udiv:QI (reg:QI 24) (reg:QI 22)))
 	      (set (reg:QI 25) (umod:QI (reg:QI 24) (reg:QI 22)))
 	      (clobber (reg:QI 23))])
    (set (match_operand:QI 0 "register_operand" "") (reg:QI 24))
    (set (match_operand:QI 3 "register_operand" "") (reg:QI 25))]
   ""
-  "")
+  "
+  /* The following lines correspond *exactly* to what would have been 
+   * expanded by use of above template. Only difference is that below 
+   * register notes are added that help to implement optimizations.  */
+  rtx annotate_me;
+  rtx note;
+
+  emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,24),operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,22),operands[2]));
+  emit (gen_rtx_PARALLEL (VOIDmode,
+   gen_rtvec (3,
+      gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,24),
+                            gen_rtx_UDIV (QImode,gen_rtx_REG (QImode,24),
+                                                 gen_rtx_REG (QImode,22))),
+      gen_rtx_SET (VOIDmode,gen_rtx_REG (QImode,25),
+                            gen_rtx_UMOD (QImode,gen_rtx_REG (QImode,24),
+                                                 gen_rtx_REG (QImode,22))),
+      gen_hard_reg_clobber (QImode, 23))));
+
+  annotate_me =
+      emit_insn (gen_rtx_SET (VOIDmode,operands[0],gen_rtx_REG (QImode,24)));
+  note = gen_rtx_fmt_ee (UDIV,QImode,
+                         copy_rtx (operands[1]),
+                         copy_rtx (operands[2]));
+  set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+  annotate_me =
+      emit_insn (gen_rtx_SET (VOIDmode,operands[3],gen_rtx_REG (QImode,25)));
+  note = gen_rtx_fmt_ee (UMOD,QImode,
+                         copy_rtx (operands[1]),
+                         copy_rtx (operands[2]));
+  set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+  DONE;
+")
 
 (define_insn "*udivmodqi4_call"
   [(set (reg:QI 24) (udiv:QI (reg:QI 24) (reg:QI 22)))
    (set (reg:QI 25) (umod:QI (reg:QI 24) (reg:QI 22)))
    (clobber (reg:QI 23))]
   ""
   "%~call __udivmodqi4"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
 (define_expand "divmodhi4"
   [(set (reg:HI 24) (match_operand:HI 1 "register_operand" ""))
    (set (reg:HI 22) (match_operand:HI 2 "register_operand" ""))
    (parallel [(set (reg:HI 22) (div:HI (reg:HI 24) (reg:HI 22)))
 	      (set (reg:HI 24) (mod:HI (reg:HI 24) (reg:HI 22)))
 	      (clobber (reg:HI 26))
 	      (clobber (reg:QI 21))])
    (set (match_operand:HI 0 "register_operand" "") (reg:HI 22))
    (set (match_operand:HI 3 "register_operand" "") (reg:HI 24))]
   ""
-  "")
+  "
+  /* The following lines correspond *exactly* to what would have been
+   * expanded by use of above template. Only difference is that below
+   * register notes are added that help to implement optimizations.  */
+  rtx annotate_me;
+  rtx note;
+
+  emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,24),operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,22),operands[2]));
+  emit (gen_rtx_PARALLEL (VOIDmode,
+   gen_rtvec (4,
+      gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,22),
+                            gen_rtx_DIV (HImode,gen_rtx_REG (HImode,24),
+                                                gen_rtx_REG (HImode,22))),
+      gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,24),
+                            gen_rtx_MOD (HImode,gen_rtx_REG (HImode,24),
+                                                gen_rtx_REG (HImode,22))),
+      gen_hard_reg_clobber (HImode, 26),
+      gen_hard_reg_clobber (QImode, 21))));
+
+  annotate_me =
+      emit_insn (gen_rtx_SET (VOIDmode,operands[0],gen_rtx_REG (HImode,22)));
+  note = gen_rtx_fmt_ee (DIV,HImode,
+                         copy_rtx (operands[1]),
+                         copy_rtx (operands[2]));
+  set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+  annotate_me =
+      emit_insn (gen_rtx_SET (VOIDmode,operands[3],gen_rtx_REG (HImode,24)));
+  note = gen_rtx_fmt_ee (MOD,HImode,
+                         copy_rtx (operands[1]),
+                         copy_rtx (operands[2]));
+  set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+  DONE;
+  ")
 
 (define_insn "*divmodhi4_call"
   [(set (reg:HI 22) (div:HI (reg:HI 24) (reg:HI 22)))
    (set (reg:HI 24) (mod:HI (reg:HI 24) (reg:HI 22)))
    (clobber (reg:HI 26))
    (clobber (reg:QI 21))]
   ""
   "%~call __divmodhi4"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
 (define_expand "udivmodhi4"
   [(set (reg:HI 24) (match_operand:HI 1 "register_operand" ""))
    (set (reg:HI 22) (match_operand:HI 2 "register_operand" ""))
    (parallel [(set (reg:HI 22) (udiv:HI (reg:HI 24) (reg:HI 22)))
 	      (set (reg:HI 24) (umod:HI (reg:HI 24) (reg:HI 22)))
 	      (clobber (reg:HI 26))
 	      (clobber (reg:QI 21))])
    (set (match_operand:HI 0 "register_operand" "") (reg:HI 22))
    (set (match_operand:HI 3 "register_operand" "") (reg:HI 24))]
   ""
-  "")
+  "
+  /* The following lines correspond *exactly* to what would have been
+   * expanded by use of above template. Only difference is that below
+   * register notes are added that help to implement optimizations.  */
+  rtx annotate_me; 
+  rtx note;
+   
+  emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,24),operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,22),operands[2]));
+  emit (gen_rtx_PARALLEL (VOIDmode,
+   gen_rtvec (4,
+      gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,22),
+                            gen_rtx_UDIV (HImode,gen_rtx_REG (HImode,24),
+                                                 gen_rtx_REG (HImode,22))),
+      gen_rtx_SET (VOIDmode,gen_rtx_REG (HImode,24),
+                            gen_rtx_UMOD (HImode,gen_rtx_REG (HImode,24),
+                                                 gen_rtx_REG (HImode,22))),
+      gen_hard_reg_clobber (HImode, 26),
+      gen_hard_reg_clobber (QImode, 21))));
+
+  annotate_me =
+      emit_insn (gen_rtx_SET (VOIDmode,operands[0],gen_rtx_REG (HImode,22)));
+  note = gen_rtx_fmt_ee (UDIV,HImode,
+                         copy_rtx (operands[1]),
+                         copy_rtx (operands[2]));
+  set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+  annotate_me =
+      emit_insn (gen_rtx_SET (VOIDmode,operands[3],gen_rtx_REG (HImode,24)));
+  note = gen_rtx_fmt_ee (UMOD,HImode,
+                         copy_rtx (operands[1]),
+                         copy_rtx (operands[2]));
+  set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+  DONE;
+  ")
 
 (define_insn "*udivmodhi4_call"
   [(set (reg:HI 22) (udiv:HI (reg:HI 24) (reg:HI 22)))
    (set (reg:HI 24) (umod:HI (reg:HI 24) (reg:HI 22)))
    (clobber (reg:HI 26))
    (clobber (reg:QI 21))]
   ""
   "%~call __udivmodhi4"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
 (define_expand "divmodsi4"
   [(set (reg:SI 22) (match_operand:SI 1 "register_operand" ""))
    (set (reg:SI 18) (match_operand:SI 2 "register_operand" ""))
    (parallel [(set (reg:SI 18) (div:SI (reg:SI 22) (reg:SI 18)))
 	      (set (reg:SI 22) (mod:SI (reg:SI 22) (reg:SI 18)))
 	      (clobber (reg:HI 26))
 	      (clobber (reg:HI 30))])
    (set (match_operand:SI 0 "register_operand" "") (reg:SI 18))
    (set (match_operand:SI 3 "register_operand" "") (reg:SI 22))]
   ""
-  "")
+  "
+  /* The following lines correspond *exactly* to what would have been
+   * expanded by use of above template. Only difference is that below
+   * register notes are added that help to implement optimizations.  */
+  rtx annotate_me;
+  rtx note;
+
+  emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,22),operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,18),operands[2]));
+  emit (gen_rtx_PARALLEL (VOIDmode,
+   gen_rtvec (4,
+      gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,18),
+                            gen_rtx_DIV (SImode,gen_rtx_REG (SImode,22),
+                                                gen_rtx_REG (SImode,18))),
+      gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,22),
+                            gen_rtx_MOD (SImode,gen_rtx_REG (SImode,22),
+                                                gen_rtx_REG (SImode,18))),
+      gen_hard_reg_clobber (HImode, 26),
+      gen_hard_reg_clobber (HImode, 30))));
+
+  annotate_me =
+      emit_insn (gen_rtx_SET (VOIDmode,operands[0],gen_rtx_REG (SImode,18)));
+  note = gen_rtx_fmt_ee (DIV,SImode,
+                         copy_rtx (operands[1]),
+                         copy_rtx (operands[2]));
+  set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+  annotate_me =
+      emit_insn (gen_rtx_SET (VOIDmode,operands[3],gen_rtx_REG (SImode,22)));
+  note = gen_rtx_fmt_ee (MOD,SImode,
+                         copy_rtx (operands[1]),
+                         copy_rtx (operands[2]));
+  set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+  DONE;
+  ")
 
 (define_insn "*divmodsi4_call"
   [(set (reg:SI 18) (div:SI (reg:SI 22) (reg:SI 18)))
    (set (reg:SI 22) (mod:SI (reg:SI 22) (reg:SI 18)))
    (clobber (reg:HI 26))
    (clobber (reg:HI 30))]
   ""
   "%~call __divmodsi4"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
 (define_expand "udivmodsi4"
   [(set (reg:SI 22) (match_operand:SI 1 "register_operand" ""))
    (set (reg:SI 18) (match_operand:SI 2 "register_operand" ""))
    (parallel [(set (reg:SI 18) (udiv:SI (reg:SI 22) (reg:SI 18)))
 	      (set (reg:SI 22) (umod:SI (reg:SI 22) (reg:SI 18)))
 	      (clobber (reg:HI 26))
 	      (clobber (reg:HI 30))])
    (set (match_operand:SI 0 "register_operand" "") (reg:SI 18))
    (set (match_operand:SI 3 "register_operand" "") (reg:SI 22))]
   ""
-  "")
+  "
+  /* The following lines correspond *exactly* to what would have been
+   * expanded by use of above template. Only difference is that below
+   * register notes are added that help to implement optimizations.  */
+  rtx annotate_me; 
+  rtx note;
+   
+  emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,22),operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,18),operands[2]));
+  emit (gen_rtx_PARALLEL (VOIDmode,
+   gen_rtvec (4,
+      gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,18),
+                            gen_rtx_UDIV (SImode,gen_rtx_REG (SImode,22),
+                                                 gen_rtx_REG (SImode,18))),
+      gen_rtx_SET (VOIDmode,gen_rtx_REG (SImode,22),
+                            gen_rtx_UMOD (SImode,gen_rtx_REG (SImode,22),
+                                                 gen_rtx_REG (SImode,18))),
+      gen_hard_reg_clobber (HImode, 26),
+      gen_hard_reg_clobber (HImode, 30))));
+
+  annotate_me =
+      emit_insn (gen_rtx_SET (VOIDmode,operands[0],gen_rtx_REG (SImode,18)));
+  note = gen_rtx_fmt_ee (UDIV,SImode,
+                         copy_rtx (operands[1]),
+                         copy_rtx (operands[2]));
+  set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+  annotate_me =
+      emit_insn (gen_rtx_SET (VOIDmode,operands[3],gen_rtx_REG (SImode,22)));
+  note = gen_rtx_fmt_ee (UMOD,SImode,
+                         copy_rtx (operands[1]),
+                         copy_rtx (operands[2]));
+  set_unique_reg_note (annotate_me, REG_EQUAL, note);
+
+  DONE;
+  ")
 
 (define_insn "*udivmodsi4_call"
   [(set (reg:SI 18) (udiv:SI (reg:SI 22) (reg:SI 18)))
    (set (reg:SI 22) (umod:SI (reg:SI 22) (reg:SI 18)))
    (clobber (reg:HI 26))
    (clobber (reg:HI 30))]
   ""
   "%~call __udivmodsi4"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
 ;&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
Index: cfgcleanup.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/cfgcleanup.c,v
retrieving revision 1.151
diff -U8 -r1.151 cfgcleanup.c
--- cfgcleanup.c	29 Jul 2005 00:45:57 -0000	1.151
+++ cfgcleanup.c	6 Sep 2005 13:48:05 -0000
@@ -2156,17 +2156,16 @@
 static void
 rest_of_handle_jump2 (void)
 {
   /* Turn NOTE_INSN_EXPECTED_VALUE into REG_BR_PROB.  Do this
      before jump optimization switches branch directions.  */
   if (flag_guess_branch_prob)
     expected_value_to_br_prob ();
 
-  delete_trivially_dead_insns (get_insns (), max_reg_num ());
   reg_scan (get_insns (), max_reg_num ());
   if (dump_file)
     dump_flow_info (dump_file);
   cleanup_cfg ((optimize ? CLEANUP_EXPENSIVE : 0) | CLEANUP_PRE_LOOP
                | (flag_thread_jumps ? CLEANUP_THREADING : 0));
 
   create_loop_notes ();
 
