This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH]: Add CSE pass directly after expand / Add register equal notes for divmod4 for AVR


Hello,

This patch addresses two issues:

1.) The most immediate benefit is that, with it, the AVR port no longer 
generates two calls to the divmod**4 library function (** == QI, HI, SI) if 
both results (DIV and MOD) are used.
2.) It prepares condition code re-use for targets that A) use 
CCmode-style condition codes and B) need to expand HI/SI/DI mode compare 
operations into a sequence of native-word-size instructions that are 
supported by the architecture.
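
For illustration, a minimal C function of the kind issue 1.) is about. The 
names divmod_result and quot_and_rem are made up for this example and do not 
appear in the patch. Both operators work on the same operands, so a single 
library call would be enough to produce both results:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, used only for this illustration.  */
struct divmod_result
{
  int32_t quot;
  int32_t rem;
};

/* Both the quotient and the remainder of the same operands are used.
   On AVR, 32-bit division and modulo both go through the divmodsi4
   pattern, so this is the situation in which two library calls would
   be emitted where one is sufficient.  */
struct divmod_result
quot_and_rem (int32_t a, int32_t b)
{
  struct divmod_result r;

  assert (b != 0);   /* Division by zero is undefined in C.  */
  r.quot = a / b;
  r.rem = a % b;
  return r;
}
```

Compiling such a function for AVR with optimization enabled and counting the 
calls to __divmodsi4 in the assembler output should show whether the second 
call is still there.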

Presently the generic expanders in optabs (and possibly define_expand 
patterns in machine descriptions) frequently use the following method: 
for each result of the sequence they append a final (set 
(result_register) (result_register)) to the expanded instruction sequence.
This final "copy to myself" then carries a register note describing its 
contents.
Some complex expanded sequences can produce two or more results. E.g. 
"subDI3" frequently calculates both "MINUS" and "COMPARE", and "divmodsi4" 
generates both DIV and MOD.
For those "multi-result" sequences it appears to be useful to add two "copy to 
myself with attached register note" instructions to the expand patterns, one 
for each result operand. Usually GCC will use only one of the results after 
expand. E.g. no pass on the tree level knows that on some targets a MINUS 
operation calculates a compare condition code as a useful by-product.
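
To make the "by-product" idea concrete, here is a sketch in plain C (not 
taken from any port) of a double-word subtraction carried out with word-size 
operations. The names sub_result and sub64_by_words are hypothetical. The 
borrow out of the high word falls out of the subtraction for free and is 
exactly the unsigned "a < b" compare result:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a "multi-result" sequence: the difference
   and, as a by-product, the borrow (unsigned a < b).  */
struct sub_result
{
  uint32_t lo;
  uint32_t hi;
  int borrow;
};

struct sub_result
sub64_by_words (uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi)
{
  struct sub_result r;
  int borrow_lo = a_lo < b_lo;   /* Borrow out of the low word.  */

  r.lo = a_lo - b_lo;
  r.hi = a_hi - b_hi - (uint32_t) borrow_lo;
  /* Borrow out of the high word.  This equals the result of the
     unsigned 64-bit comparison a < b.  */
  r.borrow = (a_hi < b_hi) || (a_hi == b_hi && borrow_lo);
  return r;
}
```

How a real target represents this condition code in RTL is target specific; 
the sketch only shows why the compare result is available without extra work.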

Presently, 1.) register notes are thrown away completely before any pass can 
do anything useful with this information. Omitting the 
"remove_unnecessary_notes ()" call alone does not help either, since 2.) all 
of the "set to myself with attached note" instructions announcing the 
unexpected but possibly useful "by-product" results are optimized away too 
early: already during the first jump optimization these "set to myself" 
instructions are deleted as useless, i.e. they are eliminated before any of 
the CSE passes has a chance to look at them.

In order to enable at least a re-use of unexpected side-effects of expanded 
sequences within basic blocks, this patch adds an additional call to CSE 
immediately after expand and before the removal of "unnecessary" notes.

With this first part of the patch in place, the second part benefits from 
the additional CSE: the new "copy result to itself with attached register 
note" instructions are now actually helpful. With their support gcc is able 
to make use of the by-product results of the divmod4 library calls. So 
far gcc generates two calls to divmodsi if both the modulo and the division 
results are needed (a missed optimization in the present state). There used 
to be no REG_EQUAL notes for AVR's define_expands; the second half of this 
patch adds them to all of the divmod expand patterns. Since the RTL is now 
generated "by hand" instead of from the RTL templates, it was helpful to give 
the "divmod**4_call" instructions names so that the automatically generated 
gen_divmod... functions could be used.
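
The effect can be sketched with a toy model in C. This is not GCC code and 
does not reflect how cse.c is implemented: every type and function name below 
(toy_insn, toy_local_cse, ...) is hypothetical, and the model assumes that 
the operands are not redefined inside the basic block. It only shows the 
principle: a copy out of a hard register carries a REG_EQUAL-like note, a 
later copy with the same note can reuse the earlier pseudo register, and a 
library call whose results are no longer read can be deleted.

```c
#include <assert.h>
#include <stddef.h>

enum toy_op { TOY_NONE, TOY_DIV, TOY_MOD };
enum toy_kind { TOY_CALL, TOY_COPY, TOY_DELETED };

#define TOY_HARD_QUOT 100   /* Hard register holding the quotient.  */
#define TOY_HARD_REM  101   /* Hard register holding the remainder.  */
#define TOY_MAX_AVAIL 16

struct toy_insn
{
  enum toy_kind kind;
  int dest;              /* TOY_COPY: destination pseudo register.  */
  int src;               /* TOY_COPY: source register.  */
  enum toy_op note_op;   /* Note: dest is equal to note_op (a, b).  */
  int a, b;              /* Operands of the call or of the note.  */
};

/* Run the toy CSE over one basic block and return the number of
   library calls that remain.  */
size_t
toy_local_cse (struct toy_insn *insns, size_t n)
{
  struct { enum toy_op op; int a, b, reg; } avail[TOY_MAX_AVAIL];
  size_t n_avail = 0, calls = 0;

  /* Step 1: a copy whose note matches an already known value is
     redirected to the pseudo register that holds this value.  */
  for (size_t i = 0; i < n; i++)
    {
      struct toy_insn *insn = &insns[i];
      size_t j;

      if (insn->kind != TOY_COPY || insn->note_op == TOY_NONE)
        continue;
      for (j = 0; j < n_avail; j++)
        if (avail[j].op == insn->note_op
            && avail[j].a == insn->a && avail[j].b == insn->b)
          break;
      if (j < n_avail)
        insn->src = avail[j].reg;
      else if (n_avail < TOY_MAX_AVAIL)
        {
          avail[n_avail].op = insn->note_op;
          avail[n_avail].a = insn->a;
          avail[n_avail].b = insn->b;
          avail[n_avail].reg = insn->dest;
          n_avail++;
        }
    }

  /* Step 2: a call is dead if no copy reads its hard registers
     before the next call.  */
  for (size_t i = 0; i < n; i++)
    {
      int used = 0;

      if (insns[i].kind != TOY_CALL)
        continue;
      for (size_t j = i + 1; j < n && insns[j].kind != TOY_CALL; j++)
        if (insns[j].kind == TOY_COPY
            && (insns[j].src == TOY_HARD_QUOT
                || insns[j].src == TOY_HARD_REM))
          used = 1;
      if (used)
        calls++;
      else
        insns[i].kind = TOY_DELETED;
    }
  return calls;
}
```

Fed with two "call, copy quotient, copy remainder" groups for the same 
operands, the model keeps one call when the copies carry notes and two calls 
when they do not.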

Test results for the C language for "x86_64-unknown-linux-gnu":

> Tests that now fail, but worked before:
>
> gcc.c-torture/execute/20020412-1.c execution,  -O1
> gcc.c-torture/execute/20020412-1.c execution,  -O2
> gcc.c-torture/execute/20020412-1.c execution,  -O3 -fomit-frame-pointer
> gcc.c-torture/execute/20020412-1.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> gcc.c-torture/execute/20020412-1.c execution,  -O3 -fomit-frame-pointer -funroll-loops
> gcc.c-torture/execute/20020412-1.c execution,  -O3 -g
> gcc.c-torture/execute/20020412-1.c execution,  -Os
>
> Tests that now work, but didn't before:
>
> gcc.c-torture/compile/20001226-1.c  -O3 -g  (test for excess errors)
> gcc.c-torture/compile/20001226-1.c  -Os  (test for excess errors)

and for "avr-*-*" (no differences).

Yours,

Björn


2005-06-16  Bjoern Haase  <bjoern.m.haase@web.de>

	* passes.c (rest_of_compilation): Add an additional CSE pass
	directly after expand.
	* config/avr/avr.md (divmodqi4): Add REG_EQUAL register notes to
	expanded RTL.
	(udivmodqi4, divmodhi4, udivmodhi4, divmodsi4, udivmodsi4): Ditto.
	(*divmodqi4_call): Rename to divmodqi4_call.
	(*udivmodqi4_call): Rename to udivmodqi4_call.
	(*divmodhi4_call, *udivmodhi4_call): Ditto.
	(*divmodsi4_call, *udivmodsi4_call): Ditto.

Index: passes.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/passes.c,v
retrieving revision 2.92
diff -U9 -r2.92 passes.c
--- passes.c	9 Jun 2005 16:21:35 -0000	2.92
+++ passes.c	15 Jun 2005 21:31:51 -0000
@@ -1515,18 +1515,25 @@
 	TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (parent)) = 1;
   }
 
   /* We are now committed to emitting code for this function.  Do any
      preparation, such as emitting abstract debug info for the inline
      before it gets mangled by optimization.  */
   if (cgraph_function_possibly_inlined_p (current_function_decl))
     (*debug_hooks->outlining_inline_function) (current_function_decl);
 
+  /* RTL expanders often generate register notes in order to give hints to
+     CSE. All of them will soon be removed completely by 
+     remove_unnecessary_notes (). Let's give CSE a single chance to optimize 
+     away the most obvious common subexpressions.  */
+  if (optimize > 0)
+    rest_of_handle_cse ();
+
   /* Remove any notes we don't need.  That will make iterating
      over the instruction sequence faster, and allow the garbage
      collector to reclaim the memory used by the notes.  */
   remove_unnecessary_notes ();
 
   /* Initialize some variables used by the optimizers.  */
   init_function_for_compilation ();
 
   TREE_ASM_WRITTEN (current_function_decl) = 1;
Index: config/avr/avr.md
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/avr/avr.md,v
retrieving revision 1.51
diff -U9 -r1.51 avr.md
--- config/avr/avr.md	13 Mar 2005 10:09:53 -0000	1.51
+++ config/avr/avr.md	15 Jun 2005 21:32:03 -0000
@@ -824,63 +824,120 @@
   [(set (reg:QI 24) (match_operand:QI 1 "register_operand" ""))
    (set (reg:QI 22) (match_operand:QI 2 "register_operand" ""))
    (parallel [(set (reg:QI 24) (div:QI (reg:QI 24) (reg:QI 22)))
 	      (set (reg:QI 25) (mod:QI (reg:QI 24) (reg:QI 22)))
 	      (clobber (reg:QI 22))
 	      (clobber (reg:QI 23))])
    (set (match_operand:QI 0 "register_operand" "") (reg:QI 24))
    (set (match_operand:QI 3 "register_operand" "") (reg:QI 25))]
   ""
-  "")
+  "/* Generate RTL identical to above template with the only difference
+    * that we add register notes to the last two set insn describing the
+    * contents of the registers. This way CSE is able to use both results.  */
+   rtx insn;
+   emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (QImode,24), operands[1]));
+   emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (QImode,22), operands[2]));
+   emit_insn (gen_divmodqi4_call () );
+   insn = emit_insn (gen_rtx_SET (VOIDmode, 
+                                  operands[0], gen_rtx_REG (QImode,24)));
+   set_unique_reg_note (insn, REG_EQUAL,
+                        gen_rtx_fmt_ee (DIV, QImode,
+                                        copy_rtx (operands[1]),
+                                        copy_rtx (operands[2])));   
+   insn = emit_insn (gen_rtx_SET (VOIDmode, 
+                                  operands[3], gen_rtx_REG (QImode,25)));
+   set_unique_reg_note (insn, REG_EQUAL,
+                        gen_rtx_fmt_ee (MOD, QImode,
+                                        copy_rtx (operands[1]),
+                                        copy_rtx (operands[2])));   
+   DONE;")
 
-(define_insn "*divmodqi4_call"
+(define_insn "divmodqi4_call"
   [(set (reg:QI 24) (div:QI (reg:QI 24) (reg:QI 22)))
    (set (reg:QI 25) (mod:QI (reg:QI 24) (reg:QI 22)))
    (clobber (reg:QI 22))
    (clobber (reg:QI 23))]
   ""
   "%~call __divmodqi4"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
 (define_expand "udivmodqi4"
   [(set (reg:QI 24) (match_operand:QI 1 "register_operand" ""))
    (set (reg:QI 22) (match_operand:QI 2 "register_operand" ""))
    (parallel [(set (reg:QI 24) (udiv:QI (reg:QI 24) (reg:QI 22)))
 	      (set (reg:QI 25) (umod:QI (reg:QI 24) (reg:QI 22)))
 	      (clobber (reg:QI 23))])
    (set (match_operand:QI 0 "register_operand" "") (reg:QI 24))
    (set (match_operand:QI 3 "register_operand" "") (reg:QI 25))]
   ""
-  "")
+  "/* Generate RTL identical to above template with the only difference
+    * that we add register notes to the last two set insn describing the
+    * contents of the registers. This way CSE is able to use both results.  */
+   rtx insn;
+   emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (QImode,24), operands[1]));
+   emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (QImode,22), operands[2]));
+   emit_insn (gen_udivmodqi4_call () );
+   insn = emit_insn (gen_rtx_SET (VOIDmode, 
+                                  operands[0], gen_rtx_REG (QImode,24)));
+   set_unique_reg_note (insn, REG_EQUAL,
+                        gen_rtx_fmt_ee (UDIV, QImode,
+                                        copy_rtx (operands[1]),
+                                        copy_rtx (operands[2])));   
+   insn = emit_insn (gen_rtx_SET (VOIDmode, 
+                                  operands[3], gen_rtx_REG (QImode,25)));
+   set_unique_reg_note (insn, REG_EQUAL,
+                        gen_rtx_fmt_ee (UMOD, QImode,
+                                        copy_rtx (operands[1]),
+                                        copy_rtx (operands[2])));   
+   DONE;")
 
-(define_insn "*udivmodqi4_call"
+(define_insn "udivmodqi4_call"
   [(set (reg:QI 24) (udiv:QI (reg:QI 24) (reg:QI 22)))
    (set (reg:QI 25) (umod:QI (reg:QI 24) (reg:QI 22)))
    (clobber (reg:QI 23))]
   ""
   "%~call __udivmodqi4"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
 (define_expand "divmodhi4"
   [(set (reg:HI 24) (match_operand:HI 1 "register_operand" ""))
    (set (reg:HI 22) (match_operand:HI 2 "register_operand" ""))
    (parallel [(set (reg:HI 22) (div:HI (reg:HI 24) (reg:HI 22)))
 	      (set (reg:HI 24) (mod:HI (reg:HI 24) (reg:HI 22)))
 	      (clobber (reg:HI 26))
 	      (clobber (reg:QI 21))])
    (set (match_operand:HI 0 "register_operand" "") (reg:HI 22))
    (set (match_operand:HI 3 "register_operand" "") (reg:HI 24))]
   ""
-  "")
+  "/* Generate RTL identical to above template with the only difference
+    * that we add register notes to the last two set insn describing the
+    * contents of the registers. This way CSE is able to use both results.  */
+   rtx insn;
+   emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (HImode,24), operands[1]));
+   emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (HImode,22), operands[2]));
+   emit_insn (gen_divmodhi4_call () );
+   insn = emit_insn (gen_rtx_SET (VOIDmode, 
+                                  operands[0], gen_rtx_REG (HImode,22)));
+   set_unique_reg_note (insn, REG_EQUAL,
+                        gen_rtx_fmt_ee (DIV, HImode,
+                                        copy_rtx (operands[1]),
+                                        copy_rtx (operands[2])));   
+   insn = emit_insn (gen_rtx_SET (VOIDmode, 
+                                  operands[3], gen_rtx_REG (HImode,24)));
+   set_unique_reg_note (insn, REG_EQUAL,
+                        gen_rtx_fmt_ee (MOD, HImode,
+                                        copy_rtx (operands[1]),
+                                        copy_rtx (operands[2])));   
+   DONE;")
 
-(define_insn "*divmodhi4_call"
+(define_insn "divmodhi4_call"
   [(set (reg:HI 22) (div:HI (reg:HI 24) (reg:HI 22)))
    (set (reg:HI 24) (mod:HI (reg:HI 24) (reg:HI 22)))
    (clobber (reg:HI 26))
    (clobber (reg:QI 21))]
   ""
   "%~call __divmodhi4"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
@@ -888,21 +945,40 @@
   [(set (reg:HI 24) (match_operand:HI 1 "register_operand" ""))
    (set (reg:HI 22) (match_operand:HI 2 "register_operand" ""))
    (parallel [(set (reg:HI 22) (udiv:HI (reg:HI 24) (reg:HI 22)))
 	      (set (reg:HI 24) (umod:HI (reg:HI 24) (reg:HI 22)))
 	      (clobber (reg:HI 26))
 	      (clobber (reg:QI 21))])
    (set (match_operand:HI 0 "register_operand" "") (reg:HI 22))
    (set (match_operand:HI 3 "register_operand" "") (reg:HI 24))]
   ""
-  "")
+  "/* Generate RTL identical to above template with the only difference
+    * that we add register notes to the last two set insn describing the
+    * contents of the registers. This way CSE is able to use both results.  */
+   rtx insn;
+   emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (HImode,24), operands[1]));
+   emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (HImode,22), operands[2]));
+   emit_insn (gen_udivmodhi4_call () );
+   insn = emit_insn (gen_rtx_SET (VOIDmode, 
+                                  operands[0], gen_rtx_REG (HImode,22)));
+   set_unique_reg_note (insn, REG_EQUAL,
+                        gen_rtx_fmt_ee (UDIV, HImode,
+                                        copy_rtx (operands[1]),
+                                        copy_rtx (operands[2])));   
+   insn = emit_insn (gen_rtx_SET (VOIDmode, 
+                                  operands[3], gen_rtx_REG (HImode,24)));
+   set_unique_reg_note (insn, REG_EQUAL,
+                        gen_rtx_fmt_ee (UMOD, HImode,
+                                        copy_rtx (operands[1]),
+                                        copy_rtx (operands[2])));   
+   DONE;")
 
-(define_insn "*udivmodhi4_call"
+(define_insn "udivmodhi4_call"
   [(set (reg:HI 22) (udiv:HI (reg:HI 24) (reg:HI 22)))
    (set (reg:HI 24) (umod:HI (reg:HI 24) (reg:HI 22)))
    (clobber (reg:HI 26))
    (clobber (reg:QI 21))]
   ""
   "%~call __udivmodhi4"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
@@ -910,21 +986,40 @@
   [(set (reg:SI 22) (match_operand:SI 1 "register_operand" ""))
    (set (reg:SI 18) (match_operand:SI 2 "register_operand" ""))
    (parallel [(set (reg:SI 18) (div:SI (reg:SI 22) (reg:SI 18)))
 	      (set (reg:SI 22) (mod:SI (reg:SI 22) (reg:SI 18)))
 	      (clobber (reg:HI 26))
 	      (clobber (reg:HI 30))])
    (set (match_operand:SI 0 "register_operand" "") (reg:SI 18))
    (set (match_operand:SI 3 "register_operand" "") (reg:SI 22))]
   ""
-  "")
+  "/* Generate RTL identical to above template with the only difference
+    * that we add register notes to the last two set insn describing the
+    * contents of the registers. This way CSE is able to use both results.  */
+   rtx insn;
+   emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode,22), operands[1]));
+   emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode,18), operands[2]));
+   emit_insn (gen_divmodsi4_call () );
+   insn = emit_insn (gen_rtx_SET (VOIDmode, 
+                                  operands[0], gen_rtx_REG (SImode,18)));
+   set_unique_reg_note (insn, REG_EQUAL,
+                        gen_rtx_fmt_ee (DIV, SImode,
+                                        copy_rtx (operands[1]),
+                                        copy_rtx (operands[2])));   
+   insn = emit_insn (gen_rtx_SET (VOIDmode, 
+                                  operands[3], gen_rtx_REG (SImode,22)));
+   set_unique_reg_note (insn, REG_EQUAL,
+                        gen_rtx_fmt_ee (MOD, SImode,
+                                        copy_rtx (operands[1]),
+                                        copy_rtx (operands[2])));   
+   DONE;")
 
-(define_insn "*divmodsi4_call"
+(define_insn "divmodsi4_call"
   [(set (reg:SI 18) (div:SI (reg:SI 22) (reg:SI 18)))
    (set (reg:SI 22) (mod:SI (reg:SI 22) (reg:SI 18)))
    (clobber (reg:HI 26))
    (clobber (reg:HI 30))]
   ""
   "%~call __divmodsi4"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
@@ -932,21 +1027,40 @@
   [(set (reg:SI 22) (match_operand:SI 1 "register_operand" ""))
    (set (reg:SI 18) (match_operand:SI 2 "register_operand" ""))
    (parallel [(set (reg:SI 18) (udiv:SI (reg:SI 22) (reg:SI 18)))
 	      (set (reg:SI 22) (umod:SI (reg:SI 22) (reg:SI 18)))
 	      (clobber (reg:HI 26))
 	      (clobber (reg:HI 30))])
    (set (match_operand:SI 0 "register_operand" "") (reg:SI 18))
    (set (match_operand:SI 3 "register_operand" "") (reg:SI 22))]
   ""
-  "")
+  "/* Generate RTL identical to above template with the only difference
+    * that we add register notes to the last two set insn describing the
+    * contents of the registers. This way CSE is able to use both results.  */
+   rtx insn;
+   emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode,22), operands[1]));
+   emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode,18), operands[2]));
+   emit_insn (gen_udivmodsi4_call () );
+   insn = emit_insn (gen_rtx_SET (VOIDmode, 
+                                  operands[0], gen_rtx_REG (SImode,18)));
+   set_unique_reg_note (insn, REG_EQUAL,
+                        gen_rtx_fmt_ee (UDIV, SImode,
+                                        copy_rtx (operands[1]),
+                                        copy_rtx (operands[2])));   
+   insn = emit_insn (gen_rtx_SET (VOIDmode, 
+                                  operands[3], gen_rtx_REG (SImode,22)));
+   set_unique_reg_note (insn, REG_EQUAL,
+                        gen_rtx_fmt_ee (UMOD, SImode,
+                                        copy_rtx (operands[1]),
+                                        copy_rtx (operands[2])));   
+   DONE;")
 
-(define_insn "*udivmodsi4_call"
+(define_insn "udivmodsi4_call"
   [(set (reg:SI 18) (udiv:SI (reg:SI 22) (reg:SI 18)))
    (set (reg:SI 22) (umod:SI (reg:SI 22) (reg:SI 18)))
    (clobber (reg:HI 26))
    (clobber (reg:HI 30))]
   ""
   "%~call __udivmodsi4"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
