This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Forward propagation on RTL (4.1?)


This patch is a reissue of the one already submitted at
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00423.html.  Steven
Bosscher offered several hints and provided test coverage and SPEC
benchmarking.

This pass does simple forward propagation and simplification when an
operand of an insn can only come from a single def.  The pass has a very
good potential of catching simplifications currently done by
inter-basic-block CSE (-fcse-follow-jumps and -fcse-skip-blocks) and
combine: however, it is much faster than inter-basic-block CSE and runs
much earlier than combine.

Unlike other attempts to supersede CSE path following, however, I did
not write a general optimization framework like GCSE: instead, looking
at what kinds of simplifications are only made by CSE, I tried to
implement them in a very generic way.  Take for example PR13724: my fix
for it was to make CSE recognize the high word of (zero_extend:DI
(reg:SI M)), and simplify it to zero.  After committing it, Roger Sayle
also tried adding the same to simplify-rtx.c, but it was ineffective.
This pass, instead, can use simplify-rtx.c much more effectively than
CSE.

To recap from the original patch submission, the pass makes two kinds of
propagation.  First, it tries to propagate the source of the def into
the use, and check if it simplifies to a constant: PR13724 is an example
that is caught by this pass.  Second, it looks for paradoxical subregs
that are actually unnecessary, which are very common on machines that
can only do word-sized operations.  Again, this is usually done by CSE.

This pass uses df.c, so it is global. However, we only do limited
analysis of available expressions: in practice, it will propagate
registers only to neighboring basic blocks, unless they have a single
constant definition.

Bootstrap and SPEC benchmarking was done with -fno-cse-follow-jumps and
-fno-cse-skip-blocks, in order to validate the patch as a replacement
for CSE path following.

Bootstrap is sped up by 2.75% on i686-pc-linux-gnu and the attached
results of SPEC benchmarks are quite encouraging.  Compilation is faster
by 2 to 5%.  On 64-bit, SPECfp is neutral in practice and SPECint gains
three points (losing much on perlbmk, but winning more on twolf).  On
32-bit, things are worse, because SPECfp loses 6 points (winning only on
wupwise, in practice) and SPECint gains 1 (thanks again to twolf, and
gcc).

The patch has been bootstrapped and regtested on powerpc-pc-linux-gnu
and powerpc-apple-darwin (by me) and on x86_64-pc-linux-gnu and
i686-pc-linux-gnu (by Steven).  Last time Steven tested x86_64 it had
regression, but I believe they were all fixed.

Even though we are in stage 3, if the concept is fine I'd love if this
patch could go in as a preview, of course not enabling the new pass by
default.  In the 4.2 stage1 timeframe, I would like to move
simplifications from combine.c to simplify-rtx.c, and look at the
performance regressions (esp. mesa and crafty could be interesting) with
the goal of finally getting rid of CSE path following.

Ok for mainline?

Paolo

:ADDPATCH RTL:
GCC was configured as: configure --enable-threads=posix 
--enable-languages="c,c++,f95"
GCC bootstrap times for 'make -j1 bootstrap && make install':
Base compiler: 2909 s
Peak compiler: 2830 s

SPECfp 64-bit
*************

Total time for base compilation: 194 s
Total time for peak compilation: 187 s


                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ------------  --------  --------  --------  --------  --------  --------
   168.wupwise       1600   184       870    *     1600   183       873    *
   171.swim          3100   447       693    *     3100   445       697    *
   172.mgrid                                 X                             X
   173.applu                                 X                             X
   177.mesa          1400   137      1025    *     1400   138      1016    *
   178.galgel                                X                             X
   179.art           2600   271       960    *     2600   269       966    *
   183.equake        1300   159       815    *     1300   159       816    *
   187.facerec                               X                             X
   188.ammp          2200   253       869    *     2200   253       871    *
   189.lucas                                 X                             X
   191.fma3d                                 X                             X
   200.sixtrack                              X                             X
   301.apsi                                  X                             X
   Est. SPECfp_base2000               866    
   Est. SPECfp2000                                                  867    

SPECint 64-bit
**************

Total time for base compilation: 199 s
Total time for peak compilation: 189 s

                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ------------  --------  --------  --------  --------  --------  --------
   164.gzip          1400   172       812    *     1400   172       815    *
   175.vpr           1400   198       708    *     1400   196       713    *
   176.gcc           1100   115       956    *     1100   114       964    *
   181.mcf           1800   437       411    *     1800   439       410    *
   186.crafty        1000    72.3    1383    *     1000    72.5    1380    *
   197.parser        1800   322       559    *     1800   321       560    *
   252.eon                                   X                             X
   253.perlbmk       1800   200       901    *     1800   205       880    *
   254.gap           1100   142       776    *     1100   140       784    *
   255.vortex                                X                             X
   256.bzip2         1500   192       782    *     1500   194       774    *
   300.twolf         3000   364       824    *     3000   347       866    *
   Est. SPECint_base2000              776    
   Est. SPECint2000                                                 779    

SPECfp 32-bit
*************

Total time for base compilation: 180 s
Total time for peak compilation: 177 s


                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ------------  --------  --------  --------  --------  --------  --------
   168.wupwise       1600   228       700    *     1600   221       723    *
   171.swim          3100   448       692    *     3100   448       693    *
   172.mgrid                                 X                             X
   173.applu                                 X                             X
   177.mesa          1400   200       700    *     1400   206       679    *
   178.galgel                                X                             X
   179.art           2600   556       468    *     2600   565       460    *
   183.equake        1300   155       840    *     1300   158       822    *
   187.facerec                               X                             X
   188.ammp          2200   306       719    *     2200   310       710    *
   189.lucas                                 X                             X
   191.fma3d                                 X                             X
   200.sixtrack                              X                             X
   301.apsi                                  X                             X
   Est. SPECfp_base2000               677    
   Est. SPECfp2000                                                  671    

SPECint 32-bit
**************

Total time for base compilation: 264 s
Total time for peak compilation: 260 s

                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ------------  --------  --------  --------  --------  --------  --------
   164.gzip          1400     189         740*     1400     189         739*
   175.vpr           1400     202         695*     1400     200         699*
   176.gcc           1100     125         877*     1100     124         884*
   181.mcf           1800     342         527*     1800     339         531*
   186.crafty        1000     102         977*     1000     104         959*
   197.parser        1800     280         643*     1800     281         640*
   252.eon                                   X                             X
   253.perlbmk       1800     186         966*     1800     186         966*
   254.gap           1100     143         769*     1100     144         766*
   255.vortex        1900     193         984*     1900     193         984*
   256.bzip2         1500     209         716*     1500     210         714*
   300.twolf         3000     295        1017*     3000     289        1038*
   Est. SPECint_base2000                  794
   Est. SPECint2000                                                     795
2005-07-07  Paolo Bonzini  <bonzini@gnu.org>

	* simplify-rtx.c (simplify_const_replace_rtx_1,
	simplify_const_replace_rtx): New.
	* rtl.h: Declare it.
	* df.c (df_local_def_available_p): Take an rtx as the
	final parameter.
	* df.h (df_local_def_available_p): Adjust prototype.
	* fwprop.c: New file.
	* timevar.def (TV_FWPROP): New.
	* tree-pass.h (pass_rtl_fwprop): New.
	* passes.c (init_optimization_passes): Add it.
	* Makefile.in (OBJS-common): Add it.
	(fwprop.o): New dependencies.
	* common.opt (-fforward-propagate): New.

Index: simplify-rtx.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/simplify-rtx.c,v
retrieving revision 1.241
diff -p -u -r1.241 simplify-rtx.c
--- simplify-rtx.c	25 Jun 2005 02:01:04 -0000	1.241
+++ simplify-rtx.c	21 Jul 2005 08:47:32 -0000
@@ -365,6 +365,170 @@ simplify_replace_rtx (rtx x, rtx old_rtx
     }
   return x;
 }
+
+/* Replace all occurrences of OLD_RTX in X with NEW_RTX and try to simplify the
+   resulting RTX.  Replace X with a new RTX if the replacement succeeded.
+   Return true if there are some of the places where we found OLD_RTX, but
+   we failed to simplify everything to a constant.  Return false if everywhere
+   OLD_RTX was found, the result collapsed to a constant.
+
+   For example, (mult:M (reg:M A) (minus:M (reg:M B) (reg:M A))) may collapse
+   to zero if replacing (reg:M B) with (reg:M A).  */
+
+static bool
+simplify_const_replace_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx)
+{
+  rtx x = *px, tem = NULL_RTX, op0, op1, op2;
+  enum rtx_code code = GET_CODE (x);
+  enum machine_mode mode = GET_MODE (x);
+  enum machine_mode op_mode;
+  bool valid = false;
+
+  /* If X is OLD_RTX, return NEW_RTX.  Otherwise, if this is an expression, try
+     to build a new expression substituting recursively.  */
+
+  if (x == old_rtx)
+    {
+      *px = new_rtx;
+      return CONSTANT_P (new_rtx);
+    }
+
+  switch (GET_RTX_CLASS (code))
+    {
+    case RTX_UNARY:
+      op0 = XEXP (x, 0);
+      op_mode = GET_MODE (op0);
+      valid = simplify_const_replace_rtx_1 (&op0, old_rtx, new_rtx);
+      if (op0 == XEXP (x, 0))
+	return true;
+      tem = simplify_gen_unary (code, mode, op0, op_mode);
+      break;
+
+    case RTX_BIN_ARITH:
+    case RTX_COMM_ARITH:
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+      valid = simplify_const_replace_rtx_1 (&op0, old_rtx, new_rtx);
+      valid &= simplify_const_replace_rtx_1 (&op1, old_rtx, new_rtx);
+      if (op0 == XEXP (x, 0) && op1 == XEXP (x, 1))
+	return true;
+      tem = simplify_gen_binary (code, mode, op0, op1);
+      break;
+
+    case RTX_COMPARE:
+    case RTX_COMM_COMPARE:
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+      op_mode = GET_MODE (op0) != VOIDmode ? GET_MODE (op0) : GET_MODE (op1);
+      valid = simplify_const_replace_rtx_1 (&op0, old_rtx, new_rtx);
+      valid &= simplify_const_replace_rtx_1 (&op1, old_rtx, new_rtx);
+      if (op0 == XEXP (x, 0) && op1 == XEXP (x, 1))
+	return true;
+      tem = simplify_gen_relational (code, mode, op_mode, op0, op1);
+      break;
+
+    case RTX_TERNARY:
+    case RTX_BITFIELD_OPS:
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+      op2 = XEXP (x, 2);
+      op_mode = GET_MODE (op0);
+      valid = simplify_const_replace_rtx_1 (&op0, old_rtx, new_rtx);
+      valid &= simplify_const_replace_rtx_1 (&op1, old_rtx, new_rtx);
+      valid &= simplify_const_replace_rtx_1 (&op2, old_rtx, new_rtx);
+      if (op0 == XEXP (x, 0) && op1 == XEXP (x, 1) && op2 == XEXP (x, 2))
+	return true;
+      if (op_mode == VOIDmode)
+	op_mode = GET_MODE (op0);
+      tem = simplify_gen_ternary (code, mode, op_mode, op0, op1, op2);
+      break;
+
+    case RTX_EXTRA:
+      /* The only case we try to handle is a SUBREG.  */
+      if (code == SUBREG)
+	{
+          op0 = XEXP (x, 0);
+	  valid = simplify_const_replace_rtx_1 (&op0, old_rtx, new_rtx);
+          if (op0 == XEXP (x, 0))
+	    return true;
+	  tem = simplify_gen_subreg (GET_MODE (x), op0,
+				     GET_MODE (SUBREG_REG (x)),
+				     SUBREG_BYTE (x));
+	}
+      break;
+
+    case RTX_OBJ:
+      if (code == MEM)
+	{
+	  op0 = XEXP (x, 0);
+	  valid = simplify_const_replace_rtx_1 (&op0, old_rtx, new_rtx);
+	  /* Unless op0 is constant, do not replace anything in the MEM.  */
+          if (op0 == XEXP (x, 0) || !valid)
+	    return true;
+	  tem = replace_equiv_address_nv (x, op0);
+	}
+
+      else if (code == LO_SUM)
+	{
+          op0 = XEXP (x, 0);
+          op1 = XEXP (x, 1);
+
+	  /* The only simplification we do attempts to remove references to op0
+	     or make it constant -- in both cases, op0's invalidity will not
+	     make the result invalid.  */
+	  simplify_const_replace_rtx_1 (&op0, old_rtx, new_rtx);
+	  valid = simplify_const_replace_rtx_1 (&op1, old_rtx, new_rtx);
+          if (op0 == XEXP (x, 0) && op1 == XEXP (x, 1))
+	    return true;
+
+	  /* (lo_sum (high x) x) -> x  */
+	  if (GET_CODE (op0) == HIGH && rtx_equal_p (op0, op1))
+	    tem = op1;
+	  else if (CONSTANT_P (op0) && CONSTANT_P (op1))
+	    tem = gen_rtx_LO_SUM (mode, op0, op1);
+	  else
+	    return true;
+	}
+
+      else if (code == REG)
+	{
+	  if (rtx_equal_p (x, old_rtx))
+	    {
+              *px = new_rtx;
+              return CONSTANT_P (new_rtx);
+	    }
+	}
+      break;
+
+    default:
+      break;
+    }
+
+  /* No change, no trouble.  */
+  if (tem == NULL_RTX)
+    return true;
+
+  *px = tem;
+
+  /* The replacement we made so far is valid, if all of the recursive
+     replacements were valid, or we could simplify everything to
+     a constant.  */
+  return valid || CONSTANT_P (tem);
+}
+
+/* Replace all occurrences of OLD_RTX in X with NEW_RTX and try to simplify the
+   resulting RTX.  Return a new RTX if it is a constant, otherwise X.  */
+
+rtx
+simplify_const_replace_rtx (rtx x, rtx old_rtx, rtx new_rtx)
+{
+  rtx tem = x;
+  if (simplify_const_replace_rtx_1 (&tem, old_rtx, new_rtx)
+      && tem != x)
+    return tem;
+  else
+    return NULL_RTX;
+}
 
 /* Try to simplify a unary operation CODE whose output mode is to be
    MODE with input operand OP whose mode was originally OP_MODE.
Index: rtl.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/rtl.h,v
retrieving revision 1.556
diff -p -u -r1.556 rtl.h
--- rtl.h	5 Jul 2005 16:20:14 -0000	1.556
+++ rtl.h	21 Jul 2005 08:47:32 -0000
@@ -1516,6 +1516,7 @@ extern rtx simplify_subreg (enum machine
 			    unsigned int);
 extern rtx simplify_gen_subreg (enum machine_mode, rtx, enum machine_mode,
 				unsigned int);
+extern rtx simplify_const_replace_rtx (rtx, rtx, rtx);
 extern rtx simplify_replace_rtx (rtx, rtx, rtx);
 extern rtx simplify_rtx (rtx);
 extern rtx avoid_constant_pool_reference (rtx);
Index: df.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/df.c,v
retrieving revision 1.86
diff -p -u -r1.86 df.c
--- df.c	25 Jun 2005 01:59:41 -0000	1.86
+++ df.c	21 Jul 2005 08:47:33 -0000
@@ -3256,17 +3256,16 @@ df_bb_regs_lives_compare (struct df *df,
    block as USE, is available at USE.  So DEF may as well be
    dead, in which case using it will extend its live range.  */
 bool
-df_local_def_available_p (struct df *df, struct ref *def, struct ref *use)
+df_local_def_available_p (struct df *df, struct ref *def, rtx insn)
 {
   struct df_link *link;
   int def_luid = DF_INSN_LUID (df, DF_REF_INSN (def));
   int in_bb = 0;
   unsigned int regno = REGNO (def->reg);
-  basic_block bb;
+  basic_block bb = DF_REF_BB (def);
 
   /* The regs must be local to BB.  */
-  gcc_assert (DF_REF_BB (def) == DF_REF_BB (use));
-  bb = DF_REF_BB (def);
+  gcc_assert (bb == BLOCK_FOR_INSN (insn));
 
   /* This assumes that the reg-def list is ordered such that for any
      BB, the first def is found first.  However, since the BBs are not
@@ -3280,7 +3279,7 @@ df_local_def_available_p (struct df *df,
 	  int this_luid = DF_INSN_LUID (df, DF_REF_INSN (this_def));
 	  /* Do nothing with defs coming before DEF.  */
 	  if (this_luid > def_luid)
-	    return this_luid > DF_INSN_LUID (df, DF_REF_INSN (use));
+	    return this_luid > DF_INSN_LUID (df, insn);
 
 	  in_bb = 1;
         }
Index: df.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/df.h,v
retrieving revision 1.34
diff -p -u -r1.34 df.h
--- df.h	25 Jun 2005 01:59:41 -0000	1.34
+++ df.h	21 Jul 2005 08:47:33 -0000
@@ -281,7 +281,7 @@ extern int df_bb_reg_live_end_p (struct 
 
 extern int df_bb_regs_lives_compare (struct df *, basic_block, rtx, rtx);
 
-extern bool df_local_def_available_p (struct df *, struct ref *, struct ref *);
+extern bool df_local_def_available_p (struct df *, struct ref *, rtx);
 
 extern rtx df_bb_single_def_use_insn_find (struct df *, basic_block, rtx,
 					   rtx);
Index: fwprop.c
===================================================================
RCS file: fwprop.c
diff -N fwprop.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ fwprop.c	21 Jul 2005 08:47:33 -0000
@@ -0,0 +1,446 @@
+/* RTL-based forward propagation pass for GNU compiler.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   Contributed by Paolo Bonzini and Steven Bosscher.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 2, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING.  If not, write to the Free
+Software Foundation, 59 Temple Place - Suite 330, Boston, MA
+02111-1307, USA.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "toplev.h"
+
+#include "timevar.h"
+#include "rtl.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "flags.h"
+#include "obstack.h"
+#include "basic-block.h"
+#include "output.h"
+#include "df.h"
+#include "alloc-pool.h"
+#include "tree-pass.h"
+
+/* This pass does simple forward propagation and simplification when an
+   operand of an insn can only come from a single def.  This pass uses
+   df.c, so it is global.  However, we only do limited analysis of
+   available expressions.
+
+   1) The pass tries to propagate the source of the def into the use,
+   and checks if it simplifies to a constant.  For example, the high
+   word of a (zero_extend:DI (reg:SI M)) is always zero, independent
+   of the source register.
+
+   2) The pass looks for paradoxical subregs that are actually unnecessary.
+   Things like this:
+
+     (set (reg:QI 120) (subreg:QI (reg:SI 118) 0))
+     (set (reg:QI 121) (subreg:QI (reg:SI 119) 0))
+     (set (reg:SI 122) (plus:SI (subreg:SI (reg:QI 120) 0)
+                                (subreg:SI (reg:QI 121) 0)))
+
+   are very common on machines that can only do word-sized operations.
+   We look for uses of paradoxical subregs (subreg:WIDER (reg:NARROW N) 0).
+   If they have a single def that is a subreg (subref:NARROW (reg:WIDE M) 0),
+   we can do the optimization and replace the paradoxical subreg with
+   (reg:WIDE M).  In other words, we can simplify this to
+
+     (set (reg:QI 120) (subreg:QI (reg:SI 118) 0))
+     (set (reg:QI 121) (subreg:QI (reg:SI 119) 0))
+     (set (reg:SI 122) (plus:SI (reg:SI 118) (reg:SI 119)))
+
+   where the first two insns are now dead.  */
+
+
+struct fwprop_change {
+  rtx insn;
+  rtx *loc;
+  rtx subst;
+  struct fwprop_change *next;
+};
+
+static struct fwprop_change *changes = NULL;
+static alloc_pool pool = NULL;
+static int num_changes = 0;
+
+
+/* Given the head HEAD of a queue of pending changes, apply all pending
+   changes to the insn at the head of the chain.  Return the new head.  */
+
+static struct fwprop_change *
+apply_changes_one_insn (struct fwprop_change *head)
+{
+  struct fwprop_change *curr = head, *prev;
+  rtx insn = head->insn;
+  do
+    {
+      validate_change (insn, curr->loc, curr->subst, true);
+      prev = curr;
+      curr = curr->next;
+      pool_free (pool, prev);
+    }
+  while (curr && curr->insn == insn);
+  if (apply_change_group () && dump_file)
+    {
+      fprintf (dump_file, "\nChanged insn %d:\n", INSN_UID (insn));
+      print_rtl (dump_file, PATTERN (insn));
+      fprintf (dump_file, "\n");
+      num_changes++;
+    }
+  else
+    {
+      if (dump_file)
+	fprintf (dump_file,
+		 "\nChanges to insn %d not recognized\n",
+		 INSN_UID (insn));
+    }
+
+  return curr;
+}
+
+/* Queue a substitution of SUBST into LOC in INSN.  */
+
+static void
+save_change (rtx *loc, rtx insn, rtx subst)
+{
+  struct fwprop_change *new = pool_alloc (pool);
+  new->insn = insn;
+  new->loc = loc;
+  new->subst = subst;
+  new->next = changes;
+  changes = new;
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "before ");
+      print_inline_rtx (dump_file, *loc, 2);
+      fprintf (dump_file, "\nafter ");
+      print_inline_rtx (dump_file, subst, 2);
+      fprintf (dump_file, "\n\n");
+    }
+}
+
+
+/* Check if the given DEF is available in INSN.  This
+   would require full computation of available expressions;
+   we check only restricted conditions:
+   - if DEF is the sole definition of its register, go ahead
+   - in the same basic block, we check for no definitions 
+     between DEF_INSN and USE_INSN
+   - if USE's basic block has DEF's basic block as the sole
+     predecessor, we check for no definitions after DEF_INSN
+     and before USE_INSN. */
+
+static bool
+def_available_p (struct df *df, struct ref *def, rtx insn)
+{
+  basic_block bb;
+  unsigned int regno;
+
+  if (!REG_P (DF_REF_REG (def)))
+    return false;
+
+  /* Check if DEF is the sole definition of its pseudo.  */
+  regno = DF_REF_REGNO (def);
+  if (df->regs[regno].defs
+      && df->regs[regno].defs->ref == def
+      && df->regs[regno].defs->next == NULL)
+    return true;
+
+  /* Check if we are in the same basic block.  */
+  bb = BLOCK_FOR_INSN (insn);
+  if (DF_REF_BB (def) == bb)
+    return df_local_def_available_p (df, def, insn);
+
+  /* Then, if DEF is in the sole predecessor of BB, it is
+     available at the beginning of BB.  */
+  if (single_pred_p (bb) && single_pred (bb) == DF_REF_BB (def)
+      && df_bb_regno_last_def_find (df, DF_REF_BB (def), regno) == def)
+    {
+      struct ref *def_in_bb = df_bb_regno_first_def_find (df, bb, regno);
+      return !def_in_bb
+	     || DF_INSN_LUID (df, DF_REF_INSN (def_in_bb))
+		> DF_INSN_LUID (df, insn);
+    }
+
+  return false;
+}
+
+
+/* Check if the register referenced by USE can be used in TARGET_INSN.
+   This would require full computation of available expressions;
+   we check only restricted conditions, see def_available_p.  */
+static bool
+reaching_defs_available_at (struct df *df, struct ref *use, rtx target_insn)
+{
+  struct df_link *defs;
+
+  /* Follow use-def chain to check all the defs for this use.  */
+  for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
+    if (!def_available_p (df, defs->ref, target_insn))
+      return false;
+
+  return true;
+}
+
+
+/* Check if all uses in DEF_INSN can be used in TARGET_INSN.  This
+   would require full computation of available expressions;
+   we check only restricted conditions, see def_available_p.  */
+static bool
+all_uses_available_at (struct df *df, rtx def_insn, rtx target_insn)
+{
+  struct df_link *uses;
+
+  /* Look at all the uses for the insn.  */
+  for (uses = DF_INSN_USES (df, def_insn); uses; uses = uses->next)
+    if (!reaching_defs_available_at (df, uses->ref, target_insn))
+      return false;
+
+  return true;
+}
+
+
+
+/* Find whether USE is a paradoxical subreg that can be replaced by
+   either a constant or a pseudo.  */
+
+static void
+forward_propagate_subreg (struct df *df, struct ref *use,
+			  rtx def_insn, rtx def_set)
+{
+  /* Only consider paradoxical subregs... */
+  rtx use_reg = DF_REF_REG (use);
+  rtx use_insn, src;
+
+  enum machine_mode use_mode = GET_MODE (use_reg);
+  if (GET_CODE (use_reg) != SUBREG
+      || !REG_P (SET_DEST (def_set))
+      || GET_MODE_SIZE (use_mode)
+	 <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (use_reg))))
+    return;
+
+  /* If this is a paradoxical SUBREG, we have no idea what value the
+     extra bits would have.  However, if the operand is equivalent to
+     a SUBREG whose operand is the same as our mode, and all the modes
+     are within a word, we can just use the inner operand because
+     these SUBREGs just say how to treat the register.
+
+     Similarly if we find an integer constant.  */
+  use_insn = DF_REF_INSN (use);
+  src = SET_SRC (def_set);
+  if (CONSTANT_P (src))
+    {
+      if (GET_MODE (src) == VOIDmode)
+	src = rtl_hooks.gen_lowpart_no_emit (use_mode, src);
+      else
+	gcc_assert (GET_MODE (src) == use_mode);
+ 
+      if (dump_file)
+	{
+	  fprintf (dump_file,
+		   "Use of reg %d in insn %d may fold to a constant",
+		   REGNO (use_reg), INSN_UID (use_insn));
+	  print_rtl (dump_file, src);
+	  fprintf (dump_file, "\n");
+	}
+
+      save_change (DF_REF_LOC (use), use_insn, copy_rtx (src));
+    }
+  else if (GET_CODE (src) == SUBREG
+	   && REG_P (SUBREG_REG (src))
+	   && GET_MODE (SUBREG_REG (src)) == use_mode
+	   && subreg_lowpart_p (src)
+	   && all_uses_available_at (df, def_insn, use_insn))
+    {
+      if (dump_file)
+	fprintf (dump_file,
+		 "paradoxical subreg %d in insn %d equivalent to reg %d\n",
+		 REGNO (SUBREG_REG (use_reg)), INSN_UID (use_insn),
+		 REGNO (SUBREG_REG (src)));
+
+      save_change (DF_REF_LOC (use), use_insn, SUBREG_REG (src));
+    }
+}
+
+
+/* Find whether USE can be replaced with SRC (defined in DEF_INSN) and
+   simplified.  */
+
+static void
+forward_propagate_and_simplify (struct df *df, struct ref *use,
+			        rtx def_insn, rtx def_set)
+{
+  rtx use_insn = DF_REF_INSN (use);
+  rtx use_set = single_set (use_insn);
+  rtx new, src, reg;
+
+  if (!use_set)
+    return;
+
+  /* Do not propagate into PC, CC0, etc.  */
+  if (GET_MODE (SET_DEST (use_set)) == VOIDmode)
+    return;
+
+  /* Check if the SUBREG_BYTE match between the def and use.  */
+  reg = DF_REF_REG (use);
+  if (GET_CODE (reg) == SUBREG
+      && GET_CODE (SET_DEST (def_set)) == SUBREG
+      && SUBREG_BYTE (SET_DEST (def_set)) != SUBREG_BYTE (reg))
+    return;
+
+  /* Check if the def had a subreg, but the use has the whole reg.  */
+  if (REG_P (reg) && GET_CODE (SET_DEST (def_set)) == SUBREG)
+    return;
+
+  /* Check if the use has a subreg, but the def had the whole reg.  */
+  if (GET_CODE (reg) == SUBREG && REG_P (SET_DEST (def_set)))
+    reg = SUBREG_REG (reg);
+
+  /* For constants, use simplify_replace_rtx as it is faster.  */
+  src = SET_SRC (def_set);
+  if (CONSTANT_P (src))
+    new = simplify_replace_rtx (SET_SRC (use_set), reg, copy_rtx (src));
+  else if (all_uses_available_at (df, def_insn, use_insn))
+    new = simplify_const_replace_rtx (SET_SRC (use_set), reg, copy_rtx (src));
+  else
+    new = NULL_RTX;
+
+  if (new == NULL_RTX)
+    return;
+
+  if (dump_file)
+    fprintf (dump_file, "Substituting %s of reg %d into insn %d\n",
+	     CONSTANT_P (src) ? "the constant value" : "the sole definition",
+	     REGNO (reg), INSN_UID (DF_REF_INSN (use)));
+
+  /* gen_lowpart_common cannot deal with VOIDmode entities that
+     are not CONST_INTs.  */
+  if (GET_MODE (new) == VOIDmode && GET_CODE (new) != CONST_INT)
+    return;
+
+  if (GET_MODE (new) == VOIDmode)
+    new = rtl_hooks.gen_lowpart_no_emit (GET_MODE (SET_DEST (use_set)), new);
+  else
+    gcc_assert (GET_MODE (new) == GET_MODE (SET_DEST (use_set)));
+
+  save_change (&SET_SRC (use_set), DF_REF_INSN (use), new);
+}
+
+  
+/* Given a use USE of an insn and a data flow object DF, try to
+   forward propagate the unique reaching definition of USE into
+   that insn.  */
+
+static void
+forward_propagate_into (struct df *df, struct ref *use)
+{
+  struct df_link *defs;
+  struct ref *def;
+  rtx def_insn, def_set;
+  
+  if (use->flags & DF_REF_READ_WRITE)
+    return;
+
+  /* Only consider uses that have a single definition.
+     TODO: This could be relaxed it we could detect that the two (or more
+     but that is probably not worth it) reaching definitions are similar
+     insns, e.g. both are zero or sign extensions.  */
+  defs = DF_REF_CHAIN (use);
+  if (!defs || defs->next)
+    return;
+
+  def = defs->ref;
+  if (def->flags & DF_REF_READ_WRITE)
+    return;
+
+  def_insn = DF_REF_INSN (def);
+  def_set = single_set (def_insn);
+  if (!def_set)
+    return;
+
+  forward_propagate_and_simplify (df, use, def_insn, def_set);
+  forward_propagate_subreg (df, use, def_insn, def_set);
+}
+
+/* Main entry point.  */
+
+static bool
+gate_fwprop (void)
+{
+  return optimize > 0 && flag_forward_propagate;
+}
+
+static void
+fwprop (void)
+{
+  struct df *df;
+  size_t i;
+  int df_flags = DF_UD_CHAIN | DF_RD_CHAIN | DF_EQUIV_NOTES | DF_SUBREGS;
+
+  num_changes = 0;
+  df = df_init ();
+  df_analyze (df, 0, df_flags);
+  if (dump_file)
+    df_dump (df, df_flags, dump_file);
+
+  if (df->n_uses)
+    {
+      pool = create_alloc_pool ("forward propagation",
+			        sizeof (struct fwprop_change), df->n_uses);
+
+      for (i = 0; i < df->n_uses; i++)
+        forward_propagate_into (df, df->uses[i]);
+
+      /* Update the instruction stream.  It would be very easy to keep the
+         information up-to-date, and it would allow running the pass
+         iteratively until we reach a fixed point.  */
+      while (changes)
+        changes = apply_changes_one_insn (changes);
+
+      free_alloc_pool (pool);
+      pool = NULL;
+    }
+
+  df_finish (df);
+
+  if (dump_file)
+    fprintf (dump_file,
+	     "\nNumber of successful forward propagations: %d\n\n",
+	     num_changes);
+}
+
+
+struct tree_opt_pass pass_rtl_fwprop =
+{
+  "fwprop",                             /* name */
+  gate_fwprop,				/* gate */   
+  fwprop,				/* execute */       
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  TV_FWPROP,                            /* tv_id */
+  0,                                    /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_dump_func,                       /* todo_flags_finish */
+  0                                     /* letter */
+};
+
Index: timevar.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/timevar.def,v
retrieving revision 1.53
diff -p -u -r1.53 timevar.def
--- timevar.def	19 Jul 2005 18:45:58 -0000	1.53
+++ timevar.def	21 Jul 2005 08:47:33 -0000
@@ -123,6 +123,7 @@ DEFTIMEVAR (TV_TEMPLATE_INSTANTIATION, "
 DEFTIMEVAR (TV_EXPAND		     , "expand")
 DEFTIMEVAR (TV_VARCONST              , "varconst")
 DEFTIMEVAR (TV_JUMP                  , "jump")
+DEFTIMEVAR (TV_FWPROP                , "forward prop")
 DEFTIMEVAR (TV_CSE                   , "CSE")
 DEFTIMEVAR (TV_LOOP                  , "loop analysis")
 DEFTIMEVAR (TV_GCSE                  , "global CSE")
Index: tree-pass.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-pass.h,v
retrieving revision 2.53
diff -p -u -r2.53 tree-pass.h
--- tree-pass.h	19 Jul 2005 18:45:58 -0000	2.53
+++ tree-pass.h	21 Jul 2005 08:47:33 -0000
@@ -306,6 +306,7 @@ extern struct tree_opt_pass pass_rtl_eh;
 extern struct tree_opt_pass pass_initial_value_sets;
 extern struct tree_opt_pass pass_unshare_all_rtl;
 extern struct tree_opt_pass pass_instantiate_virtual_regs;
+extern struct tree_opt_pass pass_rtl_fwprop;
 extern struct tree_opt_pass pass_jump2;
 extern struct tree_opt_pass pass_cse;
 extern struct tree_opt_pass pass_gcse;
Index: passes.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/passes.c,v
retrieving revision 2.105
diff -p -u -r2.105 passes.c
--- passes.c	19 Jul 2005 18:45:56 -0000	2.105
+++ passes.c	21 Jul 2005 08:47:33 -0000
@@ -591,6 +591,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_initial_value_sets);
   NEXT_PASS (pass_unshare_all_rtl);
   NEXT_PASS (pass_instantiate_virtual_regs);
+  NEXT_PASS (pass_rtl_fwprop);
   NEXT_PASS (pass_jump2);
   NEXT_PASS (pass_cse);
   NEXT_PASS (pass_gcse);
Index: Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.1526
diff -p -u -r1.1526 Makefile.in
--- Makefile.in	21 Jul 2005 07:24:01 -0000	1.1526
+++ Makefile.in	21 Jul 2005 08:47:33 -0000
@@ -941,7 +941,7 @@ OBJS-common = \
  dbxout.o ddg.o tree-ssa-loop-ch.o loop-invariant.o tree-ssa-loop-im.o	   \
  debug.o df.o diagnostic.o dojump.o dominance.o loop-doloop.o		   \
  dwarf2asm.o dwarf2out.o emit-rtl.o except.o explow.o loop-iv.o		   \
- expmed.o expr.o final.o flow.o fold-const.o function.o gcse.o		   \
+ expmed.o expr.o final.o flow.o fold-const.o function.o fwprop.o gcse.o	   \
  genrtl.o ggc-common.o global.o graph.o gtype-desc.o			   \
  haifa-sched.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o insn-modes.o	   \
  insn-extract.o insn-opinit.o insn-output.o insn-peep.o insn-recog.o	   \
@@ -2179,6 +2179,9 @@ cse.o : cse.c $(CONFIG_H) $(SYSTEM_H) co
 web.o : web.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) function.h output.h toplev.h \
    $(DF_H) $(OBSTACK_H) timevar.h tree-pass.h
+fwprop.o : fwprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
+  toplev.h insn-config.h $(RECOG_H) $(FLAGS_H) $(OBSTACK_H) $(BASIC_BLOCK_H) \
+  output.h $(DF_H) alloc-pool.h timevar.h tree-pass.h
 gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) real.h insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) function.h output.h toplev.h \

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]