This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Split GIMPLE temporaries at -O0


Consider a small routine; this example is from vector<> in libstdc++:

      void
      push_back(const value_type& __x)
      { 
        if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
          { 
            this->_M_impl.construct(this->_M_impl._M_finish, __x);
            ++this->_M_impl._M_finish;
          }
        else
          _M_insert_aux(end(), __x);
      }

The expression 'this->_M_impl._M_finish' appears three times.  The
gimplifier tries to reuse the same temporary for each occurrence.
This works well at -O1, saving memory and possibly improving
optimization opportunities.  At -O0, however, the reused temporary
ends up in two different basic blocks, like this:

  T.5925 = this-><D153590>._M_impl._M_finish;
  T.5926 = this-><D153590>._M_impl._M_end_of_storage;
  if (T.5925 != T.5926)
    { 
      this.5927 = (struct new_allocator<TPropertyList> *)this;
      T.5925 = this-><D153590>._M_impl._M_finish;
      construct (this.5927, T.5925, __x);
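
The reuse policy described above can be sketched with a toy lookup
table.  This is only an illustration, not GCC code: 'temp_table',
'toy_lookup_tmp' and the 'reuse' flag are hypothetical names, and
expressions are keyed by their spelling rather than by tree hash.

```c
#include <assert.h>
#include <string.h>

#define MAX_TEMPS 64

/* Toy stand-in for the gimplifier's temporary table: maps an
   expression (here just its spelling) to a temporary number.  */
struct temp_table
{
  const char *expr[MAX_TEMPS];
  int count;
};

/* Return the temporary number to use for EXPR.  When REUSE is nonzero,
   an earlier temporary for the same expression is returned if one
   exists; otherwise a fresh temporary is always created.  */
static int
toy_lookup_tmp (struct temp_table *t, const char *expr, int reuse)
{
  int i;

  if (reuse)
    {
      for (i = 0; i < t->count; i++)
        if (strcmp (t->expr[i], expr) == 0)
          return i;
    }

  assert (t->count < MAX_TEMPS);
  t->expr[t->count] = expr;
  return t->count++;
}
```

With 'reuse' set, two requests for 'this->_M_impl._M_finish' return the
same number, which is how one temporary comes to be live in two basic
blocks.  With 'reuse' clear, each request gets its own temporary.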

At -O0, we're very limited in register allocation.  There are
basically three possibilities:

- A user variable is placed in memory by initial RTL generation.
- An artificial DECL which is used in only one basic block is
  allocated to a register by local-alloc.
- An artificial DECL used in more than one basic block is allocated to
  a memory location by reload, and every instruction that uses it
  needs a reload if it can't take a memory operand.
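
The three cases above can be written down as a small decision
function.  This is a simplified model for illustration only: the
'toy_decl' structure, the 'home' enumeration and 'toy_home_at_O0' are
hypothetical names, and the real decision is spread across several
passes rather than made in one place.

```c
#include <assert.h>
#include <stdbool.h>

/* Where a variable ends up at -O0 in this simplified model.  */
enum home
{
  HOME_STACK_SLOT,     /* placed in memory by initial RTL generation */
  HOME_REGISTER,       /* allocated to a register by local-alloc */
  HOME_RELOAD_MEMORY   /* spilled to memory by reload */
};

struct toy_decl
{
  bool artificial;      /* compiler-generated, not a user variable */
  int n_basic_blocks;   /* number of basic blocks that reference it */
};

/* Classify D according to the three possibilities listed above.  */
static enum home
toy_home_at_O0 (const struct toy_decl *d)
{
  if (!d->artificial)
    return HOME_STACK_SLOT;
  if (d->n_basic_blocks <= 1)
    return HOME_REGISTER;
  return HOME_RELOAD_MEMORY;
}
```

In this model a temporary like T.5925, which is artificial and used in
two basic blocks, falls into the last case.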

On a RISC machine like powerpc, this means that a temporary like
T.5925 will be allocated to memory and will need one reload for every
instruction in which it appears.  That is bad for both code quality
and compile speed: in a sample file, it produced a 4x increase in the
number of non-scratch reloads when built with gcc 3.5 compared to
gcc 3.3.

So, let's stop doing that.  This patch caused a speed improvement of
about 1% (not what I'd hoped for, because of the extra memory
allocation, but still an improvement) on a Finder_FE build at -O0 -g
on ppc-darwin, and a code size improvement of about 2% on one file
from that build.

I expect that eventually we'll need a GIMPLE_DECL which contains only
the handful of fields that GIMPLE actually needs in a DECL (name,
type, a bunch of bitfields), and all the other kinds of DECL will
become frontend-only constructs.
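
As a rough idea of what such a slimmed-down declaration might look
like, here is a sketch.  Everything in it is an assumption made for
illustration: 'toy_gimple_decl', 'toy_type', 'toy_make_temp' and the
individual flag names are hypothetical and do not exist in GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a type node; real GCC types are trees.  */
struct toy_type
{
  const char *name;
  size_t size;
};

/* Hypothetical minimal declaration: a name, a type, and flag bits.
   The flag names are illustrative assumptions, not GCC's.  */
struct toy_gimple_decl
{
  const char *name;
  const struct toy_type *type;
  unsigned artificial : 1;    /* compiler-generated temporary */
  unsigned addressable : 1;   /* its address is taken somewhere */
  unsigned is_volatile : 1;   /* accesses must not be optimized away */
};

/* Build a declaration for a compiler temporary of type TYPE.  */
static struct toy_gimple_decl
toy_make_temp (const char *name, const struct toy_type *type)
{
  struct toy_gimple_decl d = { name, type, 1, 0, 0 };
  return d;
}
```

The point of the sketch is only the size: with so few fields, creating
one declaration per temporary costs much less memory than a full DECL.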

Bootstrapped & tested on powerpc-darwin.

-- 
- Geoffrey Keating <geoffk@apple.com>

===File ~/patches/gcc-speed-split.patch=====================
2004-08-24  Geoffrey Keating  <geoffk@apple.com>

	* gimplify.c (lookup_tmp_var): Separate temporaries when not
	optimizing.

Index: gimplify.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/gimplify.c,v
retrieving revision 2.66
diff -u -p -u -p -r2.66 gimplify.c
--- gimplify.c	24 Aug 2004 15:46:36 -0000	2.66
+++ gimplify.c	25 Aug 2004 17:50:38 -0000
@@ -117,8 +117,11 @@ push_gimplify_context (void)
     abort ();
   gimplify_ctxp
     = (struct gimplify_ctx *) xcalloc (1, sizeof (struct gimplify_ctx));
-  gimplify_ctxp->temp_htab
-    = htab_create (1000, gimple_tree_hash, gimple_tree_eq, free);
+  if (optimize)
+    gimplify_ctxp->temp_htab
+      = htab_create (1000, gimple_tree_hash, gimple_tree_eq, free);
+  else
+    gimplify_ctxp->temp_htab = NULL;
 }
 
 /* Tear down a context for the gimplifier.  If BODY is non-null, then
@@ -142,12 +145,13 @@ pop_gimplify_context (tree body)
     record_vars (gimplify_ctxp->temps);
 
 #if 0
-  if (!quiet_flag)
+  if (!quiet_flag && optimize)
     fprintf (stderr, " collisions: %f ",
 	     htab_collisions (gimplify_ctxp->temp_htab));
 #endif
 
-  htab_delete (gimplify_ctxp->temp_htab);
+  if (optimize)
+    htab_delete (gimplify_ctxp->temp_htab);
   free (gimplify_ctxp);
   gimplify_ctxp = NULL;
 }
@@ -409,7 +413,12 @@ lookup_tmp_var (tree val, bool is_formal
 {
   tree ret;
 
-  if (!is_formal || TREE_SIDE_EFFECTS (val))
+  /* If not optimizing, never really reuse a temporary.  local-alloc
+     won't allocate any variable that is used in more than one basic
+     block, which means it will go into memory, causing much extra
+     work in reload and final and poorer code generation, outweighing
+     the extra memory allocation here.  */
+  if (!optimize || !is_formal || TREE_SIDE_EFFECTS (val))
     ret = create_tmp_from_val (val);
   else
     {
============================================================

