This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Split GIMPLE temporaries at -O0
- From: gkeating at apple dot com (Geoffrey Keating)
- To: gcc-patches at gcc dot gnu dot org
- Cc: jason at redhat dot com, dnovillo at redhat dot com
- Date: Wed, 25 Aug 2004 11:09:36 -0700 (PDT)
- Subject: Split GIMPLE temporaries at -O0
Consider a small routine (this example is from vector<> in libstdc++):
  void
  push_back(const value_type& __x)
  {
    if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
      {
        this->_M_impl.construct(this->_M_impl._M_finish, __x);
        ++this->_M_impl._M_finish;
      }
    else
      _M_insert_aux(end(), __x);
  }
The expression 'this->_M_impl._M_finish' is repeated three times, and
the gimplifier tries to reuse the same temporary for each repetition.
This works well at -O1, saving memory and possibly improving
optimization opportunities. At -O0, however, you'll notice that the
shared temporary appears in two different basic blocks, like this:
  T.5925 = this-><D153590>._M_impl._M_finish;
  T.5926 = this-><D153590>._M_impl._M_end_of_storage;
  if (T.5925 != T.5926)
    {
      this.5927 = (struct new_allocator<TPropertyList> *)this;
      T.5925 = this-><D153590>._M_impl._M_finish;
      construct (this.5927, T.5925, __x);
At -O0, we're very limited in register allocation. There are
basically three possibilities:
- A user variable is placed in memory by initial RTL generation.
- An artificial DECL used in only one basic block is allocated to a
  register by local-alloc.
- An artificial DECL used in more than one basic block is allocated
  to a memory location by reload, and then every instruction that
  uses it gets a reload if it can't take a memory operand.
On a RISC machine like powerpc, this means that a temporary like
T.5925 will be allocated to memory and will need one reload for every
instruction it appears in. That is bad for code quality and bad for
compile speed: in a sample file built with gcc 3.5, it resulted in 4x
as many non-scratch reloads as with gcc 3.3.
So, let's stop doing that. On a Finder_FE build at -O0 -g on
ppc-darwin, this patch improved compile speed by about 1% (less than
I'd hoped for, because of the extra memory allocation, but still an
improvement), and it reduced code size by about 2% on one file from
that build.
I expect that eventually we'll need a GIMPLE_DECL which contains only
the handful of fields that GIMPLE actually needs in a DECL (name,
type, a bunch of bitfields), and all the other kinds of DECL will
become frontend-only constructs.
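Purely as an illustration of the idea, a slimmed-down declaration might look something like the sketch below. It is hypothetical: GCC has no `struct gimple_decl`, `struct type_desc` stands in for whatever type representation GIMPLE would keep, and the flag names are placeholders for "a bunch of bitfields", not a proposed set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical only: GCC has no such type.  `struct type_desc' stands in
   for whatever type representation GIMPLE would keep.  */
struct type_desc;

struct gimple_decl
{
  const char *name;               /* NULL for anonymous temporaries */
  const struct type_desc *type;   /* the declared type */
  /* Illustrative flag bits; the real set would be whatever GIMPLE needs.  */
  unsigned int artificial : 1;    /* created by the compiler, e.g. T.5925 */
  unsigned int addressable : 1;   /* its address is taken */
  unsigned int read_only : 1;
  unsigned int used : 1;
};

/* Build a compiler-generated temporary with all other flags clear.  */
static struct gimple_decl
make_temp_decl (const char *name, const struct type_desc *type)
{
  struct gimple_decl d = { .name = name, .type = type, .artificial = 1 };
  return d;
}
```

The point of such a layout would be that a temporary costs only a few words, so creating one per use at -O0 (as this patch does) becomes cheaper.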
Bootstrapped & tested on powerpc-darwin.
--
- Geoffrey Keating <geoffk@apple.com>
===File ~/patches/gcc-speed-split.patch=====================
2004-08-24  Geoffrey Keating  <geoffk@apple.com>

	* gimplify.c (lookup_tmp_var): Separate temporaries when not
	optimizing.
Index: gimplify.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/gimplify.c,v
retrieving revision 2.66
diff -u -p -u -p -r2.66 gimplify.c
--- gimplify.c 24 Aug 2004 15:46:36 -0000 2.66
+++ gimplify.c 25 Aug 2004 17:50:38 -0000
@@ -117,8 +117,11 @@ push_gimplify_context (void)
     abort ();
   gimplify_ctxp
     = (struct gimplify_ctx *) xcalloc (1, sizeof (struct gimplify_ctx));
-  gimplify_ctxp->temp_htab
-    = htab_create (1000, gimple_tree_hash, gimple_tree_eq, free);
+  if (optimize)
+    gimplify_ctxp->temp_htab
+      = htab_create (1000, gimple_tree_hash, gimple_tree_eq, free);
+  else
+    gimplify_ctxp->temp_htab = NULL;
 }
 /* Tear down a context for the gimplifier. If BODY is non-null, then
@@ -142,12 +145,13 @@ pop_gimplify_context (tree body)
   record_vars (gimplify_ctxp->temps);
 #if 0
-  if (!quiet_flag)
+  if (!quiet_flag && optimize)
     fprintf (stderr, " collisions: %f ",
              htab_collisions (gimplify_ctxp->temp_htab));
 #endif
-  htab_delete (gimplify_ctxp->temp_htab);
+  if (optimize)
+    htab_delete (gimplify_ctxp->temp_htab);
   free (gimplify_ctxp);
   gimplify_ctxp = NULL;
 }
@@ -409,7 +413,12 @@ lookup_tmp_var (tree val, bool is_formal
 {
   tree ret;
-  if (!is_formal || TREE_SIDE_EFFECTS (val))
+  /* If not optimizing, never really reuse a temporary. local-alloc
+     won't allocate any variable that is used in more than one basic
+     block, which means it will go into memory, causing much extra
+     work in reload and final and poorer code generation, outweighing
+     the extra memory allocation here. */
+  if (!optimize || !is_formal || TREE_SIDE_EFFECTS (val))
     ret = create_tmp_from_val (val);
   else
     {
============================================================
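The decision the patch adds to lookup_tmp_var can be modeled outside GCC. The sketch below is hypothetical: expressions are plain strings, a linear search stands in for the temp_htab hash table, and the `optimize`, `is_formal` and `has_side_effects` inputs play the roles of the global `optimize` flag, the `is_formal` argument and TREE_SIDE_EFFECTS (val). `struct temp_ctx`, `create_tmp` and `lookup_tmp` are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_TEMPS 64

/* Hypothetical stand-in for the gimplifier context.  */
struct temp_ctx
{
  bool optimize;                   /* models the global `optimize' flag */
  int n_entries;                   /* reusable temporaries recorded so far */
  const char *exprs[MAX_TEMPS];    /* expression each one was created for */
  int ids[MAX_TEMPS];              /* its temporary number */
  int next_id;                     /* counter for fresh temporaries */
};

static int
create_tmp (struct temp_ctx *ctx)
{
  return ctx->next_id++;
}

static int
lookup_tmp (struct temp_ctx *ctx, const char *expr,
            bool is_formal, bool has_side_effects)
{
  int i, id;

  /* Same test as the patched lookup_tmp_var: when not optimizing,
     always make a fresh temporary and never touch the table.  */
  if (!ctx->optimize || !is_formal || has_side_effects)
    return create_tmp (ctx);

  /* Optimizing: reuse the temporary made for an identical expression.  */
  for (i = 0; i < ctx->n_entries; i++)
    if (strcmp (ctx->exprs[i], expr) == 0)
      return ctx->ids[i];

  assert (ctx->n_entries < MAX_TEMPS);
  id = create_tmp (ctx);
  ctx->exprs[ctx->n_entries] = expr;
  ctx->ids[ctx->n_entries] = id;
  ctx->n_entries++;
  return id;
}
```

With `optimize` set, two lookups of "this->_M_impl._M_finish" return the same temporary; with it clear, each lookup returns a new one and the table stays empty, which mirrors the patch skipping htab_create and htab_delete at -O0.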