This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 10:31 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Aug 7, 2012 at 8:24 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>> Hello,
>>
>> In the test case for PR54146, compute_global_livein allocates/frees a
>> worklist for >400,000 basic blocks on each invocation. And it's called
>> a lot, for rewrite_into_loop_closed_ssa. But the maximum number of
>> basic blocks ever on the work list was only ~6500. So the work list
>> can be much smaller most of the time.
>>
>> Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk?
>
> Index: tree-ssa-loop-manip.c
> ===================================================================
> --- tree-ssa-loop-manip.c (revision 190176)
> +++ tree-ssa-loop-manip.c (working copy)
> @@ -162,10 +162,8 @@ add_exit_phis_var (tree var, bitmap live
> basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var));
> bitmap_iterator bi;
>
> - if (is_gimple_reg (var))
> - bitmap_clear_bit (livein, def_bb->index);
> - else
> - bitmap_set_bit (livein, def_bb->index);
> + gcc_checking_assert (is_gimple_reg (var));
> + bitmap_clear_bit (livein, def_bb->index);
>
> !is_gimple_reg is true for virtual operand PHIs ... and at least
> find_uses_to_rename_stmt loops over all uses (so, I suppose given the
> comments elsewhere about tracking vops can you fix that to
> s/SSA_OP_ALL_USES/SSA_OP_USE/?
I'll give that a try.
> void
> -compute_global_livein (bitmap livein ATTRIBUTE_UNUSED, bitmap
> def_blocks ATTRIBUTE_UNUSED)
> +compute_global_livein (bitmap livein, bitmap def_blocks)
> {
> - basic_block bb, *worklist, *tos;
> unsigned i;
> bitmap_iterator bi;
> + VEC (basic_block, heap) *worklist;
>
> - tos = worklist
> - = (basic_block *) xmalloc (sizeof (basic_block) * (last_basic_block + 1));
> + /* Normally the work list size is bounded by the number of basic
> + blocks in the largest loop. We don't know this number, but we
> + can be fairly sure that it will be relatively small. */
> + worklist = VEC_alloc (basic_block, heap, MAX (8, n_basic_blocks / 100));
>
> I suppose if we really want to optimize this we can make worklist static
> so we at most grow once. (in case you want to optimize allocation overhead,
> not memory used). Other structures in the ssa rewriter have this kind of
> lifetime, too (the per-SSA name info for example).
I don't want this to be static. This shouldn't be a persistent piece of data.
But I actually did try a static work list (the full list, not the
VEC), and it made no difference at all on the timings.
All time is spent in the loop in compute_global_livein itself. There
are >400,000 basic blocks but the maximum loop depth is only 3. I've
tried out livein as a sparseset last night (allocated and filled in
add_exit_phis_var and re-used for every call to add_exit_phis_edge),
that reduces the time spent in compute_global_livein by a factor 2.5
for this test case (i.e. not the order-of-magnitude change I'm looking
for).
Why can't the normal SSA updater be used to rewrite into loop-closed SSA form?
Ciao!
Steven