This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug rtl-optimization/38518] Excessive compile time with -O3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38518

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stevenb.gcc at gmail dot com

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue here is the DF machinery, or rather the fact that RTL invariant
motion
calls df_analyze () for each loop, thus

void
move_loop_invariants (void)
{
  struct loop *loop;

  if (flag_ira_loop_pressure)
    {
      df_analyze ();
      regstat_init_n_sets_and_refs ();
      ira_set_pseudo_classes (true, dump_file);
      calculate_loop_reg_pressure ();
      regstat_free_n_sets_and_refs ();
    }
  df_set_flags (DF_EQ_NOTES + DF_DEFER_INSN_RESCAN);
  /* Process the loops, innermost first.  */
  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
    {
      curr_loop = loop;
      /* move_single_loop_invariants for very large loops
         is time consuming and might need a lot of memory.  */
      if (loop->num_nodes <= (unsigned) LOOP_INVARIANT_MAX_BBS_IN_LOOP)
        move_single_loop_invariants (loop);
    }

here move_single_loop_invariants -> find_invariants -> find_defs which
does

static void
find_defs (struct loop *loop, basic_block *body)
{
  unsigned i;
  bitmap blocks = BITMAP_ALLOC (NULL);

  for (i = 0; i < loop->num_nodes; i++)
    bitmap_set_bit (blocks, body[i]->index);

  if (dump_file)
    {
      fprintf (dump_file,
               "*****starting processing of loop %d ******\n",
               loop->num);
    }

  df_remove_problem (df_chain);
  df_process_deferred_rescans ();
  df_chain_add_problem (DF_UD_CHAIN);
  df_set_blocks (blocks);
  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
  df_analyze ();
...

which is excessive.  It looks like the above DF stuff does not affects the
whole
function but only the BBs in the loop.  Still the fact that we iterate
from inner to outer loops still
makes the DF analysis time quadratic in the loop depth.

The testcase has only loops of depth 1 (but 2164 ones), so it seems that
the DF restriction to a set of BBs does not work as expected?

>From the profile I see that the most time spent in df_analyze is
inverted_post_order_compute and post_order_compute (obviosly not
region aware!) and then bitmap_set_bit and df_prune_to_subcfg.

I wonder if it isn't possible to either update the DF during the
transform, deal with partly out-of-date DF info (like process
loop siblings without re-calculating DF, and re-compute whole FN
DF after such iteration).


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]