This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Register pressure aware loop unrolling


On Fri, 11 Dec 2015, Shiva Chen wrote:

> Hi all,
> 
> Loop unrolling would decrease performance in some case due to unrolling high register pressure
> loops and produce lots of spill code.
> 
> I try to implement register pressure aware loop unrolling(-funroll-loops-pressure-aware)
> to avoid unrolling high register pressure loops.
> 
> The idea is to calculate live register number for each basic block to estimate register pressure.
> If the loop contain the basic block with high register pressure, don't unrolling it.
> 
> Register pressure estimate to high if (live_reg_num > available_hard_reg_num)
> like ira and gcse did.
> 
> However, loop unrolling may increase the register pressure.
> Therefore, when live register number near to available hard register number,
> it's not profitable to unroll it.
> E.g. live_reg_num = 12, available_hard_reg_num = 14
> 
> To estimate the register pressure increment after unrolling, adding 
> parameter PARAM_LOOP_UNROLL_PRESSURE_INCREMENT.
> 
> The equation become 
> 
> high_reg_pressure_p = true if (live_reg_num + 
> PARAM_LOOP_UNROLL_PRESSURE_INCREMENT > available_hard_reg_num)
> 
> Bootstrapped and tested on x86-64.
> 
> Any suggestion ?

I think the general RTL loop unrolling pass is just "bad".  You
unroll to expose scheduling freedom (which in turn may increase
register pressure).  Unrolling itself doesn't increase register
pressure.

So to me the correct thing to do is to use one of the
loop-aware scheduling algorithms (that end up performing
unrolling) like -fmodulo-sched or to make scheduling
register pressure aware which is already supported with -fsched-pressure
(only relevant for pre-RA scheduling of course).

So to me the patch attacks the wrong pass and instead modulo
scheduling should be improved (like no longer depend on
do-loop patterns) and eventually move the thing to GIMPLE
to improve its dependence analysis, making it a scheduling-driven
GIMPLE unrolling pass.

disclaimer: I didn't look at your patch.

Richard.

> Thanks,
> Shiva
> 
> 
> 2015-12-11  Shiva Chen  <shivac@marvell.com>
> 
> 	* cfgloop.h (struct loop): Add high_reg_pressure_p field.
> 	(UAP_UNROLL_PRESSURE_AWARE): New enums.
> 	* reg-pressure.h (struct bb_data): Data stored for each basic block.
> 	* common.opt: Add new flag -funroll-loops-pressure-aware.
> 	* params.def: Add parameter PARAM_LOOP_UNROLL_PRESSURE_INCREMENT.
> 	* loop-init.c(pass_rtl_unroll_loops:gate):
> 	Enable unrolling while -funroll-loops-pressure-aware enable.
> 	(pass_rtl_unroll_loops:execute): Likewise.
> 	* toplev.c (process_options): Likewise.
> 	* loop-unroll.c (decide_unroll_constant_iterations):
> 	Not unroll the loop with high register pressure 
> 	if -funroll-loops-pressure-aware enable.
> 	(decide_unroll_runtime_iterations): Likewise.
> 	(decide_unroll_stupid): Likewise.
> 	* reg-pressure.c (get_regno_pressure_class): Get pressure class.
> 	(change_pressure): Change register pressure while scan basic blocks.
> 	(calculate_bb_reg_pressure): Calculate register pressure
> 	for each basic block.
> 	(check_register_pressure): Determine high_reg_pressure_p for each loop.
> 	(calculate_loop_pressure): Calculate register pressure for each loop.
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index d2d09f6..6c62b63 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1384,6 +1384,7 @@ OBJS = \
>         reginfo.o \
>         regrename.o \
>         regstat.o \
> +       reg-pressure.o \
>         reload.o \
>         reload1.o \
>         reorg.o \
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index ee73bf9..c4a06e7 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -191,6 +191,9 @@ struct GTY ((chain_next ("%h.next"))) loop {
>    /* True if we should try harder to vectorize this loop.  */
>    bool force_vectorize;
> 
> +  /* True if the loop have high register pressure.  */
> +  bool high_reg_pressure_p;
> +
>    /* True if the loop is part of an oacc kernels region.  */
>    bool in_oacc_kernels_region;
> 
> @@ -742,7 +745,8 @@ loop_optimizer_finalize ()
>  enum
>  {
>    UAP_UNROLL = 1,      /* Enables unrolling of loops if it seems profitable.  */
> -  UAP_UNROLL_ALL = 2   /* Enables unrolling of all loops.  */
> +  UAP_UNROLL_ALL = 2,  /* Enables unrolling of all loops.  */
> +  UAP_UNROLL_PRESSURE_AWARE = 4        /* Enables unrolling of loops with pressure aware.  */
>  };
> 
>  extern void doloop_optimize_loops (void);
> diff --git a/gcc/common.opt b/gcc/common.opt
> index e593631..a44c7dc 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2425,6 +2425,10 @@ funroll-all-loops
>  Common Report Var(flag_unroll_all_loops) Optimization
>  Perform loop unrolling for all loops.
> 
> +funroll-loops-pressure-aware
> +Common Report Var(flag_unroll_loops_pressure_aware) Optimization
> +Perform loop unrolling for low register pressure loops
> +
>  ; Nonzero means that loop optimizer may assume that the induction variables
>  ; that control loops do not overflow and that the loops with nontrivial
>  ; exit condition are not infinite
> diff --git a/gcc/loop-init.c b/gcc/loop-init.c
> index e32c94a..232465c 100644
> --- a/gcc/loop-init.c
> +++ b/gcc/loop-init.c
> @@ -560,7 +560,8 @@ public:
>    /* opt_pass methods: */
>    virtual bool gate (function *)
>      {
> -      return (flag_peel_loops || flag_unroll_loops || flag_unroll_all_loops);
> +      return (flag_peel_loops || flag_unroll_loops || flag_unroll_all_loops
> +             || flag_unroll_loops_pressure_aware);
>      }
> 
>    virtual unsigned int execute (function *);
> @@ -580,6 +581,8 @@ pass_rtl_unroll_loops::execute (function *fun)
>         flags |= UAP_UNROLL;
>        if (flag_unroll_all_loops)
>         flags |= UAP_UNROLL_ALL;
> +      if (flag_unroll_loops_pressure_aware)
> +       flags |= UAP_UNROLL_PRESSURE_AWARE;
> 
>        unroll_loops (flags);
>      }
> diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
> index 3e26b21..cffd9140 100644
> --- a/gcc/loop-unroll.c
> +++ b/gcc/loop-unroll.c
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "dojump.h"
>  #include "expr.h"
>  #include "dumpfile.h"
> +#include "reg-pressure.h"
> 
>  /* This pass performs loop unrolling.  We only perform this
>     optimization on innermost loops (with single exception) because
> @@ -275,6 +276,10 @@ unroll_loops (int flags)
>    struct loop *loop;
>    bool changed = false;
> 
> +  /* Calculate loop pressure.  */
> +  if (flags & UAP_UNROLL_PRESSURE_AWARE)
> +    calculate_loop_pressure ();
> +
>    /* Now decide rest of unrolling.  */
>    decide_unrolling (flags);
> 
> @@ -346,12 +351,20 @@ decide_unroll_constant_iterations (struct loop *loop, int flags)
>    struct niter_desc *desc;
>    widest_int iterations;
> -  if (!(flags & UAP_UNROLL))
> +  if (!(flags & UAP_UNROLL) && !(flags & UAP_UNROLL_PRESSURE_AWARE))
>      {
>        /* We were not asked to, just return back silently.  */
>        return;
>      }
> 
> +  if ((flags & UAP_UNROLL_PRESSURE_AWARE) && loop->high_reg_pressure_p)
> +    {
> +      if (dump_file)
> +       fprintf (dump_file, ";; Not considering unrolling loop with constant\n"
> +                           "   register pressure is too high\n");
> +      return;
> +    }
> +
>    if (dump_file)
>      fprintf (dump_file,
>              "\n;; Considering unrolling loop with constant "
> @@ -640,12 +653,20 @@ decide_unroll_runtime_iterations (struct loop *loop, int flags)
>    struct niter_desc *desc;
>    widest_int iterations;
> 
> -  if (!(flags & UAP_UNROLL))
> +  if (!(flags & UAP_UNROLL) && !(flags & UAP_UNROLL_PRESSURE_AWARE))
>      {
>        /* We were not asked to, just return back silently.  */
>        return;
>      }
> 
> +  if ((flags & UAP_UNROLL_PRESSURE_AWARE) && loop->high_reg_pressure_p)
> +    {
> +      if (dump_file)
> +       fprintf (dump_file, ";; Not considering unrolling loop with runtime\n"
> +                           "   register pressure is too high\n");
> +      return;
> +    }
> +
>    if (dump_file)
>      fprintf (dump_file,
>              "\n;; Considering unrolling loop with runtime "
> @@ -1087,12 +1108,20 @@ decide_unroll_stupid (struct loop *loop, int flags)
>    struct niter_desc *desc;
>    widest_int iterations;
> 
> -  if (!(flags & UAP_UNROLL_ALL))
> +  if (!(flags & UAP_UNROLL_ALL) && !(flags & UAP_UNROLL_PRESSURE_AWARE))
>      {
>        /* We were not asked to, just return back silently.  */
>        return;
>      }
> 
> +  if ((flags & UAP_UNROLL_PRESSURE_AWARE) && loop->high_reg_pressure_p)
> +    {
> +      if (dump_file)
> +       fprintf (dump_file, ";; Not considering unrolling loop stupidly\n"
> +                           "   register pressure is too high\n");
> +      return;
> +    }
> +
>    if (dump_file)
>      fprintf (dump_file, "\n;; Considering unrolling loop stupidly\n");
> 
> diff --git a/gcc/params.def b/gcc/params.def
> index 41fd8a8..1b1f26b 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -332,7 +332,12 @@ DEFPARAM(PARAM_MAX_UNROLL_ITERATIONS,
>          "max-completely-peel-loop-nest-depth",
>          "The maximum depth of a loop nest we completely peel.",
>          8, 0, 0)
> -
> +/* Register pressure increment after unrolling
> +   used to justify register pressure aware loop unrolling.  */
> +DEFPARAM (PARAM_LOOP_UNROLL_PRESSURE_INCREMENT,
> +         "loop-unroll-pressure-increment",
> +         "Register pressure increment after unrolling.",
> +         3, 0, 16)
>  /* The maximum number of insns of an unswitched loop.  */
>  DEFPARAM(PARAM_MAX_UNSWITCH_INSNS,
>         "max-unswitch-insns",
> diff --git a/gcc/reg-pressure.c b/gcc/reg-pressure.c
> new file mode 100644
> index 0000000..590e222
> --- /dev/null
> +++ b/gcc/reg-pressure.c
> @@ -0,0 +1,212 @@
> +/* Calculate register pressure.
> +   Copyright (C) 2004-2015 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "cfgloop.h"
> +#include "params.h"
> +#include "dumpfile.h"
> +#include "df.h"
> +#include "regs.h"
> +#include "ira.h"
> +#include "reg-pressure.h"
> +
> +#define BB_DATA(bb) ((struct bb_data *) (bb)->aux)
> +
> +static basic_block curr_bb;
> +
> +/* Current register pressure for each pressure class.  */
> +static int curr_reg_pressure[N_REG_CLASSES];
> +
> +static enum reg_class get_regno_pressure_class (int regno, int *nregs);
> +
> +/* Return pressure class and number of needed hard registers (through
> +   *NREGS) of register REGNO.  */
> +static enum reg_class
> +get_regno_pressure_class (int regno, int *nregs)
> +{
> +  if (regno >= FIRST_PSEUDO_REGISTER)
> +    {
> +      enum reg_class pressure_class;
> +
> +      pressure_class = reg_allocno_class (regno);
> +      pressure_class = ira_pressure_class_translate[pressure_class];
> +      *nregs
> +       = ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
> +      return pressure_class;
> +    }
> +  else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
> +          && ! TEST_HARD_REG_BIT (eliminable_regset, regno))
> +    {
> +      *nregs = 1;
> +      return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
> +    }
> +  else
> +    {
> +      *nregs = 0;
> +      return NO_REGS;
> +    }
> +}
> +
> +/* Increase (if INCR_P) or decrease current register pressure for
> +   register REGNO.  */
> +static void
> +change_pressure (int regno, bool incr_p)
> +{
> +  int nregs;
> +  enum reg_class pressure_class;
> +
> +  pressure_class = get_regno_pressure_class (regno, &nregs);
> +  if (! incr_p)
> +    curr_reg_pressure[pressure_class] -= nregs;
> +  else
> +    {
> +      curr_reg_pressure[pressure_class] += nregs;
> +      if (BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
> +         < curr_reg_pressure[pressure_class])
> +       BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
> +         = curr_reg_pressure[pressure_class];
> +    }
> +}
> +
> +/* Calculate register pressure for each basic block by walking insns
> +   from last to first.  */
> +static void
> +calculate_bb_reg_pressure (void)
> +{
> +  int i;
> +  unsigned int j;
> +  rtx_insn *insn;
> +  basic_block bb;
> +  bitmap curr_regs_live;
> +  bitmap_iterator bi;
> +
> +
> +  ira_setup_eliminable_regset ();
> +  curr_regs_live = BITMAP_ALLOC (&reg_obstack);
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      curr_bb = bb;
> +      BB_DATA (bb)->live_in = BITMAP_ALLOC (NULL);
> +      BB_DATA (bb)->backup = BITMAP_ALLOC (NULL);
> +      bitmap_copy (BB_DATA (bb)->live_in, df_get_live_in (bb));
> +      bitmap_copy (curr_regs_live, df_get_live_out (bb));
> +      for (i = 0; i < ira_pressure_classes_num; i++)
> +       curr_reg_pressure[ira_pressure_classes[i]] = 0;
> +      EXECUTE_IF_SET_IN_BITMAP (curr_regs_live, 0, j, bi)
> +       change_pressure (j, true);
> +
> +      FOR_BB_INSNS_REVERSE (bb, insn)
> +       {
> +         rtx dreg;
> +         int regno;
> +         df_ref def, use;
> +
> +         if (! NONDEBUG_INSN_P (insn))
> +           continue;
> +
> +         FOR_EACH_INSN_DEF (def, insn)
> +           {
> +             dreg = DF_REF_REAL_REG (def);
> +             gcc_assert (REG_P (dreg));
> +             regno = REGNO (dreg);
> +             if (!(DF_REF_FLAGS (def)
> +                   & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
> +               {
> +                 if (bitmap_clear_bit (curr_regs_live, regno))
> +                   change_pressure (regno, false);
> +               }
> +           }
> +
> +         FOR_EACH_INSN_USE (use, insn)
> +           {
> +             dreg = DF_REF_REAL_REG (use);
> +             gcc_assert (REG_P (dreg));
> +             regno = REGNO (dreg);
> +             if (bitmap_set_bit (curr_regs_live, regno))
> +               change_pressure (regno, true);
> +           }
> +       }
> +    }
> +  BITMAP_FREE (curr_regs_live);
> +}
> +
> +static bool
> +check_register_pressure_1 (struct loop *loop)
> +{
> +  unsigned i;
> +  int j, reg_pressure_inc = 0;
> +  basic_block *body, bb;
> +  enum reg_class pressure_class;
> +
> +  /* get basic blocks of the loop.  */
> +  body = get_loop_body (loop);
> +
> +  if (dump_file)
> +    fprintf (dump_file, "Loop %d nodes = %d\n", loop->num, loop->num_nodes);
> +
> +  /* traverse each basic block of the loop.  */
> +  for (i = 0; i < loop->num_nodes; i++)
> +    {
> +      bb = body[i];
> +
> +      if (BB_DATA (bb) == NULL)
> +       continue;
> +
> +      /* Set reg_pressure_inc if the loop is not small loop.  */
> +      if (loop->num_nodes > 2)
> +       reg_pressure_inc = PARAM_VALUE (PARAM_LOOP_UNROLL_PRESSURE_INCREMENT);
> +
> +      if (dump_file)
> +       fprintf (dump_file, "  Basic block %d\n", bb->index);
> +
> +      for (j = 0; j < ira_pressure_classes_num; j++)
> +        {
> +         pressure_class = ira_pressure_classes[j];
> +
> +         if (BB_DATA (bb)->max_reg_pressure[pressure_class] == 0)
> +           continue;
> +
> +         if (dump_file)
> +           {
> +             fprintf (dump_file, "    Live %s=%d\n",
> +                      reg_class_names[pressure_class],
> +                      BB_DATA (bb)->max_reg_pressure[pressure_class]);
> +
> +             fprintf (dump_file, "    Total available %s=%d\n\n",
> +                      reg_class_names[pressure_class],
> +                      ira_class_hard_regs_num[pressure_class]);
> +           }
> +
> +         if (BB_DATA (bb)->max_reg_pressure[pressure_class]
> +             +  reg_pressure_inc
> +             > ira_class_hard_regs_num[pressure_class])
> +           return true;
> +       }
> +    }
> +
> +  return false;
> +}
> +
> +static void
> +check_register_pressure (void)
> +{
> +  struct loop *loop;
> +
> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +    {
> +      loop->high_reg_pressure_p = check_register_pressure_1 (loop);
> +    }
> +}
> +
> +void
> +calculate_loop_pressure (void)
> +{
> +  regstat_init_n_sets_and_refs ();
> +  ira_set_pseudo_classes (false, dump_file);
> +  alloc_aux_for_blocks (sizeof (struct bb_data));
> +  calculate_bb_reg_pressure ();
> +  check_register_pressure ();
> +  free_aux_for_blocks ();
> +  regstat_free_n_sets_and_refs ();
> +}
> diff --git a/gcc/reg-pressure.h b/gcc/reg-pressure.h
> new file mode 100644
> index 0000000..e078643
> --- /dev/null
> +++ b/gcc/reg-pressure.h
> @@ -0,0 +1,42 @@
> +/* Exported functions from reg-pressure.c
> +   Copyright (C) 2004-2015 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_REG_PRESSURE_H
> +#define GCC_REG_PRESSURE_H
> +
> +/* Data stored for each basic block.  */
> +struct bb_data
> +{
> +  /* Maximal register pressure inside basic block for given register class
> +     (defined only for the pressure classes).  */
> +  int max_reg_pressure[N_REG_CLASSES];
> +  /* Recorded register pressure of basic block before trying to hoist
> +     an expression.  Will be used to restore the register pressure
> +     if the expression should not be hoisted.  */
> +  int old_pressure;
> +  /* Recorded register live_in info of basic block during code hoisting
> +     process.  BACKUP is used to record live_in info before trying to
> +     hoist an expression, and will be used to restore LIVE_IN if the
> +     expression should not be hoisted.  */
> +  bitmap live_in, backup;
> +};
> +
> +extern void calculate_loop_pressure (void);
> +
> +#endif /* GCC_REG_PRESSURE_H */
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index aee55fc..ad4d55c 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1276,7 +1276,7 @@ process_options (void)
> 
>    /* Unrolling all loops implies that standard loop unrolling must also
>       be done.  */
> -  if (flag_unroll_all_loops)
> +  if (flag_unroll_all_loops || flag_unroll_loops_pressure_aware)
>      flag_unroll_loops = 1;
> 
>    /* web and rename-registers help when run after loop unrolling.  */
> --
> 2.5.0
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]