This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][RFC] Add the ability to record sub-timevars (-ftime-report-details)


On Wed, 2016-08-10 at 14:04 +0200, Richard Biener wrote:
> The following patch adds the ability to record time spent in utility
> to the pass using it.  For things like CFG cleanup, alias stmt
> walking
> or DF infrastructure work it is currently not visible which pass is
> responsible for the time spent there.  With -ftime-report-details
> you now get
> 
>  alias stmt walking      :   4.48 ( 2%) usr   0.14 ( 3%) sys   4.66 (
> 2%) 
> wall    1209 kB ( 0%) ggc
> ...
>  tree PRE                :   3.58 ( 2%) usr   0.04 ( 1%) sys   3.71 (
> 2%) 
> wall   11870 kB ( 1%) ggc
>  `- tree tail merge      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 (
> 0%) 
> wall       0 kB ( 0%) ggc
>  `- alias stmt walking   :   0.88 ( 0%) usr   0.00 ( 0%) sys   0.91 (
> 0%) 
> wall     153 kB ( 0%) ggc
>  `- loop init            :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 (
> 0%) 
> wall       0 kB ( 0%) ggc
>  `- tree SSA incremental :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 (
> 0%) 
> wall       0 kB ( 0%) ggc
>  `- tree operand scan    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.04 (
> 0%) 
> wall     544 kB ( 0%) ggc
>  `- tree STMT verifier   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> 0%) 
> wall       0 kB ( 0%) ggc
>  tree FRE                :   3.91 ( 2%) usr   0.09 ( 2%) sys   3.71 (
> 2%) 
> wall    5658 kB ( 1%) ggc
>  `- alias stmt walking   :   2.81 ( 2%) usr   0.08 ( 2%) sys   2.84 (
> 2%) 
> wall     819 kB ( 0%) ggc
>  `- unaccounted todo     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 (
> 0%) 
> wall     851 kB ( 0%) ggc
>  `- tree operand scan    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 (
> 0%) 
> wall       0 kB ( 0%) ggc
>  `- dominance computation:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> 0%) 
> wall       0 kB ( 0%) ggc
>  tree code sinking       :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.17 (
> 0%) 
> wall    1268 kB ( 0%) ggc
>  `- loop init            :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.11 (
> 0%) 
> wall     526 kB ( 0%) ggc
>  `- dominance computation:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> 0%) 
> wall       0 kB ( 0%) ggc
> ...

The example seems to have suffered from line-wrap.  Presumably it looks
like:

 alias stmt walking      :   4.48 ( 2%) usr   0.14 ( 3%) sys   4.66 ( 2%) wall    1209 kB ( 0%) ggc
...
 tree PRE                :   3.58 ( 2%) usr   0.04 ( 1%) sys   3.71 ( 2%) wall   11870 kB ( 1%) ggc
 `- tree tail merge      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 `- alias stmt walking   :   0.88 ( 0%) usr   0.00 ( 0%) sys   0.91 ( 0%) wall     153 kB ( 0%) ggc
 `- loop init            :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 `- tree SSA incremental :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 `- tree operand scan    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     544 kB ( 0%) ggc
 `- tree STMT verifier   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree FRE                :   3.91 ( 2%) usr   0.09 ( 2%) sys   3.71 ( 2%) wall    5658 kB ( 1%) ggc
 `- alias stmt walking   :   2.81 ( 2%) usr   0.08 ( 2%) sys   2.84 ( 2%) wall     819 kB ( 0%) ggc
 `- unaccounted todo     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     851 kB ( 0%) ggc
 `- tree operand scan    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 `- dominance computation:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree code sinking       :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.17 ( 0%) wall    1268 kB ( 0%) ggc
 `- loop init            :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall     526 kB ( 0%) ggc
 `- dominance computation:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
...


(...assuming I can get Evolution to suppress line-wrapping above)


BTW, did you test this with jit enabled?  The jit "frontend" exposes
part of the timer API to client code and uses it in some ways that the
rest of the code doesn't, with its own two-level hierarchy:
https://gcc.gnu.org/onlinedocs/jit/topics/performance.html#the-timing-api

Various other comments inline below

> note that the full time spent in, say, alias stmt walking is still
> recorded and dumped.  Note this also visualizes current nesting
> of timevars and thus you'll get
> 
>  rest of compilation     :   1.02 ( 1%) usr   0.04 ( 1%) sys   1.17 (
> 1%) 
> wall    4977 kB ( 0%) ggc
>  `- hard reg cprop       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 (
> 0%) 
> wall       0 kB ( 0%) ggc
>  `- tree conservative DCE:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> 0%) 
> wall       0 kB ( 0%) ggc
> ...
> 
> because we have passes w/o timevar that are accumulated in
> rest of compilation.  There is the opportunity to clean this up.
> 
> I've extended the pass timevars to also cover their TODOs and thus
> removed TV_TODO.
> 
> The accounting is "flat", that is, at most one level of sub-timevars
> are recorded.
> 
> Does this look sensible and useful given the imperfectness of the
> current timevar assignments?
> 
> Bootstrap pending.
> 
> Thanks,
> Richard.
> 
> 2016-08-10  Richard Biener  <rguenther@suse.de>
> 
>         * passes.c (execute_todo): Do not push/pop TV_TODO.
>         (execute_one_ipa_transform_pass): Move timevar push/pop TODO
> execution.
>         (execute_one_pass): Likewise.
>         * common.opt (ftime-report-details): New switch.
>         * doc/invoke.texi (ftime-report-details): Document.
>         * timevar.h (timer::print_row): Adjust signature.
>         (timer::child_map_t): New typedef.
>         (timer::time_var_def): Add children field.
>         * timevar.c (timer::named_items::print): Adjust.
>         (timer::~timer): Free timevar recorded children.
>         (timer::pop_internal): When -ftime-report-details record
>         time spent in sub-timevars.
>         (timer::print_row): Adjust.
>         (timer::print): Print sub-timevar stats.
>         * timevar.def (TV_TODO): Remove.
> 
> Index: gcc/common.opt
> ===================================================================
> --- gcc/common.opt      (revision 238469)
> +++ gcc/common.opt      (working copy)
> @@ -2255,6 +2255,10 @@ ftime-report
>  Common Report Var(time_report)
>  Report the time taken by each compiler pass.
>  
> +ftime-report-details
> +Common Report Var(time_report_details)
> +Record times taken by sub-phases separately.
> +
>  ftls-model=
>  Common Joined RejectNegative Enum(tls_model) Var(flag_tls_default)
> Init(TLS_MODEL_GLOBAL_DYNAMIC)
>  -ftls-model=[global-dynamic|local-dynamic|initial-exec|local-exec]  
>    Set the default thread-local storage code generation model.
> Index: gcc/doc/invoke.texi
> ===================================================================
> --- gcc/doc/invoke.texi (revision 238469)
> +++ gcc/doc/invoke.texi (working copy)
> @@ -548,7 +548,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fprofile-report @gol
>  -frandom-seed=@var{string} -fsched-verbose=@var{n} @gol
>  -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining
> -verbose @gol
> --fstats  -fstack-usage  -ftime-report @gol
> +-fstats  -fstack-usage  -ftime-report -ftime-report-details @gol
>  -fvar-tracking-assignments-toggle -gtoggle @gol
>  -print-file-name=@var{library}  -print-libgcc-file-name @gol
>  -print-multi-directory  -print-multi-lib  -print-multi-os-directory
> @gol
> @@ -12543,6 +12543,10 @@ print some statistics about each pass wh
>  Makes the compiler print some statistics about the time consumed by
> each
>  pass when it finishes.
>  
> +@item -ftime-report
> +@opindex ftime-report
> +Record the time consumed by infrastructure parts separately for each
> pass.
> +
>  @item -fira-verbose=@var{n}
>  @opindex fira-verbose
>  Control the verbosity of the dump file for the integrated register
> allocator.
> Index: gcc/timevar.c
> ===================================================================
> --- gcc/timevar.c       (revision 238469)
> +++ gcc/timevar.c       (working copy)
> @@ -205,7 +205,7 @@ timer::named_items::print (FILE *fp, con
>      {
>        timer::timevar_def *def = m_hash_map.get (item_name);
>        gcc_assert (def);
> -      m_timer->print_row (fp, total, def);
> +      m_timer->print_row (fp, total, def->name, def->elapsed);
>      }
>  }
>  
> @@ -296,6 +296,9 @@ timer::~timer ()
>        next = iter->next;
>        free (iter);
>      }
> +  for (unsigned i = 0; i < TIMEVAR_LAST; ++i)
> +    if (m_timevars[i].children)
> +      delete m_timevars[i].children;

Isn't delete NULL a no-op, so is this check redundant?

>    delete m_jit_client_items;
>  }
> @@ -399,13 +402,26 @@ timer::pop_internal ()
>    /* Attribute the elapsed time to the element we're popping.  */
>    timevar_accumulate (&popped->timevar->elapsed, &m_start_time,
> &now);
>  
> +  /* Take the item off the stack.  */
> +  m_stack = m_stack->next;
> +
> +  /* Record the elapsed sub-time to the parent as well.  */
> +  if (m_stack && time_report_details)
> +    {
> +      if (! m_stack->timevar->children)
> +       m_stack->timevar->children = new child_map_t (5);
> +      bool existed_p;
> +      timevar_time_def &time
> +       = m_stack->timevar->children->get_or_insert (popped->timevar,
> &existed_p);
> +      if (! existed_p)
> +       memset (&time, 0, sizeof (timevar_time_def));
> +      timevar_accumulate (&time, &m_start_time, &now);
> +    }

Looks like inside a pop there's a lazy allocation of a map, and a
get_or_insert.  Could this make -ftime-report have an impact on the
timing?

>    /* Reset the start time; from now on, time is attributed to the
>       element just exposed on the stack.  */
>    m_start_time = now;
>  
> -  /* Take the item off the stack.  */
> -  m_stack = m_stack->next;
> -
>    /* Don't delete the stack element; instead, add it to the list of
>       unused elements for later use.  */
>    popped->next = m_unused_stack_instances;
> @@ -619,38 +635,38 @@ timer::validate_phases (FILE *fp) const
>  void
>  timer::print_row (FILE *fp,
>                   const timevar_time_def *total,
> -                 const timevar_def *tv)
> +                 const char *name, const timevar_time_def &elapsed)
>  {
>    /* The timing variable name.  */
> -  fprintf (fp, " %-24s:", tv->name);
> +  fprintf (fp, " %-24s:", name);
>  
>  #ifdef HAVE_USER_TIME
>    /* Print user-mode time for this process.  */
>    fprintf (fp, "%7.2f (%2.0f%%) usr",
> -          tv->elapsed.user,
> -          (total->user == 0 ? 0 : tv->elapsed.user / total->user) *
> 100);
> +          elapsed.user,
> +          (total->user == 0 ? 0 : elapsed.user / total->user) *
> 100);
>  #endif /* HAVE_USER_TIME */
>  
>  #ifdef HAVE_SYS_TIME
>    /* Print system-mode time for this process.  */
>    fprintf (fp, "%7.2f (%2.0f%%) sys",
> -          tv->elapsed.sys,
> -          (total->sys == 0 ? 0 : tv->elapsed.sys / total->sys) *
> 100);
> +          elapsed.sys,
> +          (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100);
>  #endif /* HAVE_SYS_TIME */
>  
>  #ifdef HAVE_WALL_TIME
>    /* Print wall clock time elapsed.  */
>    fprintf (fp, "%7.2f (%2.0f%%) wall",
> -          tv->elapsed.wall,
> -          (total->wall == 0 ? 0 : tv->elapsed.wall / total->wall) *
> 100);
> +          elapsed.wall,
> +          (total->wall == 0 ? 0 : elapsed.wall / total->wall) *
> 100);
>  #endif /* HAVE_WALL_TIME */
>  
>    /* Print the amount of ggc memory allocated.  */
>    fprintf (fp, "%8u kB (%2.0f%%) ggc",
> -          (unsigned) (tv->elapsed.ggc_mem >> 10),
> +          (unsigned) (elapsed.ggc_mem >> 10),
>            (total->ggc_mem == 0
>             ? 0
> -           : (float) tv->elapsed.ggc_mem / total->ggc_mem) * 100);
> +           : (float) elapsed.ggc_mem / total->ggc_mem) * 100);
>  
>    putc ('\n', fp);
>  }
> @@ -702,15 +718,47 @@ timer::print (FILE *fp)
>        if (!tv->used)
>         continue;
>  
> +      bool any_childs_with_time = false;

s/childs/children/

> +      if (tv->children)
> +       for (child_map_t::iterator i = tv->children->begin ();
> +            i != tv->children->end (); ++i)
> +         if (!((*i).second.user < tiny
> +               && (*i).second.sys < tiny
> +               && (*i).second.wall < tiny
> +               && (*i).second.ggc_mem < GGC_MEM_BOUND))
> +           {
> +             any_childs_with_time = true;
> +             break;
> +           }
> +
>        /* Don't print timing variables if we're going to get a row of
> -         zeroes.  */
> -      if (tv->elapsed.user < tiny
> +         zeroes.  Unless there are children with non-zero time.  */
> +      if (! any_childs_with_time
> +         && tv->elapsed.user < tiny
>           && tv->elapsed.sys < tiny
>           && tv->elapsed.wall < tiny
>           && tv->elapsed.ggc_mem < GGC_MEM_BOUND)
>         continue;

Presumably that repeated conditional for a row of zeroes could be moved
to a helper function (or indeed, a method of the timevar?).

> -      print_row (fp, total, tv);
> +      print_row (fp, total, tv->name, tv->elapsed);
> +
> +      if (tv->children)
> +       for (child_map_t::iterator i = tv->children->begin ();
> +            i != tv->children->end (); ++i)
> +         {
> +           timevar_def *tv2 = (*i).first;
> +           /* Don't print timing variables if we're going to get a
> row of
> +              zeroes.  */
> +           if (!((*i).second.user < tiny
> +                 && (*i).second.sys < tiny
> +                 && (*i).second.wall < tiny
> +                 && (*i).second.ggc_mem < GGC_MEM_BOUND))

Same here.

> +             {
> +               char lname[256];
> +               sprintf (lname, "`- %s", tv2->name);

The jit has user-supplied names (via gcc_jit_timer_push) and I don't
think it imposes a length limit, so presumably snprintf would be better
here.

Alternatively, how about adding a "prefix" param to print_row, or
simply printing the prefix to fp before calling print_row?

> +               print_row (fp, total, lname, (*i).second);
> +             }
> +         }
>      }
>    if (m_jit_client_items)
>      m_jit_client_items->print (fp, total);
> Index: gcc/timevar.h
> ===================================================================
> --- gcc/timevar.h       (revision 238469)
> +++ gcc/timevar.h       (working copy)
> @@ -131,9 +131,10 @@ class timer
>    void pop_internal ();
>    static void print_row (FILE *fp,
>                          const timevar_time_def *total,
> -                        const timevar_def *tv);
> +                        const char *name, const timevar_time_def
> &elapsed);
>  
>   private:
> +  typedef hash_map<timevar_def *, timevar_time_def> child_map_t;
>  
>    /* Private type: a timing variable.  */
>    struct timevar_def
> @@ -155,6 +156,8 @@ class timer
>      /* Nonzero if this timing variable was ever started or pushed
> onto
>         the timing stack.  */
>      unsigned used : 1;
> +
> +    child_map_t *children;
>    };
>  
>    /* Private type: an element on the timing stack
> Index: gcc/passes.c
> ===================================================================
> *** gcc/passes.c        (revision 238469)
> --- gcc/passes.c        (working copy)
> *************** execute_todo (unsigned int flags)
> *** 2002,2009 ****
>         && need_ssa_update_p (cfun))
>       gcc_assert (flags & TODO_update_ssa_any);
>   
> -   timevar_push (TV_TODO);
> - 
>     statistics_fini_pass ();
>   
>     if (flags)
> --- 2002,2007 ----
> *************** execute_todo (unsigned int flags)
> *** 2037,2044 ****
>        df problems.  */
>     if (flags & TODO_df_finish)
>       df_finish_pass ((flags & TODO_df_verify) != 0);
> - 
> -   timevar_pop (TV_TODO);
>   }
>   
>   /* Verify invariants that should hold between passes.  This is a
> place
> --- 2035,2040 ----
> *************** execute_one_ipa_transform_pass (struct c
> *** 2190,2209 ****
>   
>     pass_init_dump_file (pass);
>   
> -   /* Run pre-pass verification.  */
> -   execute_todo (ipa_pass->function_transform_todo_flags_start);
> - 
>     /* If a timevar is present, start it.  */
>     if (pass->tv_id != TV_NONE)
>       timevar_push (pass->tv_id);
>   
>     /* Do it!  */
>     todo_after = ipa_pass->function_transform (node);
>   
> -   /* Stop timevar.  */
> -   if (pass->tv_id != TV_NONE)
> -     timevar_pop (pass->tv_id);
> - 
>     if (profile_report && cfun && (cfun->curr_properties & PROP_cfg))
>       check_profile_consistency (pass->static_pass_number, 0, true);
>   
> --- 2186,2201 ----
>   
>     pass_init_dump_file (pass);
>   
>     /* If a timevar is present, start it.  */
>     if (pass->tv_id != TV_NONE)
>       timevar_push (pass->tv_id);
>   
> +   /* Run pre-pass verification.  */
> +   execute_todo (ipa_pass->function_transform_todo_flags_start);
> + 
>     /* Do it!  */
>     todo_after = ipa_pass->function_transform (node);
>   
>     if (profile_report && cfun && (cfun->curr_properties & PROP_cfg))
>       check_profile_consistency (pass->static_pass_number, 0, true);
>   
> *************** execute_one_ipa_transform_pass (struct c
> *** 2213,2218 ****
> --- 2205,2214 ----
>     if (profile_report && cfun && (cfun->curr_properties & PROP_cfg))
>       check_profile_consistency (pass->static_pass_number, 1, true);
>   
> +   /* Stop timevar.  */
> +   if (pass->tv_id != TV_NONE)
> +     timevar_pop (pass->tv_id);
> + 
>     if (dump_file)
>       do_per_function (execute_function_dump, pass);
>     pass_fini_dump_file (pass);
> *************** execute_one_pass (opt_pass *pass)
> *** 2321,2326 ****
> --- 2317,2326 ----
>   
>     pass_init_dump_file (pass);
>   
> +   /* If a timevar is present, start it.  */
> +   if (pass->tv_id != TV_NONE)
> +     timevar_push (pass->tv_id);
> + 
>     /* Run pre-pass verification.  */
>     execute_todo (pass->todo_flags_start);
>   
> *************** execute_one_pass (opt_pass *pass)
> *** 2328,2342 ****
>       do_per_function (verify_curr_properties,
>                      (void *)(size_t)pass->properties_required);
>   
> -   /* If a timevar is present, start it.  */
> -   if (pass->tv_id != TV_NONE)
> -     timevar_push (pass->tv_id);
> - 
>     /* Do it!  */
>     todo_after = pass->execute (cfun);
>   
>     if (todo_after & TODO_discard_function)
>       {
>         pass_fini_dump_file (pass);
>   
>         gcc_assert (cfun);
> --- 2328,2342 ----
>       do_per_function (verify_curr_properties,
>                      (void *)(size_t)pass->properties_required);
>   
>     /* Do it!  */
>     todo_after = pass->execute (cfun);
>   
>     if (todo_after & TODO_discard_function)
>       {
> +       /* Stop timevar.  */
> +       if (pass->tv_id != TV_NONE)
> +       timevar_pop (pass->tv_id);
> + 
>         pass_fini_dump_file (pass);
>   
>         gcc_assert (cfun);
> *************** execute_one_pass (opt_pass *pass)
> *** 2363,2372 ****
>   
>     do_per_function (clear_last_verified, NULL);
>   
> -   /* Stop timevar.  */
> -   if (pass->tv_id != TV_NONE)
> -     timevar_pop (pass->tv_id);
> - 
>     do_per_function (update_properties_after_pass, pass);
>   
>     if (profile_report && cfun && (cfun->curr_properties & PROP_cfg))
> --- 2363,2368 ----
> *************** execute_one_pass (opt_pass *pass)
> *** 2378,2383 ****
> --- 2374,2384 ----
>       check_profile_consistency (pass->static_pass_number, 1, true);
>   
>     verify_interpass_invariants ();
> + 
> +   /* Stop timevar.  */
> +   if (pass->tv_id != TV_NONE)
> +     timevar_pop (pass->tv_id);
> + 
>     if (pass->type == IPA_PASS
>         && ((ipa_opt_pass_d *)pass)->function_transform)
>       {
> Index: gcc/timevar.def
> ===================================================================
> *** gcc/timevar.def     (revision 238469)
> --- gcc/timevar.def     (working copy)
> *************** DEFTIMEVAR (TV_POSTRELOAD            , "unacco
> *** 290,296 ****
>   DEFTIMEVAR (TV_LATE_COMPILATION            , "unaccounted late
> compilation")
>   DEFTIMEVAR (TV_REMOVE_UNUSED       , "remove unused locals")
>   DEFTIMEVAR (TV_ADDRESS_TAKEN       , "address taken")
> - DEFTIMEVAR (TV_TODO                , "unaccounted todo")
>   DEFTIMEVAR (TV_VERIFY_LOOP_CLOSED    , "verify loop closed")
>   DEFTIMEVAR (TV_VERIFY_RTL_SHARING    , "verify RTL sharing")
>   DEFTIMEVAR (TV_REBUILD_FREQUENCIES   , "rebuild frequencies")
> --- 290,295 ----


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]