PR tree-optimize/49373 (IPA-PTA regression)

Richard Guenther richard.guenther@gmail.com
Thu Jun 23 10:06:00 GMT 2011


On Thu, Jun 23, 2011 at 1:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> this patch moves ipa-pta into a new queue of simple IPA passes executed
> after the regular IPA passes.  The reason is that IPA-PTA is really implemented
> as a simple IPA pass (i.e. it looks into function bodies at its propagate stage
> and does not support WHOPR mode), and I have been planning a place for such
> passes for a while.  Until now, however, there was no reason to add one.
>
> The patch fixes a regression I introduced with my alias reorg, which triggered a
> latent problem in IPA-PTA: it does not really expect to see a cgraph with
> redirected edges.  In the longer term we want IPA-PTA to be a full IPA pass, but
> we are simply not there yet.  Having a place for late small IPA passes is,
> however, convenient for other reasons: we can remove functions whose references
> are optimized out, or we can re-do some of the early optimizations, including
> ipa-sra, late, which might be interesting for LTO.
>
> The change needed quite a bit more untangling of the old code than I would like,
> and I was not fully successful at it.  The reason is that until now we executed
> the transform stage of all IPA passes just before the first local pass of late
> compilation (i.e. all_passes).
>
> With this patch the IPA transforms can take place either at the beginning of
> all_passes (when all late IPA passes are disabled) or just before the first late
> IPA pass.
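To make the two possible points concrete, here is a toy model in plain C.
None of the types or functions below are GCC's; the names are made up for
illustration, and it treats the whole unit at once, whereas all_passes really
works function by function.  It only shows that pending transforms are
applied before a late IPA pass if one is enabled, and otherwise at the start
of all_passes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions are GCC's.  */
struct toy_node
{
  bool transforms_pending;  /* IPA transforms queued for this body.  */
  bool inlined_to;          /* Body exists only as an inline copy.  */
};

/* Apply queued transforms to every offline body and return how many
   bodies were changed.  */
static int
toy_apply_transforms (struct toy_node *nodes, size_t n)
{
  int applied = 0;
  for (size_t i = 0; i < n; i++)
    if (!nodes[i].inlined_to && nodes[i].transforms_pending)
      {
        nodes[i].transforms_pending = false;
        applied++;
      }
  return applied;
}

/* Return the number of bodies transformed just before the late IPA
   pass; *AT_ALL_PASSES receives the number left for the beginning of
   all_passes.  */
static int
toy_compile (struct toy_node *nodes, size_t n, bool late_ipa_enabled,
             int *at_all_passes)
{
  int before_late_ipa = 0;
  if (late_ipa_enabled)
    before_late_ipa = toy_apply_transforms (nodes, n);
  *at_all_passes = toy_apply_transforms (nodes, n);
  return before_late_ipa;
}
```

With a late pass enabled nothing is left over for all_passes; with it
disabled everything is.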
>
> The original motivation for applying the transforms all at the time of late
> compilation was the "half-WHOPR" compilation model I originally developed
> cgraph for in 2003-2006.  The idea was that IPA passes would have function body
> summaries, just like in our current WHOPR implementation, but I did not intend
> to implement the second streaming (WPA->LTRANS).  Parallel compilation
> was not much of a concern at that time.  I simply expected the compiler to
> produce the final assembly in the same process that runs WPA, while avoiding the
> need to load all function bodies into memory at once.  The late compilation
> was expected to load function bodies one by one, optimize them and output them
> as assembly (modulo preloading of inlined functions).
>
> With WHOPR this is not really so important (it is theoretically possible
> with -flto -flto-partition=none, just not implemented this way: lto.c proactively
> loads all function bodies at early stages of LTO compilation).
>
> This "half-WHOPR" model makes such late optimization passes impossible.  WHOPR
> solves the problem by restricting late optimization passes to partitions, and
> thus makes late IPA passes reasonable.
>
> Note that the trick reduces memory usage even w/o LTO, because the program after
> inline decisions are applied is bigger than before.  Currently we do not apply
> inline decisions "unit at a time".  This growth is however more or less bounded,
> since the inliner should not expand the unit by more than the inline-unit-growth
> limit.
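As a rough illustration of that bound, treating inline-unit-growth as a plain
percentage: the helper below is a simplification with a made-up name, not the
inliner's actual computation, which has further thresholds that are not
modeled here.

```c
/* Simplified bound: the unit may grow by at most GROWTH_PERCENT percent.
   Assumption: any further thresholds in the real inliner are ignored.  */
static long
toy_max_unit_size (long initial_size, int growth_percent)
{
  return initial_size * (100 + growth_percent) / 100;
}
```

So with a hypothetical limit of 30, a unit of 1000 size units could grow to at
most 1300 after inline decisions are applied.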
>
> For these reasons I don't really want to make applying the IPA transforms
> "unit at a time" the default until we have compelling reasons to do so (i.e.
> a late IPA pass enabled by default that pays for itself).
>
> As a result I have a bit of a problem with cfg fixup: cfg fixup is needed after
> IPA passes because ipa-pure-const can turn functions into non-throwing, pure or
> const ones, and that needs compensation on the caller side that can't be done by
> a proper IPA pass.  We also need it at the beginning of all_passes because RTL
> code and local-pure-const can do the same (i.e. turn functions into
> non-EH/pure/const).
>
> Because we run cfg verifiers in between ipa transforms, we now need to run fixup
> one extra time: once after inlining and once at the beginning of all_passes.
>
> I guess it is the pass manager's job to keep track of this, but the pass manager
> is currently a bit too inflexible since all its properties are static.
>
> There are alternatives, like fixing up the cfg from the late pure-const and RTL
> EH code, but they seem just as ugly as one extra pass through the statements.
>
> The patch also arranges for the cgraph to be valid after ipa transforms in case
> some late IPA pass is run.  This is done by simply rebuilding cgraph edges, since
> we do not preserve them through the inline transform (it does cleanup_cfg, and we
> also do not really maintain ipa references).
>
> Finally, we can now also disable ipa-inline at -O0.
>
> I've bootstrapped/regtested x86_64-linux and verified that it fixed the regression.
> I've also bootstrapped with ipa-pta enabled by default with c,c++ and fortran.
> Libjava compiles forever with ipa-pta.  There are some units needing about 3 hours
> to complete.
>
> OK?

Ok, but please change the IPA inline gate to honor flag_no_inline
(thus, (optimize && !flag_no_inline) || flag_lto || flag_wpa).
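For concreteness, a gate with that condition might look like the sketch
below.  The four flags are stand-in globals so that the snippet compiles on
its own (in GCC they come from the option machinery), and the name
gate_ipa_inline_sketch is hypothetical.

```c
#include <stdbool.h>

/* Stand-ins for GCC's option variables; assumptions for this sketch only.  */
static int optimize;        /* Optimization level.  */
static int flag_no_inline;  /* Nonzero when inlining is disabled.  */
static int flag_lto;        /* Nonzero under -flto.  */
static int flag_wpa;        /* Nonzero in the WPA stage of WHOPR.  */

/* Run IPA inlining when optimizing with inlining allowed, and always
   under LTO/WPA, where size estimates are needed to drive partitioning.  */
static bool
gate_ipa_inline_sketch (void)
{
  return (optimize && !flag_no_inline) || flag_lto || flag_wpa;
}
```

The only difference from the posted gate is that optimizing with inlining
disabled no longer enables the pass unless LTO or WPA is active.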

Thanks for working on this, I'll look into some follow-up cleanups
for PTA.  Now, when it works on LTRANS units, we have to make
some adjustments (like not disabling it in opts.c ;)) - do we know
whether a function is only called from within an LTRANS unit somehow?

Thanks,
Richard.

> Honza
>
>        PR tree-optimize/49373
>        * tree-pass.h (all_late_ipa_passes): Declare.
>        * cgraphunit.c (init_lowered_empty_function): Fix properties.
>        (cgraph_optimize): Execute late passes; remove unreachable functions after
>        materialization.
>        * ipa-inline.c (gate_ipa_inline): Enable only when optimizing or LTOing.
>        * passes.c (all_late_ipa_passes): Declare.
>        (dump_passes, register_pass): Handle late ipa passes.
>        (init_optimization_passes): Move ipa_pta to late passes; schedule fixup_cfg
>        at beginning of all_passes.
>        (apply_ipa_transforms): New function.
>        (execute_one_pass): When doing simple ipa pass, apply all transforms.
> Index: tree-pass.h
> ===================================================================
> *** tree-pass.h (revision 175293)
> --- tree-pass.h (working copy)
> *************** extern struct gimple_opt_pass pass_conve
> *** 577,583 ****
>
>  /* The root of the compilation pass tree, once constructed.  */
>  extern struct opt_pass *all_passes, *all_small_ipa_passes, *all_lowering_passes,
> !                        *all_regular_ipa_passes, *all_lto_gen_passes;
>
>  /* Define a list of pass lists so that both passes.c and plugins can easily
>     find all the pass lists.  */
> --- 577,583 ----
>
>  /* The root of the compilation pass tree, once constructed.  */
>  extern struct opt_pass *all_passes, *all_small_ipa_passes, *all_lowering_passes,
> !                        *all_regular_ipa_passes, *all_lto_gen_passes, *all_late_ipa_passes;
>
>  /* Define a list of pass lists so that both passes.c and plugins can easily
>     find all the pass lists.  */
> Index: cgraphunit.c
> ===================================================================
> *** cgraphunit.c        (revision 175293)
> --- cgraphunit.c        (working copy)
> *************** init_lowered_empty_function (tree decl)
> *** 1420,1426 ****
>    DECL_SAVED_TREE (decl) = error_mark_node;
>    cfun->curr_properties |=
>      (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
> !      PROP_ssa);
>
>    /* Create BB for body of the function and connect it properly.  */
>    bb = create_basic_block (NULL, (void *) 0, ENTRY_BLOCK_PTR);
> --- 1420,1426 ----
>    DECL_SAVED_TREE (decl) = error_mark_node;
>    cfun->curr_properties |=
>      (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
> !      PROP_ssa | PROP_gimple_any);
>
>    /* Create BB for body of the function and connect it properly.  */
>    bb = create_basic_block (NULL, (void *) 0, ENTRY_BLOCK_PTR);
> *************** cgraph_optimize (void)
> *** 2101,2106 ****
> --- 2101,2113 ----
>  #endif
>
>    cgraph_materialize_all_clones ();
> +   bitmap_obstack_initialize (NULL);
> +   execute_ipa_pass_list (all_late_ipa_passes);
> +   cgraph_remove_unreachable_nodes (true, dump_file);
> + #ifdef ENABLE_CHECKING
> +   verify_cgraph ();
> + #endif
> +   bitmap_obstack_release (NULL);
>    cgraph_mark_functions_to_output ();
>
>    cgraph_state = CGRAPH_STATE_EXPANSION;
> Index: ipa-inline.c
> ===================================================================
> *** ipa-inline.c        (revision 175293)
> --- ipa-inline.c        (working copy)
> *************** struct gimple_opt_pass pass_early_inline
> *** 1972,1988 ****
>
>
>  /* When to run IPA inlining.  Inlining of always-inline functions
> !    happens during early inlining.  */
>
>  static bool
>  gate_ipa_inline (void)
>  {
> !   /* ???  We'd like to skip this if not optimizing or not inlining as
> !      all always-inline functions have been processed by early
> !      inlining already.  But this at least breaks EH with C++ as
> !      we need to unconditionally run fixup_cfg even at -O0.
> !      So leave it on unconditionally for now.  */
> !   return 1;
>  }
>
>  struct ipa_opt_pass_d pass_ipa_inline =
> --- 1972,1986 ----
>
>
>  /* When to run IPA inlining.  Inlining of always-inline functions
> !    happens during early inlining.
> !
> !    Enable inlining unconditionally at -flto.  We need size estimates to
> !    drive partitioning.  */
>
>  static bool
>  gate_ipa_inline (void)
>  {
> !   return optimize || flag_lto || flag_wpa;
>  }
>
>  struct ipa_opt_pass_d pass_ipa_inline =
> Index: passes.c
> ===================================================================
> *** passes.c    (revision 175293)
> --- passes.c    (working copy)
> *************** struct rtl_opt_pass pass_postreload =
> *** 332,338 ****
>
>  /* The root of the compilation pass tree, once constructed.  */
>  struct opt_pass *all_passes, *all_small_ipa_passes, *all_lowering_passes,
> !   *all_regular_ipa_passes, *all_lto_gen_passes;
>
>  /* This is used by plugins, and should also be used in register_pass.  */
>  #define DEF_PASS_LIST(LIST) &LIST,
> --- 332,338 ----
>
>  /* The root of the compilation pass tree, once constructed.  */
>  struct opt_pass *all_passes, *all_small_ipa_passes, *all_lowering_passes,
> !   *all_regular_ipa_passes, *all_late_ipa_passes, *all_lto_gen_passes;
>
>  /* This is used by plugins, and should also be used in register_pass.  */
>  #define DEF_PASS_LIST(LIST) &LIST,
> *************** dump_passes (void)
> *** 617,622 ****
> --- 617,623 ----
>    dump_pass_list (all_small_ipa_passes, 1);
>    dump_pass_list (all_regular_ipa_passes, 1);
>    dump_pass_list (all_lto_gen_passes, 1);
> +   dump_pass_list (all_late_ipa_passes, 1);
>    dump_pass_list (all_passes, 1);
>
>    pop_cfun ();
> *************** register_pass (struct register_pass_info
> *** 1103,1108 ****
> --- 1104,1111 ----
>    if (!success || all_instances)
>      success |= position_pass (pass_info, &all_lto_gen_passes);
>    if (!success || all_instances)
> +     success |= position_pass (pass_info, &all_late_ipa_passes);
> +   if (!success || all_instances)
>      success |= position_pass (pass_info, &all_passes);
>    if (!success)
>      fatal_error
> *************** init_optimization_passes (void)
> *** 1249,1255 ****
>    NEXT_PASS (pass_ipa_inline);
>    NEXT_PASS (pass_ipa_pure_const);
>    NEXT_PASS (pass_ipa_reference);
> -   NEXT_PASS (pass_ipa_pta);
>    *p = NULL;
>
>    p = &all_lto_gen_passes;
> --- 1252,1257 ----
> *************** init_optimization_passes (void)
> *** 1257,1265 ****
> --- 1259,1274 ----
>    NEXT_PASS (pass_ipa_lto_finish_out);  /* This must be the last LTO pass.  */
>    *p = NULL;
>
> +   /* Simple IPA passes executed after the regular passes.  In WHOPR mode the
> +      passes are executed after partitioning and thus see just parts of the
> +      compiled unit.  */
> +   p = &all_late_ipa_passes;
> +   NEXT_PASS (pass_ipa_pta);
> +   *p = NULL;
>    /* These passes are run after IPA passes on every function that is being
>       output to the assembler file.  */
>    p = &all_passes;
> +   NEXT_PASS (pass_fixup_cfg);
>    NEXT_PASS (pass_lower_eh_dispatch);
>    NEXT_PASS (pass_all_optimizations);
>      {
> *************** init_optimization_passes (void)
> *** 1517,1522 ****
> --- 1526,1534 ----
>    register_dump_files (all_lto_gen_passes,
>                       PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh
>                       | PROP_cfg);
> +   register_dump_files (all_late_ipa_passes,
> +                      PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh
> +                      | PROP_cfg);
>    register_dump_files (all_passes,
>                       PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh
>                       | PROP_cfg);
> *************** execute_all_ipa_transforms (void)
> *** 1935,1940 ****
> --- 1947,1966 ----
>      }
>  }
>
> + /* Callback for do_per_function to apply all IPA transforms.  */
> +
> + static void
> + apply_ipa_transforms (void *data)
> + {
> +   struct cgraph_node *node = cgraph_get_node (current_function_decl);
> +   if (!node->global.inlined_to && node->ipa_transforms_to_apply)
> +     {
> +       *(bool *)data = true;
> +       execute_all_ipa_transforms();
> +       rebuild_cgraph_edges ();
> +     }
> + }
> +
>  /* Check if PASS is explicitly disabled or enabled and return
>     the gate status.  FUNC is the function to be processed, and
>     GATE_STATUS is the gate status determined by pass manager by
> *************** execute_one_pass (struct opt_pass *pass)
> *** 1996,2001 ****
> --- 2022,2037 ----
>       executed.  */
>    invoke_plugin_callbacks (PLUGIN_PASS_EXECUTION, pass);
>
> +   /* SIMPLE IPA passes do not handle callgraphs with IPA transforms in it.
> +      Apply all transforms first.  */
> +   if (pass->type == SIMPLE_IPA_PASS)
> +     {
> +       bool applied = false;
> +       do_per_function (apply_ipa_transforms, (void *)&applied);
> +       if (applied)
> +         cgraph_remove_unreachable_nodes (true, dump_file);
> +     }
> +
>    if (!quiet_flag && !cfun)
>      fprintf (stderr, " <%s>", pass->name ? pass->name : "");
>
>


