This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



PR tree-optimize/49373 (IPA-PTA regression)


Hi,
this patch moves ipa-pta into a new queue of simple IPA passes executed
after the regular IPA passes. The reason is that IPA-PTA is really implemented
as a simple IPA pass (i.e. it looks into function bodies at its propagate stage
and does not support WHOPR mode), and I have planned to have a place for such
passes for a while.  Until now, however, there has been no reason for it.

The patch fixes a regression I introduced with my alias reorg, which triggered
a latent problem in IPA-PTA: it does not really expect to see a cgraph with
redirected edges.  In the longer term we want IPA-PTA to be a full IPA pass,
but we are simply not there yet.  Having a place for late small IPA passes is,
however, convenient for other reasons: we can remove functions whose references
are optimized out, or we can redo some of the early optimizations, including
ipa-sra, late, which might be interesting for LTO.

The change needed quite a bit more untangling of the old code than I would like,
and I was not fully successful at it.  The reason is that until now we
executed the transform stages of all IPA passes just before the first local
pass of late compilation (i.e. all_passes).

With this patch the IPA transforms can take place either at the beginning of
all_passes (when all late IPA passes are disabled) or just before the first
late IPA pass.
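
The choice between the two points can be sketched as follows.  This is only a
toy model under stated assumptions: struct toy_pass and transform_point are
hypothetical names, not GCC code, and the real pass manager decides this while
executing passes rather than by scanning a list up front.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not GCC's real data structures.  */
enum pass_kind { LOCAL_PASS, SIMPLE_IPA_PASS };

struct toy_pass {
  enum pass_kind kind;
  bool enabled;              /* result of the pass gate */
};

/* Return the index in LATE of the first enabled simple IPA pass, i.e. the
   point just before which pending IPA transforms would be applied.
   Return -1 when no late IPA pass is enabled; in that case the transforms
   would be applied at the beginning of all_passes instead.  */
int transform_point(const struct toy_pass *late, size_t n)
{
  assert(late != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (late[i].enabled && late[i].kind == SIMPLE_IPA_PASS)
      return (int) i;
  return -1;
}
```

With a late queue whose only enabled entry is the second pass, the function
returns 1; with every late pass disabled it returns -1.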

The original motivation for applying all the transforms at the time of late
compilation was the "half-WHOPR" compilation model that I originally developed
cgraph for in 2003-2006.  The idea was that IPA passes would have function body
summaries, just like in our current WHOPR implementation, but I did not intend
to implement the second streaming (WPA->LTRANS). Parallel compilation
was not much of a concern at that time. I simply expected the compiler to
produce the final assembly in the same process that runs WPA, while avoiding
the need to load all function bodies into memory at once.  The late compilation
was expected to load function bodies one by one, optimize them and output them
to assembly (modulo preloading of inlined functions).

With WHOPR this is not really so important.  It is theoretically possible
with -flto -flto-partition=none, but it is not implemented this way: lto.c
proactively loads all function bodies at early stages of LTO compilation.

This "half-WHOPR" model makes such late optimization passes impossible. WHOPR
solves the problem by restricting late optimization passes to a partition, and
thus makes late IPA passes reasonable.

Note that the trick reduces memory usage even without LTO, because the program
after inline decisions are applied is bigger than before.  Currently we do not
apply inline decisions "unit at a time".  This growth is, however, more or less
bounded, since the inliner should not expand the unit by more than the
inline-unit-growth limit.
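
To make the bound concrete, here is a minimal sketch of the arithmetic.  It
assumes the inline-unit-growth limit is read as a percentage of the unit size
before inlining, and it ignores any other limits the inliner applies.  The
name max_unit_size is hypothetical and does not exist in GCC.

```c
#include <assert.h>

/* Upper bound on unit size after inlining, assuming GROWTH_PERCENT is the
   only limit in effect.  Sizes are in arbitrary instruction-count units.  */
long max_unit_size(long initial_size, int growth_percent)
{
  assert(initial_size >= 0 && growth_percent >= 0);
  return initial_size + initial_size * growth_percent / 100;
}
```

For instance, a unit of 1000 instructions with a 30% limit would be allowed to
grow to at most 1300 instructions under this simplified model.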

For these reasons I don't really want to move the application of IPA transforms
to "unit at a time" by default until we have compelling reasons to do so (i.e.
a late IPA pass enabled by default that pays for itself).

As a result I have a bit of a problem with cfg fixup:  cfg fixup is needed after
IPA passes because ipa-pure-const can turn functions into non-throwing, pure or
const ones, and that needs compensation on the caller side that can't be done
by a proper IPA pass.  We also need it at the beginning of all_passes because
RTL code and local-pure-const can do the same (i.e. turn functions into
non-EH/pure/const).

Because we run cfg verifiers in between IPA transforms, we now need to run fixup
one extra time: once after inlining and once at the beginning of all_passes.

I guess it is the pass manager's job to keep track of this, but the pass manager
is currently a bit too inflexible, since all its properties are static.

There are alternatives, like fixing up the cfg from the late pure-const and RTL
EH code, but they seem just as ugly as one extra pass through the statements.

The patch also arranges for the cgraph to be valid after IPA transforms in the
case that some late IPA pass is run.  This is done by simply rebuilding the
cgraph edges, since we do not preserve them through the inline transform (it
does cleanup_cfg, and we also do not really maintain IPA references).
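
The flow described above can be sketched on a toy call graph.  This is not the
patch itself (the real code is apply_ipa_transforms in passes.c below); struct
toy_node, its fields and apply_pending_transforms are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a call graph node; field names are hypothetical.  */
struct toy_node {
  bool inlined_into_other;   /* body only lives inside another function */
  int pending_transforms;    /* IPA transforms not yet applied */
  bool edges_valid;          /* outgoing call edges match the body */
};

/* Apply pending transforms to every offline function in NODES and rebuild
   its edges.  Return true if anything was applied, in which case the caller
   would go on to remove unreachable nodes.  */
bool apply_pending_transforms(struct toy_node *nodes, size_t n)
{
  bool applied = false;
  assert(nodes != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    struct toy_node *node = &nodes[i];
    if (node->inlined_into_other || node->pending_transforms == 0)
      continue;
    node->pending_transforms = 0;  /* the transforms rewrite the body... */
    node->edges_valid = true;      /* ...so the edges are rebuilt from it */
    applied = true;
  }
  return applied;
}
```

A second call on the same graph returns false, which models why the unreachable
node removal only needs to run when some transform was actually applied.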

Finally, we can now also disable ipa-inline at -O0.

I've bootstrapped/regtested on x86_64-linux and verified that it fixes the
regression.  I've also bootstrapped with ipa-pta enabled by default with c, c++
and fortran.  Libjava compiles forever with ipa-pta: some units need about
3 hours to complete.

OK?
Honza

	PR tree-optimize/49373
	* tree-pass.h (all_late_ipa_passes): Declare.
	* cgraphunit.c (init_lowered_empty_function): Fix properties.
	(cgraph_optimize): Execute late passes; remove unreachable functions after
	materialization.
	* ipa-inline.c (gate_ipa_inline): Enable only when optimizing or LTOing.
	* passes.c (all_late_ipa_passes): Declare.
	(dump_passes, register_pass): Handle late ipa passes.
	(init_optimization_passes): Move ipa_pta to late passes; schedule fixup_cfg
	at beginning of all_passes.
	(apply_ipa_transforms): New function.
	(execute_one_pass): When doing simple ipa pass, apply all transforms.
Index: tree-pass.h
===================================================================
*** tree-pass.h	(revision 175293)
--- tree-pass.h	(working copy)
*************** extern struct gimple_opt_pass pass_conve
*** 577,583 ****
  
  /* The root of the compilation pass tree, once constructed.  */
  extern struct opt_pass *all_passes, *all_small_ipa_passes, *all_lowering_passes,
!                        *all_regular_ipa_passes, *all_lto_gen_passes;
  
  /* Define a list of pass lists so that both passes.c and plugins can easily
     find all the pass lists.  */
--- 577,583 ----
  
  /* The root of the compilation pass tree, once constructed.  */
  extern struct opt_pass *all_passes, *all_small_ipa_passes, *all_lowering_passes,
!                        *all_regular_ipa_passes, *all_lto_gen_passes, *all_late_ipa_passes;
  
  /* Define a list of pass lists so that both passes.c and plugins can easily
     find all the pass lists.  */
Index: cgraphunit.c
===================================================================
*** cgraphunit.c	(revision 175293)
--- cgraphunit.c	(working copy)
*************** init_lowered_empty_function (tree decl)
*** 1420,1426 ****
    DECL_SAVED_TREE (decl) = error_mark_node;
    cfun->curr_properties |=
      (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
!      PROP_ssa);
  
    /* Create BB for body of the function and connect it properly.  */
    bb = create_basic_block (NULL, (void *) 0, ENTRY_BLOCK_PTR);
--- 1420,1426 ----
    DECL_SAVED_TREE (decl) = error_mark_node;
    cfun->curr_properties |=
      (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
!      PROP_ssa | PROP_gimple_any);
  
    /* Create BB for body of the function and connect it properly.  */
    bb = create_basic_block (NULL, (void *) 0, ENTRY_BLOCK_PTR);
*************** cgraph_optimize (void)
*** 2101,2106 ****
--- 2101,2113 ----
  #endif
  
    cgraph_materialize_all_clones ();
+   bitmap_obstack_initialize (NULL);
+   execute_ipa_pass_list (all_late_ipa_passes);
+   cgraph_remove_unreachable_nodes (true, dump_file);
+ #ifdef ENABLE_CHECKING
+   verify_cgraph ();
+ #endif
+   bitmap_obstack_release (NULL);
    cgraph_mark_functions_to_output ();
  
    cgraph_state = CGRAPH_STATE_EXPANSION;
Index: ipa-inline.c
===================================================================
*** ipa-inline.c	(revision 175293)
--- ipa-inline.c	(working copy)
*************** struct gimple_opt_pass pass_early_inline
*** 1972,1988 ****
  
  
  /* When to run IPA inlining.  Inlining of always-inline functions
!    happens during early inlining.  */
  
  static bool
  gate_ipa_inline (void)
  {
!   /* ???  We'd like to skip this if not optimizing or not inlining as
!      all always-inline functions have been processed by early
!      inlining already.  But this at least breaks EH with C++ as
!      we need to unconditionally run fixup_cfg even at -O0.
!      So leave it on unconditionally for now.  */
!   return 1;
  }
  
  struct ipa_opt_pass_d pass_ipa_inline =
--- 1972,1986 ----
  
  
  /* When to run IPA inlining.  Inlining of always-inline functions
!    happens during early inlining.
! 
!    Enable inlining unconditionally at -flto.  We need size estimates to
!    drive partitioning.  */
  
  static bool
  gate_ipa_inline (void)
  {
!   return optimize || flag_lto || flag_wpa;
  }
  
  struct ipa_opt_pass_d pass_ipa_inline =
Index: passes.c
===================================================================
*** passes.c	(revision 175293)
--- passes.c	(working copy)
*************** struct rtl_opt_pass pass_postreload =
*** 332,338 ****
  
  /* The root of the compilation pass tree, once constructed.  */
  struct opt_pass *all_passes, *all_small_ipa_passes, *all_lowering_passes,
!   *all_regular_ipa_passes, *all_lto_gen_passes;
  
  /* This is used by plugins, and should also be used in register_pass.  */
  #define DEF_PASS_LIST(LIST) &LIST,
--- 332,338 ----
  
  /* The root of the compilation pass tree, once constructed.  */
  struct opt_pass *all_passes, *all_small_ipa_passes, *all_lowering_passes,
!   *all_regular_ipa_passes, *all_late_ipa_passes, *all_lto_gen_passes;
  
  /* This is used by plugins, and should also be used in register_pass.  */
  #define DEF_PASS_LIST(LIST) &LIST,
*************** dump_passes (void)
*** 617,622 ****
--- 617,623 ----
    dump_pass_list (all_small_ipa_passes, 1);
    dump_pass_list (all_regular_ipa_passes, 1);
    dump_pass_list (all_lto_gen_passes, 1);
+   dump_pass_list (all_late_ipa_passes, 1);
    dump_pass_list (all_passes, 1);
  
    pop_cfun ();
*************** register_pass (struct register_pass_info
*** 1103,1108 ****
--- 1104,1111 ----
    if (!success || all_instances)
      success |= position_pass (pass_info, &all_lto_gen_passes);
    if (!success || all_instances)
+     success |= position_pass (pass_info, &all_late_ipa_passes);
+   if (!success || all_instances)
      success |= position_pass (pass_info, &all_passes);
    if (!success)
      fatal_error
*************** init_optimization_passes (void)
*** 1249,1255 ****
    NEXT_PASS (pass_ipa_inline);
    NEXT_PASS (pass_ipa_pure_const);
    NEXT_PASS (pass_ipa_reference);
-   NEXT_PASS (pass_ipa_pta);
    *p = NULL;
  
    p = &all_lto_gen_passes;
--- 1252,1257 ----
*************** init_optimization_passes (void)
*** 1257,1265 ****
--- 1259,1274 ----
    NEXT_PASS (pass_ipa_lto_finish_out);  /* This must be the last LTO pass.  */
    *p = NULL;
  
+   /* Simple IPA passes executed after the regular passes.  In WHOPR mode the
+      passes are executed after partitioning and thus see just parts of the
+      compiled unit.  */
+   p = &all_late_ipa_passes;
+   NEXT_PASS (pass_ipa_pta);
+   *p = NULL;
    /* These passes are run after IPA passes on every function that is being
       output to the assembler file.  */
    p = &all_passes;
+   NEXT_PASS (pass_fixup_cfg);
    NEXT_PASS (pass_lower_eh_dispatch);
    NEXT_PASS (pass_all_optimizations);
      {
*************** init_optimization_passes (void)
*** 1517,1522 ****
--- 1526,1534 ----
    register_dump_files (all_lto_gen_passes,
  		       PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh
  		       | PROP_cfg);
+   register_dump_files (all_late_ipa_passes,
+ 		       PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh
+ 		       | PROP_cfg);
    register_dump_files (all_passes,
  		       PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh
  		       | PROP_cfg);
*************** execute_all_ipa_transforms (void)
*** 1935,1940 ****
--- 1947,1966 ----
      }
  }
  
+ /* Callback for do_per_function to apply all IPA transforms.  */
+ 
+ static void
+ apply_ipa_transforms (void *data)
+ {
+   struct cgraph_node *node = cgraph_get_node (current_function_decl);
+   if (!node->global.inlined_to && node->ipa_transforms_to_apply)
+     {
+       *(bool *)data = true;
+       execute_all_ipa_transforms();
+       rebuild_cgraph_edges ();
+     }
+ }
+ 
  /* Check if PASS is explicitly disabled or enabled and return
     the gate status.  FUNC is the function to be processed, and
     GATE_STATUS is the gate status determined by pass manager by
*************** execute_one_pass (struct opt_pass *pass)
*** 1996,2001 ****
--- 2022,2037 ----
       executed.  */
    invoke_plugin_callbacks (PLUGIN_PASS_EXECUTION, pass);
  
+   /* SIMPLE IPA passes do not handle callgraphs with IPA transforms in them.
+      Apply all transforms first.  */
+   if (pass->type == SIMPLE_IPA_PASS)
+     {
+       bool applied = false;
+       do_per_function (apply_ipa_transforms, (void *)&applied);
+       if (applied)
+         cgraph_remove_unreachable_nodes (true, dump_file);
+     }
+ 
    if (!quiet_flag && !cfun)
      fprintf (stderr, " <%s>", pass->name ? pass->name : "");
  

