This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, devirtualization] Intraprocedural devirtualization pass


Hi,

On Wed, Nov 02, 2011 at 11:02:48AM +0100, Richard Guenther wrote:
> On Tue, Nov 1, 2011 at 11:06 PM, Martin Jambor <mjambor@suse.cz> wrote:
> > Hi,
> >
> > the patch below is the second (and last) revived type-based
> > devirtualization patch that did not make it to 4.6. ÂIt deals with
> > virtual calls from the function in which the there is also the object
> > declaration:
> >
> > void foo()
> > {
> > ÂA a;
> >
> > Âa.foo ();
> > }
> >
> > Normally, the front-end would do the devirtualization on its own,
> > however, early-inlining can create these situations too. ÂSince there
> > is nothing interprocedural going on, the current inlining and IPA-CP
> > devirtualization bits are of no help. ÂWe do not do type-based
> > devirtualization in OBJ_TYPE_REF folding either, because the necessary
> > type-changing checks might make it quite expensive.
> 
> But in the above case - a is not address-taken - we do not need
> type-based devirtualization.  So, can you explain why we do not
> forward the proper vtable from the vtable store that would lead to
> the known-type info?
> 

We can only do that when the object constructor is early-inlined.  If
it is only slightly bigger, or in a different compilation unit, the
store ends up in a different function (or unit) and we cannot do it.
LTO is not necessary with this approach and we still can be quite
helpful if foo is in the same TU and we can inline it.

Martin


> Thanks,
> Richard.
> 
> > ÂThus, this patch
> > introduces a new pass to do that.
> >
> > The patch basically piggy-tails on the intraprocedural
> > devirtualization mechanism, trying to construct a known-type jump
> > function for all objects in OBJ_TYPE_REF calls and then immediately
> > using it like we do in IPA-CP.
> >
> > The original patch was doing this as a part of
> > pass_rebuild_cgraph_edges. ÂHonza did not like this idea so I made it
> > a separate pass. ÂFirst, I scheduled it after
> > pass_rebuild_cgraph_edges and was only traversing indirect edges,
> > avoiding a sweep over all of the IL. ÂUnfortunately, this does not
> > work in one scenario. ÂIf the newly-known destination of a virtual
> > call is known not to throw, we may have to purge some EH CFG edges and
> > potentially basic blocks. ÂIf these basic blocks contain calls
> > (typically calls to object destructors), we may end up having stale
> > call edges in the call graph... and our current approach to that
> > problem is to call pass_rebuild_cgraph_edges. ÂI think that I was not
> > running into this problem when the mechanism was a part of that pass
> > just because of pure luck. ÂAnyway, this is why I eventually opted for
> > a sweep over the statements.
> >
> > My best guess is that it is probably worth it, but only because the
> > overhead should be still fairly low. ÂThe pass triggers quite a number
> > of times when building libstdc++ and it can speed up an artificial
> > testcase which I will attach from over 20 seconds to 7s on my older
> > desktop - it is very similar to the one I posted with the previous
> > patch but this time the object constructors must not get early inlined
> > but the function multiply_matrices has to. ÂCurrently I have problems
> > compiling Firefox even without LTO so I don't have any numbers from it
> > either. ÂIIRC, Honza did not see this patch trigger there when he
> > tried the ancient version almost a year go. ÂOn the other hand, Maxim
> > claimed that the impact can be noticeable on some code base he is
> > concerned about.
> >
> > I have successfully bootstrapped and tested the patch on x86_64-linux.
> > What do you think, should we include this in trunk?
> >
> > Thanks,
> >
> > Martin
> >
> >
> > 2011-10-31 ÂMartin Jambor Â<mjambor@suse.cz>
> >
> > Â Â Â Â* ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to...
> > Â Â Â Â* ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported and
> > Â Â Â Âupdated all callers.
> > Â Â Â Â(intraprocedural_devirtualization): New function.
> > Â Â Â Â(gate_intra_devirtualization): Likewise.
> > Â Â Â Â(pass_intra_devirt): New pass.
> > Â Â Â Â* ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared.
> > Â Â Â Â* passes.c (init_optimization_passes): Schedule pass_intra_devirt.
> > Â Â Â Â* tree-pass.h (pass_intra_devirt): Declare.
> >
> > Â Â Â Â* testsuite/g++.dg/ipa/imm-devirt-1.C: New test.
> > Â Â Â Â* testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise.
> >
> >
> > Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
> > ===================================================================
> > --- /dev/null
> > +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
> > @@ -0,0 +1,62 @@
> > +/* Verify that virtual calls are folded even when a typecast to an
> > + Â ancestor is involved along the way. Â*/
> > +/* { dg-do run } */
> > +/* { dg-options "-O2 -fdump-tree-devirt" Â} */
> > +
> > +extern "C" void abort (void);
> > +
> > +class A
> > +{
> > +public:
> > + Âint data;
> > + Âvirtual int foo (int i);
> > +};
> > +
> > +
> > +class B : public A
> > +{
> > +public:
> > + Â__attribute__ ((noinline)) B();
> > + Âvirtual int foo (int i);
> > +};
> > +
> > +int __attribute__ ((noinline)) A::foo (int i)
> > +{
> > + Âreturn i + 1;
> > +}
> > +
> > +int __attribute__ ((noinline)) B::foo (int i)
> > +{
> > + Âreturn i + 2;
> > +}
> > +
> > +int __attribute__ ((noinline,noclone)) get_input(void)
> > +{
> > + Âreturn 1;
> > +}
> > +
> > +__attribute__ ((noinline)) B::B()
> > +{
> > +}
> > +
> > +static inline int middleman_1 (class A *obj, int i)
> > +{
> > + Âreturn obj->foo (i);
> > +}
> > +
> > +static inline int middleman_2 (class B *obj, int i)
> > +{
> > + Âreturn middleman_1 (obj, i);
> > +}
> > +
> > +int main (int argc, char *argv[])
> > +{
> > + Âclass B b;
> > +
> > + Âif (middleman_2 (&b, get_input ()) != 3)
> > + Â Âabort ();
> > + Âreturn 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Immediately devirtualizing call.*into.*B::foo" Â"devirt" Â} } */
> > +/* { dg-final { cleanup-tree-dump "devirt" } } */
> > Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C
> > ===================================================================
> > --- /dev/null
> > +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C
> > @@ -0,0 +1,95 @@
> > +/* Verify that virtual calls are folded even when a typecast to an
> > + Â ancestor is involved along the way. Â*/
> > +/* { dg-do run } */
> > +/* { dg-options "-O2 -fdump-tree-devirt-slim" Â} */
> > +
> > +extern "C" void abort (void);
> > +
> > +class Distraction
> > +{
> > +public:
> > + Âfloat f;
> > + Âdouble d;
> > + ÂDistraction ()
> > + Â{
> > + Â Âf = 8.3;
> > + Â Âd = 10.2;
> > + Â}
> > + Âvirtual float bar (float z);
> > +};
> > +
> > +class A
> > +{
> > +public:
> > + Âint data;
> > + Âvirtual int foo (int i);
> > +};
> > +
> > +class B : public A
> > +{
> > +public:
> > + Âint data_2;
> > + Âvirtual int foo (int i);
> > + Âvirtual int baz (int i);
> > +};
> > +
> > +
> > +class C : public Distraction, public B
> > +{
> > +public:
> > + Â__attribute__ ((noinline)) C();
> > + Âvirtual int foo (int i);
> > +};
> > +
> > +float __attribute__ ((noinline)) Distraction::bar (float z)
> > +{
> > + Âf += z;
> > + Âreturn f/2;
> > +}
> > +
> > +int __attribute__ ((noinline)) A::foo (int i)
> > +{
> > + Âreturn i + 1;
> > +}
> > +
> > +int __attribute__ ((noinline)) B::foo (int i)
> > +{
> > + Âreturn i + 2;
> > +}
> > +
> > +int __attribute__ ((noinline)) B::baz (int i)
> > +{
> > + Âreturn i * 15;
> > +}
> > +
> > +int __attribute__ ((noinline)) C::foo (int i)
> > +{
> > + Âreturn i + 3;
> > +}
> > +
> > +int __attribute__ ((noinline,noclone)) get_input(void)
> > +{
> > + Âreturn 1;
> > +}
> > +
> > +static inline int middleman (class A *obj, int i)
> > +{
> > + Âreturn obj->foo (i);
> > +}
> > +
> > +__attribute__ ((noinline)) C::C()
> > +{
> > +}
> > +
> > +int main (int argc, char *argv[])
> > +{
> > + Âclass C c;
> > +
> > + Âif (middleman (&c, get_input ()) != 4)
> > + Â Âabort ();
> > +
> > + Âreturn 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Immediately devirtualizing call.*into.*C::.*foo" Â"devirt" Â} } */
> > +/* { dg-final { cleanup-tree-dump "devirt" } } */
> > Index: src/gcc/ipa-cp.c
> > ===================================================================
> > --- src.orig/gcc/ipa-cp.c
> > +++ src/gcc/ipa-cp.c
> > @@ -674,20 +674,6 @@ ipa_get_jf_ancestor_result (struct ipa_j
> > Â Â return NULL_TREE;
> > Â}
> >
> > -/* Extract the acual BINFO being described by JFUNC which must be a known type
> > - Â jump function. Â*/
> > -
> > -static tree
> > -ipa_value_from_known_type_jfunc (struct ipa_jump_func *jfunc)
> > -{
> > - Âtree base_binfo = TYPE_BINFO (jfunc->value.known_type.base_type);
> > - Âif (!base_binfo)
> > - Â Âreturn NULL_TREE;
> > - Âreturn get_binfo_at_offset (base_binfo,
> > - Â Â Â Â Â Â Â Â Â Â Â Â Â Â jfunc->value.known_type.offset,
> > - Â Â Â Â Â Â Â Â Â Â Â Â Â Â jfunc->value.known_type.component_type);
> > -}
> > -
> > Â/* Determine whether JFUNC evaluates to a known value (that is either a
> > Â Âconstant or a binfo) and if so, return it. ÂOtherwise return NULL. INFO
> > Â Âdescribes the caller node so that pass-through jump functions can be
> > @@ -699,7 +685,7 @@ ipa_value_from_jfunc (struct ipa_node_pa
> > Â if (jfunc->type == IPA_JF_CONST)
> > Â Â return jfunc->value.constant;
> > Â else if (jfunc->type == IPA_JF_KNOWN_TYPE)
> > - Â Âreturn ipa_value_from_known_type_jfunc (jfunc);
> > + Â Âreturn ipa_binfo_from_known_type_jfunc (jfunc);
> > Â else if (jfunc->type == IPA_JF_PASS_THROUGH
> > Â Â Â Â Â || jfunc->type == IPA_JF_ANCESTOR)
> > Â Â {
> > @@ -1006,7 +992,7 @@ propagate_accross_jump_function (struct
> >
> > Â Â Â if (jfunc->type == IPA_JF_KNOWN_TYPE)
> > Â Â Â Â{
> > - Â Â Â Â val = ipa_value_from_known_type_jfunc (jfunc);
> > + Â Â Â Â val = ipa_binfo_from_known_type_jfunc (jfunc);
> > Â Â Â Â Âif (!val)
> > Â Â Â Â Â Âreturn set_lattice_contains_variable (dest_lat);
> > Â Â Â Â}
> > Index: src/gcc/ipa-prop.c
> > ===================================================================
> > --- src.orig/gcc/ipa-prop.c
> > +++ src/gcc/ipa-prop.c
> > @@ -3069,3 +3069,107 @@ ipa_update_after_lto_read (void)
> > Â Â if (node->analyzed)
> > Â Â Â ipa_initialize_node_params (node);
> > Â}
> > +
> > +/* Extract the acual BINFO being described by JFUNC which must be a known type
> > + Â jump function. Â*/
> > +
> > +tree
> > +ipa_binfo_from_known_type_jfunc (struct ipa_jump_func *jfunc)
> > +{
> > + Âtree base_binfo = TYPE_BINFO (jfunc->value.known_type.base_type);
> > + Âif (!base_binfo)
> > + Â Âreturn NULL_TREE;
> > + Âreturn get_binfo_at_offset (base_binfo,
> > + Â Â Â Â Â Â Â Â Â Â Â Â Â Â jfunc->value.known_type.offset,
> > + Â Â Â Â Â Â Â Â Â Â Â Â Â Â jfunc->value.known_type.component_type);
> > +}
> > +
> > +/* Intraprocedural (early) type-based devirtualization pass. ÂLooks at all call
> > + Â statements and attempts type-base devirtualization on those calling an
> > + Â OBJ_TYPE_REF. Â*/
> > +
> > +static unsigned int
> > +intraprocedural_devirtualization (void)
> > +{
> > + Âbasic_block bb;
> > + Âgimple_stmt_iterator gsi;
> > + Âbool cfg_changed = false;
> > +
> > + ÂFOR_EACH_BB (bb)
> > + Â Âfor (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > + Â Â Âif (is_gimple_call (gsi_stmt (gsi)))
> > + Â Â Â {
> > + Â Â Â Â tree binfo, token, fndecl;
> > + Â Â Â Â gimple call = gsi_stmt (gsi);
> > + Â Â Â Â tree target = gimple_call_fn (call);
> > + Â Â Â Â struct ipa_jump_func jfunc;
> > +
> > + Â Â Â Â if (TREE_CODE (target) != OBJ_TYPE_REF)
> > + Â Â Â Â Â continue;
> > +
> > + Â Â Â Â jfunc.type = IPA_JF_UNKNOWN;
> > + Â Â Â Â compute_known_type_jump_func (OBJ_TYPE_REF_OBJECT (target), &jfunc,
> > + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â call);
> > + Â Â Â Â if (jfunc.type != IPA_JF_KNOWN_TYPE)
> > + Â Â Â Â Â continue;
> > + Â Â Â Â binfo = ipa_binfo_from_known_type_jfunc (&jfunc);
> > + Â Â Â Â if (!binfo)
> > + Â Â Â Â Â continue;
> > + Â Â Â Â token = OBJ_TYPE_REF_TOKEN (target);
> > + Â Â Â Â fndecl = gimple_get_virt_method_for_binfo (tree_low_cst (token, 1),
> > + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âbinfo);
> > + Â Â Â Â if (!fndecl)
> > + Â Â Â Â Â continue;
> > +
> > + Â Â Â Â if (dump_file)
> > + Â Â Â Â Â {
> > + Â Â Â Â Â Â fprintf (dump_file, "ipa-prop: Immediately devirtualizing call ");
> > + Â Â Â Â Â Â print_gimple_expr (dump_file, call, 0, 0);
> > + Â Â Â Â Â }
> > +
> > + Â Â Â Â gimple_call_set_fndecl (call, fndecl);
> > + Â Â Â Â update_stmt (call);
> > + Â Â Â Â if (maybe_clean_eh_stmt (call)
> > + Â Â Â Â Â Â && gimple_purge_dead_eh_edges (bb))
> > + Â Â Â Â Â cfg_changed = true;
> > +
> > + Â Â Â Â if (dump_file)
> > + Â Â Â Â Â {
> > + Â Â Â Â Â Â fprintf (dump_file, " into ");
> > + Â Â Â Â Â Â print_gimple_expr (dump_file, call, 0, 0);
> > + Â Â Â Â Â Â fprintf (dump_file, "\n");
> > + Â Â Â Â Â }
> > + Â Â Â }
> > +
> > +
> > + if (cfg_changed)
> > + Â Âreturn TODO_cleanup_cfg;
> > + Âelse
> > + Â Âreturn 0;
> > +}
> > +
> > +static bool
> > +gate_intra_devirtualization (void)
> > +{
> > + Âreturn flag_devirtualize != 0;
> > +}
> > +
> > +
> > +struct gimple_opt_pass pass_intra_devirt =
> > + Â{
> > + Â Â{
> > + Â Â ÂGIMPLE_PASS,
> > + Â Â Â"devirt", Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â/* name */
> > + Â Â Âgate_intra_devirtualization, Â Â Â Â Â Â /* gate */
> > + Â Â Âintraprocedural_devirtualization, Â Â Â Â/* execute */
> > + Â Â ÂNULL, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â/* sub */
> > + Â Â ÂNULL, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â/* next */
> > + Â Â Â0, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â /* static_pass_number */
> > + Â Â ÂTV_CGRAPH, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â /* tv_id */
> > + Â Â ÂPROP_cfg, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â/* properties_required */
> > + Â Â Â0, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â /* properties_provided */
> > + Â Â Â0, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â /* properties_destroyed */
> > + Â Â Â0, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â /* todo_flags_start */
> > + Â Â Â0, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â /* todo_flags_finish */
> > + Â Â}
> > + Â};
> > Index: src/gcc/ipa-prop.h
> > ===================================================================
> > --- src.orig/gcc/ipa-prop.h
> > +++ src/gcc/ipa-prop.h
> > @@ -347,6 +347,7 @@ bool ipa_propagate_indirect_call_infos (
> >
> > Â/* Indirect edge and binfo processing. Â*/
> > Âstruct cgraph_edge *ipa_make_edge_direct_to_target (struct cgraph_edge *, tree);
> > +tree ipa_binfo_from_known_type_jfunc (struct ipa_jump_func *);
> >
> > Â/* Functions related to both. Â*/
> > Âvoid ipa_analyze_node (struct cgraph_node *);
> > Index: src/gcc/passes.c
> > ===================================================================
> > --- src.orig/gcc/passes.c
> > +++ src/gcc/passes.c
> > @@ -1225,6 +1225,7 @@ init_optimization_passes (void)
> > Â Â Â Â Â NEXT_PASS (pass_cleanup_eh);
> > Â Â Â Â Â NEXT_PASS (pass_profile);
> > Â Â Â Â Â NEXT_PASS (pass_local_pure_const);
> > + Â Â Â Â NEXT_PASS (pass_intra_devirt);
> > Â Â Â Â Â/* Split functions creates parts that are not run through
> > Â Â Â Â Â Â early optimizations again. ÂIt is thus good idea to do this
> > Â Â Â Â Â Â late. Â*/
> > Index: src/gcc/tree-pass.h
> > ===================================================================
> > --- src.orig/gcc/tree-pass.h
> > +++ src/gcc/tree-pass.h
> > @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_trace
> > Âextern struct gimple_opt_pass pass_warn_unused_result;
> > Âextern struct gimple_opt_pass pass_split_functions;
> > Âextern struct gimple_opt_pass pass_feedback_split_functions;
> > +extern struct gimple_opt_pass pass_intra_devirt;
> >
> > Â/* IPA Passes */
> > Âextern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
> >


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]