This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Fix always_inline
- From: Jan Hubicka <jh at suse dot cz>
- To: Richard Guenther <richard dot guenther at gmail dot com>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, Jan Hubicka <jh at suse dot cz>, Diego Novillo <dnovillo at google dot com>, gcc-patches at gcc dot gnu dot org
- Date: Mon, 16 Jun 2008 00:51:47 +0200
- Subject: Re: Fix always_inline
- References: <20080505053024.GG334@kam.mff.cuni.cz> <84fc9c000805050059i199f7f19v3943fb62fa5ba4ad@mail.gmail.com> <20080505080643.GI334@kam.mff.cuni.cz> <4825B804.9030405@google.com> <84fc9c000805101223h4af46949le20612af4b735f6c@mail.gmail.com> <20080519193617.GI21172@kam.mff.cuni.cz> <84fc9c000805200229o5f8e039bmfdf56a01a0682e40@mail.gmail.com> <20080528131518.GA24006@atrey.karlin.mff.cuni.cz> <84fc9c000806151549w4f4c10c2p6b569b58a7507960@mail.gmail.com>
> On Wed, May 28, 2008 at 9:15 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> On Mon, May 19, 2008 at 9:36 PM, Jan Hubicka <jh@suse.cz> wrote:
> >> > > ISTR that Honza has it working to some extent already.
> >> >
> >> > I am attaching WIP patch (it went quite smoothly). It breaks complex,
> >> > mudflap and other stuff, but it is good enough to bootstrap so we can
> >> > get some idea how things work. Rearranging the mudflap/complex passes
> >> > should be easy enough later.
> >> >
> >> > Compiling combine.c with the SSA path slows the compiler down from
> >> > 1.517s to 1.636s and increases the size of the produced assembly
> >> > somewhat (695k->697k). A 7% slowdown for combine.c (5% for Gerald's
> >> > testcase) is something we need to consider, but it is not _that_
> >> > disastrous. I haven't experimented with limited DCE or CCP yet.
> >> > Probably if every statement writing to a user variable is marked as
> >> > having side effects, our DCE should be safe.
> >>
> >> The figures are without enabling TER, I would expect both size and
> >> time savings from enabling it.
> >
> > Hi,
> > here is a variant of the patch that enables a limited amount of TER
> > (basically we can't mix statements from various locations or we confuse
> > GDB) and gets a clean run on the GDB testsuite.
> >
> > I am still testing it, but compile time and code size now actually
> > improve for combine.c and Gerald's testcase. We eat more memory, as
> > expected, but the extra ggc runs seem to pay off in performance, so we
> > probably lose noticeably only on testcases with a single big
> > function, where SSA form is expensive....
> >
> > The tree-ssa-coalesce change (to make all names coming from a single user
> > var coalesced) is a hack. It can be made linear with a hashtable, but
> > perhaps the info is somehow readily available and I just don't know
> > about it.
> >
> > Honza
> >
> > Index: cgraph.c
> > ===================================================================
> > --- cgraph.c (revision 136074)
> > +++ cgraph.c (working copy)
> > @@ -1070,7 +1070,7 @@
> > if (!lowered)
> > tree_lowering_passes (fndecl);
> > bitmap_obstack_initialize (NULL);
> > - if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)) && optimize)
> > + if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
> > execute_pass_list (pass_early_local_passes.pass.sub);
> > bitmap_obstack_release (NULL);
> > tree_rest_of_compilation (fndecl);
> > Index: tree-ssa-coalesce.c
> > ===================================================================
> > --- tree-ssa-coalesce.c (revision 136074)
> > +++ tree-ssa-coalesce.c (working copy)
> > @@ -1309,13 +1309,39 @@
> > coalesce_list_p cl;
> > bitmap used_in_copies = BITMAP_ALLOC (NULL);
> > var_map map;
> > + int i, j;
> >
> > cl = create_coalesce_list ();
> > map = create_outofssa_var_map (cl, used_in_copies);
> >
> > + /* FIXME: We need to coalesce all names originating same SSA_NAME_VAR.
> > + I am not sure what is proper implementation here. */
>
> At least the loop nest can be somewhat optimized by moving the ssa_name
> reference of i out of the innermost loop, as well as the test for
> DECL_ARTIFICIAL of a.
> The asserts look unnecessary. The extra dumps should also be removed again
> IMHO. For the constant 10000 there is a #define, MUST_COALESCE_COST.
Cool, didn't notice that.
The loop should use a hashtable, so that it is O(n). I will do that unless
someone points out that the information is readily available somehow.
Honza
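
The hashtable idea mentioned above can be sketched roughly as follows. This
is an illustration only, not code from the patch and not GCC's API: struct
name, hash_var and link_same_var are hypothetical names, an integer id stands
in for SSA_NAME_VAR, and the is_artificial flag stands in for the
DECL_ARTIFICIAL test. It also assumes that linking each name to the previous
name of the same variable is enough, i.e. that coalescing along such a chain
has the same effect as adding every pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SSA name.  `var` plays the role of
   SSA_NAME_VAR; `is_artificial` plays the role of DECL_ARTIFICIAL.  */
struct name
{
  int var;
  int is_artificial;
};

/* Multiplicative hash of a variable id, reduced to the table size.
   MASK must be (power of two) - 1.  */
static size_t
hash_var (int var, size_t mask)
{
  return ((size_t) (unsigned) var * 2654435761u) & mask;
}

/* For every non-artificial name I, store in PREV[I] the index of the
   previous name with the same variable, or -1 if there is none.
   Artificial names always get -1.  Returns the number of links made,
   or -1 if the table cannot be allocated.  Expected time is O(N),
   versus O(N^2) for comparing every pair of names.  */
long
link_same_var (const struct name *names, size_t n, long *prev)
{
  size_t cap = 1, mask, i;
  long *slot, links = 0;

  /* Keep the load factor at or below 1/2 so linear probing always
     finds an empty slot.  */
  while (cap < 2 * n + 1)
    cap <<= 1;
  mask = cap - 1;
  assert ((cap & mask) == 0);

  slot = malloc (cap * sizeof *slot);
  if (!slot)
    return -1;
  for (i = 0; i < cap; i++)
    slot[i] = -1;

  for (i = 0; i < n; i++)
    {
      size_t h;

      prev[i] = -1;
      if (names[i].is_artificial)
        continue;

      /* Each occupied slot holds the index of the last name seen for
         one variable; probe until we hit that variable or a hole.  */
      h = hash_var (names[i].var, mask);
      while (slot[h] != -1 && names[slot[h]].var != names[i].var)
        h = (h + 1) & mask;

      if (slot[h] != -1)
        {
          prev[i] = slot[h];
          links++;
        }
      slot[h] = (long) i;
    }

  free (slot);
  return links;
}
```

Each (i, prev[i]) link is where the real pass would presumably add a
coalesce pair with MUST_COALESCE_COST. The single pass over the names
replaces the nested loop, so the hoisting suggested in the review is no
longer needed.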