[PATCH] ipa-inline: Improve growth accumulation for recursive calls

Thu Jan 21 14:34:59 GMT 2021

Hi All,

James and I have been investigating this regression and have tracked it down to register allocation.

I have created a new PR with our findings (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782), but unfortunately
we don't know how to proceed.

This does seem like a genuine bug in RA.  It looks like some magic threshold has been crossed, but we're having
trouble determining what this magic number is.

Any help is appreciated.

Thanks,
Tamar

> -----Original Message-----
> From: Xionghu Luo <luoxhu@linux.ibm.com>
> Sent: Friday, October 16, 2020 9:47 AM
> To: Tamar Christina <Tamar.Christina@arm.com>; Martin Jambor
> <mjambor@suse.cz>; Richard Sandiford <Richard.Sandiford@arm.com>;
> luoxhu via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: segher@kernel.crashing.org; wschmidt@linux.ibm.com;
> linkw@gcc.gnu.org; Jan Hubicka <hubicka@ucw.cz>; dje.gcc@gmail.com
> Subject: Re: [PATCH] ipa-inline: Improve growth accumulation for recursive
> calls
> 
> 
> 
> On 2020/9/12 01:36, Tamar Christina wrote:
> > Hi Martin,
> >
> >>
> >> Can you please confirm that the difference between these two is all
> >> due to the last option, -fno-inline-functions-called-once?  Is LTO
> >> necessary?  I.e., can you also run the benchmark built with the
> >> branch compiler and
> >> -mcpu=native -Ofast -fomit-frame-pointer -fno-inline-functions-called-once?
> >>
> >
> > Done, see below.
> >
> >>> +----------+------------------------------------------------+------+
> >>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24% |
> >>> +----------+------------------------------------------------+------+
> >>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer       | -26% |
> >>> +----------+------------------------------------------------+------+
> >>
> >>>
> >>> (Hopefully the table shows up correctly)
> >>
> >> it does show OK for me, thanks.
> >>
> >>>
> >>> It looks like your patch definitely does improve the basic cases. So
> >>> there's not much difference between LTO and non-LTO anymore and it's
> >>> much better than GCC 10. However, it still contains the regression
> >>> introduced by Honza's changes.
> >>
> >> I assume these are rates, not times, so negative means bad.  But do I
> >> understand it correctly that you're comparing against GCC 10 with the
> >> two parameters set to rather special values?  Because your table
> >> seems to indicate that even for you, the branch is faster than GCC 10
> >> with just -mcpu=native -Ofast -fomit-frame-pointer.
> >
> > Yes, these are indeed rates, and I am indeed comparing against the same
> > options that gave us the fastest rates before, namely the two
> > parameters and the inline flag.
> >
> >>
> >> So is the problem that the best obtainable run-time, even with
> >> obscure options, from the branch is slower than the best obtainable
> >> run-time from GCC 10?
> >>
> >
> > Yeah, that's the problem: when we compare the two, we're still behind.
> >
> > I've done the additional two runs:
> >
> > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > | Compiler | Flags                                                                                                                                          | diff GCC 10 |
> > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once |             |
> > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer                                                                                                       | -44%        |
> > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto                                                                                                 | -36%        |
> > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > | GCC 11   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12%        |
> > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80                                   | -22%        |
> > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12%        |
> > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto                                                                                                 | -24%        |
> > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer                                                                                                       | -26%        |
> > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto -fno-inline-functions-called-once                                                               | -12%        |
> > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -fno-inline-functions-called-once                                                                     | -11%        |
> > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> >
> > This confirms that LTO indeed isn't needed, and that the branch
> > without any options is much better than GCC 10 was without any
> > options.
> >
> > It also confirms that the only remaining difference comes from
> > -fno-inline-functions-called-once.
> 
> If -fno-inline-functions-called-once is added, the recursive function
> digits_2 won't be inlined, because each digits_2 is specialized into clone
> nodes that are each called only once. So it is expected that the
> performance comes back; I guess this is somewhat similar to -fno-inline
> for this case.
> 
> @Jambor @Honza Any progress on this (a --param controlling the maximal
> recursion depth) and on the other regression, about
> LOOP_GUARD_WITH_PREDICTION in PR96825
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96825), please? :) I
> tested the current FSF master code, and the regression still exists...
> 
> 
> Thanks,
> Xionghu
>
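
A note on reading the "diff GCC 10" column in the quoted table: as the thread says, these are rates, so a negative number means slower. The sketch below converts such a rate difference into the implied change in run time. It assumes that rate is inversely proportional to run time, which the thread does not state explicitly; the function name is made up for illustration.

```python
def runtime_change_pct(rate_diff_pct: float) -> float:
    """Convert a relative rate change (percent) into the implied
    run-time change (percent).

    Assumption: rate is inversely proportional to run time.
    """
    rate_ratio = 1.0 + rate_diff_pct / 100.0
    if rate_ratio <= 0.0:
        raise ValueError("rate difference must be greater than -100%")
    # time_new / time_old = rate_old / rate_new = 1 / rate_ratio
    return (1.0 / rate_ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Rate differences taken from the quoted table (relative to GCC 10).
    for diff in (-11, -12, -22, -24, -26, -36, -44):
        print(f"rate {diff:+d}% -> run time {runtime_change_pct(diff):+.1f}%")
```

Under that assumption, the -12% rows correspond to roughly 13.6% more run time than the GCC 10 baseline, and the -44% row to roughly 78.6% more.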