This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
- From: Andi Kleen <ak at linux dot intel dot com>
- To: "Bin.Cheng" <amker dot cheng at gmail dot com>
- Cc: bin dot cheng at linux dot alibaba dot com, gcc-patches List <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 18 Dec 2018 20:00:35 -0800
- Subject: Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
- References: <ae294d36-bc14-4407-8ce0-2f01fc3f651c.bin.cheng@linux.alibaba.com> <87wooai8cs.fsf@linux.intel.com> <CAHFci28x7Gt7=0mViO_PqzN0RPqPho69SER94LEzKoCa8HovRA@mail.gmail.com> <CAHFci28+4fboT+-+=ainVpbM+ufqt+uoawpS8754fcXsgzORBQ@mail.gmail.com>
On Wed, Dec 19, 2018 at 10:01:15AM +0800, Bin.Cheng wrote:
> On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng <amker.cheng@gmail.com> wrote:
> >
> > On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen <ak@linux.intel.com> wrote:
> > >
> > > "bin.cheng" <bin.cheng@linux.alibaba.com> writes:
> > >
> > > > Hi,
> > > >
> > > > Due to ICE and mal-functional bugs, indirect call value profile transformation
> > > > is disabled on GCC-7/8/trunk. This patch restores the transformation. The
> > > > main issue is AutoFDO should store cgraph_node's profile_id of callee func in
> > > > the first histogram value's counter, rather than pointer to callee's name string
> > > > as it is now.
> > > > With the patch, some "Indirect call -> direct call" tests pass with autofdo, while
> > > > others are unstable. I think the instability is caused by poor perf data collected
> > > > during regrets run, and can confirm these tests pass if good perf data could be
> > > > collected in manual experiments.
> > >
> > > Would be good to make the tests stable, otherwise we'll just have
> > > regressions in the future again.
> > >
> > > The problem is that the tests don't run long enough and don't get enough samples?
> > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > - int i;
> > - for (i = 0; i < 1000; i++)
> > + int i, j;
> > + for (i = 0; i < 1000000; i++)
> > + for (j = 0; j < 50; j++)
> > g += tc->foo();
> > if (g<100) g++;
> > }
> > @@ -27,8 +28,9 @@ void test1 (A *tc)
> > static __attribute__((always_inline))
> > void test2 (B *tc)
> > {
> > - int i;
> > + int i, j;
> > for (i = 0; i < 1000000; i++)
> > + for (j = 0; j < 50; j++)
> >
> > I have to increase loop count like this to get stable pass on my
> > machine. The original count (1000) is too small to be sampled.
> >
> > >
> > > Could add some loop?
> > > Or possibly increase the sampling frequency in perf (-F or -c)?
> > Maybe, I will have a try.
> Turned out all "Indirect call" test can be resolved by adding -c 100
> to perf command line:
> diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
> ...
> -exec perf record -e $E -b "$@"
> +exec perf record -e $E -c 100 -b "$@"
>
> Is 100 too small here? Or is it fine for all scenarios?
-c 100 is risky because it can cause perf throttling, which
makes it lose data.
perf has a limiter that if the PMU handler uses too much CPU
time it stops measuring for some time. A PMI is 10k+ cycles,
so doing one every 100 branches is a lot of CPU time.
I wouldn't go down that low. It is better to increase the
iteration count.
-Andi