This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

From: Andi Kleen <ak at linux dot intel dot com>
To: "Bin.Cheng" <amker dot cheng at gmail dot com>
Cc: bin dot cheng at linux dot alibaba dot com, gcc-patches List <gcc-patches at gcc dot gnu dot org>
Date: Tue, 18 Dec 2018 19:58:34 -0800
Subject: Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
References: <ae294d36-bc14-4407-8ce0-2f01fc3f651c.bin.cheng@linux.alibaba.com> <87wooai8cs.fsf@linux.intel.com> <CAHFci28x7Gt7=0mViO_PqzN0RPqPho69SER94LEzKoCa8HovRA@mail.gmail.com> <20181218212736.GH25620@tassilo.jf.intel.com> <CAHFci29WPN9Ey3X89WSq9uauikop8n9OG7dYFOxJyqM-QzPEmQ@mail.gmail.com>

On Wed, Dec 19, 2018 at 09:26:51AM +0800, Bin.Cheng wrote:
> On Wed, Dec 19, 2018 at 5:27 AM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > > -  int i;
> > > -  for (i = 0; i < 1000; i++)
> > > +  int i, j;
> > > +  for (i = 0; i < 1000000; i++)
> > > +    for (j = 0; j < 50; j++)
> > >       g += tc->foo();
> > >     if (g<100) g++;
> > >  }
> > > @@ -27,8 +28,9 @@ void test1 (A *tc)
> > >  static __attribute__((always_inline))
> > >  void test2 (B *tc)
> > >  {
> > > -  int i;
> > > +  int i, j;
> > >    for (i = 0; i < 1000000; i++)
> > > +    for (j = 0; j < 50; j++)
> > >
> > > I have to increase loop count like this to get stable pass on my
> > > machine.  The original count (1000) is too small to be sampled.
> >
> > IIRC it was originally higher, but people running on slow simulators complained,
> > so it was reduced.  Perhaps we need some way to detect in the test suite
> > whether the test runs on a real CPU.
> Is there a concise way to do this, given that gcc may be run in all kinds of
> virtual environments?

Virtual machines should be fine too; only simulators are too slow.

I hope there is, because we certainly need a solution for production-ready
autofdo.

Or perhaps we could just check whether perf is working and only
run the tests if it is. The TCL code already
checks that. We would just need to pass that information
to the test somehow, as a define.
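
For illustration only, a minimal sketch of how a test could consume such a define. It assumes a hypothetical macro, AUTOFDO_PERF_WORKS, that the TCL harness would pass (e.g. as -DAUTOFDO_PERF_WORKS) only after confirming that perf can sample; neither the macro name nor the exact iteration counts are existing testsuite conventions. The long-loop values are the ones from Bin's morefunc.C change quoted above.

```cpp
// Sketch only: AUTOFDO_PERF_WORKS is a hypothetical macro that the
// testsuite's TCL code would define after checking that perf works.

#ifdef AUTOFDO_PERF_WORKS
// perf works (real CPU or a fast virtual machine): loop long enough
// for the hot indirect call to be sampled reliably.
constexpr int kOuterIters = 1000000;
constexpr int kInnerIters = 50;
#else
// No usable perf (e.g. a slow simulator): keep the original short loop.
constexpr int kOuterIters = 1000;
constexpr int kInnerIters = 1;
#endif

struct A {
  virtual int foo() { return 1; }
  virtual ~A() = default;
};

int g;

// Loosely modeled on test1 in g++.dg/tree-prof/morefunc.C: a hot
// virtual call whose target should show up in the profile.
void test1(A *tc) {
  for (int i = 0; i < kOuterIters; i++)
    for (int j = 0; j < kInnerIters; j++)
      g += tc->foo();
}
```

This keeps a single test source that runs quickly on simulators and long enough to be sampled wherever perf is known to work.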

Overall I suspect far more test coverage is needed
to make it solid. The existing tests are not that great.

> 
> >
> > >
> > > > > FYI, an update about AutoFDO status:
> > > > > All AutoFDO ICEs in the regtest are fixed, while the tests still failing fall into the
> > > > > three categories below:
> > > >
> > > > Great!
> > > >
> > > > Of course it still ICEs with LTO?
> > > >
> > > > Right now there is no test case for this I think. Probably one should be added.
> >
> >
> > Any comments on this?
> We'd like to investigate AutoFDO+LTO further; may I ask what the
> status is (or was)?  Any background details about this would be
> appreciated.

It has just never worked, and it ICEs very quickly if you try it.

There's an open PR (PR71672).

There are some other open issues with autofdo, BTW; e.g. the
old 4.9 google branch still has more features than mainline.
For example, it supports discriminators, so it can distinguish more
than one basic block per source line.
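
To illustrate why discriminators matter, here is a made-up function (not from the testsuite). The loop condition, the test and both arms of the if/else all sit on one source line, so a profile keyed only by line number folds their sample counts together; DWARF line-table discriminators give each basic block on the line a distinct id, so the counts can be kept apart.

```cpp
// Hypothetical example: several basic blocks share a single source line.
// With line-only attribution, samples from the "s += v[i]" block and the
// "s -= v[i]" block land on the same line and cannot be told apart.
int sum_abs(const int *v, int n) {
  int s = 0;
  for (int i = 0; i < n; i++) if (v[i] > 0) s += v[i]; else s -= v[i];
  return s;
}
```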

The last time I tested, the gains with mainline autofdo
were also significantly smaller than with 4.9-google, so there might
be other tunings missing.

-Andi

