This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: backward threading heuristics tweek
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: Andrew Pinski <pinskia at gmail dot com>, Jeff Law <law at redhat dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, <nd at arm dot com>
- Date: Fri, 12 Aug 2016 16:06:56 +0100
- Subject: Re: backward threading heuristics tweek
- References: <20160606101953.GC12313@kam.mff.cuni.cz> <CA+=Sn1mXz9ONrYAh_D3NO=YyPxiy7KKe5yjd+gj3G04ch8Hz3Q@mail.gmail.com> <20160811113516.GA67433@kam.mff.cuni.cz>
On Thu, Aug 11, 2016 at 01:35:16PM +0200, Jan Hubicka wrote:
> > On Mon, Jun 6, 2016 at 3:19 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> > > Hi,
> > > while looking into profile mismatches introduced by the backward threading pass
> > > I noticed that the heuristic seems quite simplistic. First, it should be
> > > profile-sensitive and disallow duplication when optimizing cold paths. Second,
> > > it should use estimate_num_insns, because the gimple statement count is not a
> > > very realistic estimate of the final code size effect. Third, there seems to be
> > > no reason to disable the pass for functions optimized for size.
> > >
> > > If we block duplication of more than one insn on size-optimized paths, the pass
> > > is able to do the majority of the threading decisions that are for free and improve codegen.
> > > The code size benefit was between 0.5% and 2.7% on the testcases I tried (tramp3d,
> > > GCC modules, xalancbmk and some other stuff around my hd).
> > >
> > > Bootstrapped/regtested x86_64-linux, seems sane?
> > >
> > > The pass should also avoid calling cleanup_cfg when no threading was done,
> > > and I do not see why it is guarded by expensive_optimizations. What are the
> > > main compile time complexity limitations?
> >
> > This patch caused a huge regression (~11%) on CoreMark on ThunderX.
> > I assume it affects other targets too.
> > Basically, it looks like the path is no longer jump-threaded.
>
> Sorry for late reply. I checked our periodic testers and the patch seems more
> or less performance neutral with some code size improvements. Can you point
> me to the path that is no longer crossjumped? I added diag output, so you
> should see the reason why the path was considered unprofitable - either it
> was cold or we exceeded the maximal size. The size is largely untuned, so
> perhaps we can just adjust it.
Yes, it is going to be the "cold" edge heuristic that does it.
FSM jump-thread path not considered: duplication of 18 insns is needed and optimizing for size.
For whatever reason, if I dump EDGE_FREQUENCY in thread2 for the rejected
edges in this benchmark that we used to accept, they have frequency 0 (!).
This happens because the source block of the edge (the controlling switch)
has frequency 0:
;; basic block 7, loop depth 3, count 0, freq 0
;; prev block 6, next block 8, flags: (NEW, REACHABLE)
;; pred: 5 [95.0%] (FALSE_VALUE,EXECUTABLE)
switch (state_504) <default: <L31>, case 0: <L24>, case 2: <L25>, case 3: <L28>, case 4: <L26>, case 5: <L27>, case 6: <L29>, case 7: <L30>>
;; succ: 36 [12.5%] (EXECUTABLE)
;; 8 [12.5%] (EXECUTABLE)
;; 18 [12.5%] (EXECUTABLE)
;; 30 [12.5%] (EXECUTABLE)
;; 22 [12.5%] (EXECUTABLE)
;; 26 [12.5%] (EXECUTABLE)
;; 32 [12.5%] (EXECUTABLE)
;; 34 [12.5%] (EXECUTABLE)
It looks like it is thread1 that creates these blocks with empty
frequency information.
I haven't managed to find a reduced testcase that shows this issue.
Thanks,
James