This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Slowdowns in code generated by GCC>=3.3


On Tuesday 26 October 2004 20:57, Mike Stump wrote:
> On Oct 23, 2004, at 5:40 AM, Remko Troncon wrote:
> >> Another thought, you can binary search the compiler sources from cvs,
> >> compiling your application at each instance to determine the patch
> >> that went in that regressed performance for you.
> >
> > I did a search through GCC CVS to find out which patch caused our
> > factor 3 slowdown.  Apparently, it is the patch with this ChangeLog
> > entry:
> >
> > 2003-02-15  Richard Henderson  <rth@redhat.com>
> >
> >         * bb-reorder.c (find_traces_1_round): Don't connect easy to copy
> >         successors with multiple predecessors.
> >         (connect_traces): Try harder to copy traces of length 1.
> >         * function.h (struct function): Add computed_goto_common_label,
> >         computed_goto_common_reg.
> >         * function.c (free_after_compilation): Zap them.
> >         * stmt.c (expand_computed_goto): Use them to produce one
> >         indirect branch per function.
>
> Wait, we're not done yet, that is just the first step.  The next step
> is to find an instance of changed code generation...  It would help if
> you gprof or gcov your code, and then find a hot instance of the code
> that changed.  gcc -save-temps can be used to preserve the preprocessed
> source code and the assembly.
>
> From there, if you can, trim down the extraneous code from the .i/.ii
> file and then submit that file as a bug report, along with the flags,
> the generated .s before and after, the net effect of the change (3x
> application slowdown), the fact that it is a regression, a pointer to
> the above ChangeLog entry, and the timings you get with and without
> that patch.

I already know exactly what the problem is.
In fact I mentioned it even before Remko confirmed that this patch is
causing his problem.

The patch makes us "factor" computed jumps.  From cfg.texi:

@smallexample
  goto *x;
  [ ... ]

  goto *x;
  [ ... ]

  goto *x;
  [ ... ]
@end smallexample

@noindent
factoring the computed jumps results in the following code sequence
which has a much simpler flow graph:

@smallexample
  goto y;
  [ ... ]

  goto y;
  [ ... ]

  goto y;
  [ ... ]

y:
  goto *x;
@end smallexample


Now, the problem is that gcse and crossjumping may move code into the
block with the factored computed jump, so when we later try to undo
the factoring, we think it is too expensive to do so.  We end up
with lots and lots of jumps to a single computed jump that is very
difficult to predict.
To make things even worse, the expressions gcse moves out increase
register pressure by just enough to make us spill too much on ix86.
For dProlog this caused a code size increase of ~60% for the (large)
function with the computed jump.

In rth's patch, bb-reorder is supposed to try harder to connect short
traces and duplicate them.  This apparently doesn't work, so Josef
Zlomek posted his patch, which is linked from the audit trail of
PR15242.
That patch fails because we clear current_function_has_computed_jump,
so we never run the unfactoring.
Fixing that uncovers the problem with gcse and crossjumping, which
makes the unfactoring too expensive.
GCSE is the bigger problem (it's quite unlikely that crossjumping can
merge tails for typical code with computed jumps), so I suggest we
simply disable GCSE completely if a function contains computed jumps.
GCSE doesn't buy us much for such code anyway.
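
For reference, disabling those two passes by hand corresponds to existing GCC flags (the flag names are real; the file names here are placeholders, and how much performance this recovers depends on the code):

```
gcc -O2 -fno-gcse -fno-crossjumping -o myprog myprog.c
```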

A patch I posted to Remko fixes his problem if he disables gcse and
crossjumping by hand.  I'm working on cleaning up that patch now so
that it is acceptable for mainline.  The most important problem is
current_function_has_computed_jump.  I plan to patch tree-cfg.c to
clean up computed jumps at the tree level, and let emit_indirect_jump
set current_function_has_computed_jump.  When it is set, we never
clear it after that.  This only affects sched-rgn in the unlikely
case that we'd clean up a computed jump at the RTL level.
The rest is just Josef's patch, revamped.

I hope to find time to finish the patch later this week.

Gr.
Steven


