This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] bb-reorder: Improve the simple algorithm for -Os (PR67864)
- From: Segher Boessenkool <segher at kernel dot crashing dot org>
- To: Bernd Schmidt <bschmidt at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 9 Oct 2015 11:27:36 -0500
- Subject: Re: [PATCH] bb-reorder: Improve the simple algorithm for -Os (PR67864)
- Authentication-results: sourceware.org; auth=none
- References: <c792164a9e6f8c4ba4db346c2df0e2341c73a3df dot 1444322573 dot git dot segher at kernel dot crashing dot org> <56179882 dot 6050501 at redhat dot com>
On Fri, Oct 09, 2015 at 12:35:46PM +0200, Bernd Schmidt wrote:
> On 10/08/2015 06:57 PM, Segher Boessenkool wrote:
> >As the PR points out, the "simple" reorder algorithm makes bigger code
> >than the STC algorithm did, for -Os, for x86. I now tested it for many
> >different targets and it turns out to be worse everywhere.
>
> That's somewhat disappointing. Wasn't it supposed to improve over it?
> What went wrong?
I first somehow tested -O2 only. Then when you asked I tested -Os again,
but only on powerpc64, and it was 0.1% there. It seems combine.ii is not
very representative for this :-/
> >Is this okay for trunk?
> >
> >2015-10-08 Segher Boessenkool <segher@kernel.crashing.org>
> >
> > PR rtl-optimization/67864
> > * gcc/bb-reorder (reorder_basic_blocks_simple): Prefer existing
> > fallthrough edges for conditional jumps. Don't sort candidate
> > edges if not optimizing for speed.
>
> Ok. Although it would be nice to understand exactly what causes code
> growth and possibly get a real cost estimate rather than such a heuristic.
There are two main factors: firstly, you want to have as many fallthroughs
as possible (any block that does not end in a fallthrough gets an extra
jump insn). I haven't found any purely local heuristic that works better
than this patch. Very likely you can do better by looking at bigger parts
of the graph (say, identify loops and diamonds).
The second thing (which still hurts x86) is that on some targets the "cheap"
conditional branches have a very short range. "Simple" has no notion of
jump distance at all.
Segher