This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH: New Optimization: Partitioning hot & cold basic blocks


> > + #define LONG_COND_BRANCH_SIZE 2000000
> 
>  (4) This is absolutely the wrong way to do whatever it is you're
>      trying to do with this.  I'm quite certain that decimal 2000000
>      does not correspond to any powerpc bit field limitation, and is
>      thus effectively chosen at random.

I'm pretty sure I advised Caroline to take this path, that is to try
to use the port's existing long-branch machinery when possible, by
basically claiming that a branch that goes across section boundaries
could go a very long way.  Doing so has clear benefits, which is that
you don't have to write new patterns to make the port support this
optimization, and you don't have to implement long conditional
branches twice.  The 2000000 is an artifact of this, and is the one
really annoying wart in that design; it'd be nice if it could be
avoided, and using UINT_MAX (or UINT_MAX/4 ?) sounds like a better
default than this particular magic constant, so long as it won't cause
overflow.

>       Given that only a scant few of our targets can actually
>       support long conditional branches, and almost all can only
>       support long unconditional branches, I'm going to require

This is a problem.  When I gave Caroline this advice, I thought that
most ports supported long conditional branches.  It turns out that
while many ports do (powerpc, mips, sparc, sh, and others), and some
don't need them (x86 and s390, I believe), some do need them and don't
have them implemented, and it's not easy to find out which ones (I
think that Alpha is one of these, but can't be sure).  So my proposal
is that this optimization should not be enabled by a -O option, and
each machine description can switch it on if it supports it.  This is
necessary in any case, since the port will need to have the
appropriate linker support to make this useful.

Note that the branch between .text and .text.cold is not a 'very long'
one as might be found when branching between shared objects.  I would
expect that .text.cold would appear right after the .text sections, so
this should be no worse than a conditional branch inside a very large
procedure, which motivated the original design suggestion.

> I think all of the ugliness you add to final.c wrt insn addreses
> can be removed if you require that targets implement special patterns
> to explicitly transition between sections.  Thus we don't have to
> come up with arbitrarily large numbers to satisfy the pc-dest test
> in normal branch patterns.

This is true, but I think it'd be better to make it easier to
implement targets, even if that makes it a bit harder to write
final.c, since there are lots of targets and only one final.c.  (Of
course, it's better yet if ugliness can be avoided completely.)  The
key here is that I thought that most ELF targets wouldn't need to do
anything special at all to use this optimisation.  As a practical
matter, if targets need to implement special patterns, then it will
take years before you can expect this optimisation to be available on
most targets.

One problem is that the number isn't exactly "arbitrarily large".  If
we knew this was the only object file in the program, we could compute
it exactly.  Unfortunately, we don't know exactly how far it's going
to be, so we have to overestimate.

> in that case,  adding the "!flag_exceptions" to the condition for
> calling my code, in toplev.c:

As a small improvement while we think about how to make this work for
routines that need an exception table, it would be nice if this could
test whether an exception table was actually going to be emitted.
Unfortunately, at the moment this condition is buried in dwarf2out.c:

      /* Don't emit EH unwind info for leaf functions that don't need it.  */
      if (for_eh && !flag_asynchronous_unwind_tables && flag_exceptions
          && (fde->nothrow || fde->all_throwers_are_sibcalls)
          && !fde->uses_eh_lsda)

I had a need for this same conditional elsewhere recently, also
involving an optimization that broke EH information output.  Perhaps
it should be coded into a public routine.

-- 
- Geoffrey Keating <geoffk@geoffk.org>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]