This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Shrink struct basic_block, get rid of NOTE_INSN_DISABLE_SCHED_OF_BLOCK


Caroline Tice <ctice@apple.com> writes:

>> It isn't that the jump is unconditional, but that the jump is the
>> only edge leaving A and the only edge entering B.  (Being
>> unconditional is a consequence of that.)  If basic blocks A and B
>> are candidates for merging, they will be executed exactly the same
>> number of times in all possible runs of the program, and therefore
>> it is nonsensical for them to be in different partitions.
>
> I think perhaps you don't understand how this situation is arising
> with the hot/cold partitioning optimization.
>
> In order to deal with architectures that have short conditional
> branches (which cannot span all of memory) I have to take any
> conditional jump that attempts to cross a section boundary and add a
> level of indirection: it becomes a conditional jump to a new basic
> block, in the same section.  The new basic block contains an
> unconditional jump to the original target, in the other section.  That
> is how you can get these occurrences, which should not be merged
> together, as the source and target block really do belong in distinct
> sections.  I think that answers the first part of your question,
> unless I misunderstand what you are saying.
>
> Now, for those architectures whose unconditional branch is also
> incapable of reaching all of memory, I then convert those
> unconditional jumps into indirect jumps, through a register.  This is
> what causes a lot of the mess you were complaining about in your other
> email.  Originally I had set up the hot/cold partitioning to happen as
> part of the basic-block reordering phase, when the appropriate flag
> was set.  Unfortunately (as you also noted) this happens quite late in
> the compilation; too late for me to grab the registers I need for
> creating the indirect jumps.  Therefore the only solution I could come
> up with was to move the phase that decides on the partitioning and
> fixes up the branches into an earlier part of the compiler where I
> could still grab the necessary registers.  But then, as you noted, I
> have to prevent many cfg cleanup optimizations from undoing my fix ups
> for branches that cross section boundaries.  I hope this clarifies
> things for you a little.

I understand the problem now.

My inclination would be to dump both these problems on the back end.
In the partitioner, don't generate any special code for the section-
crossing branches.  In final.c, detect JUMP_INSNs that are in a
different section from their destination, and tag them somehow
(REG_CROSSING_JUMP would work, but I'd prefer a solution that didn't
involve allocating memory, like one of the RTL flags).  The back end
is then put on notice that a JUMP_INSN that is tagged in the way we
decide on, may exceed the range of the normal conditional-branch
instruction, and it has to do *something* to make that work.

The code sequence that the back end generates does not have to be
efficient, because the whole point of this optimization is to put the
rarely-executed code out of the way so it's less likely to take up
I-cache.  That means, when we do have to execute that code, any extra
cost to getting there is liable to be dwarfed by cache misses.  On
architectures with long-distance unconditional branches, the back end
can generate the

    j<revcc> 1f
    jmp  target
1:

sequence that you're having the MI optimizer generate now, but without
confusing the control-flow graph.  On architectures that need a
scratch register, there's almost always something clever that can be
done.  In extremis, I suppose the equivalent of

        move    reg, -(sp)
        move    .Ldest, reg
        jmp     (reg)
...
.Ldest:
        move    (sp)+, reg

must always work, except that now we have to put code at the
destination, which is something that I'm not sure how to pull off.

A nice thing about dumping the problem on the back end is, the back
end can in turn dump the problem on the linker, using a special
hey-this-branch-may-be-out-of-range-please-fix-it relocation.  The
linker knows which of the branches wind up actually being out of
range, and can fix just those up.  (The usual technique is to insert
little blocks of forwarding jumps in between functions.)

I am not asking you to do any of this now.  However, I would suggest
that you add a comment mentioning the problem at *every* site where
CFG optimizations are disabled because of this.  It doesn't have to be
long - just say something like 

  /* Basic block partitioning may create forwarder blocks that appear
     mergeable with other blocks, but aren't.  See the comment at the
     top of bb-reorder.c:<function> for the gory details.  */

and then do explain all the gory details in the place referenced.

zw


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]