This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] PR26251: -Os vs. tree-ssa jump threading


Hi Jeff,

Thanks for your comments.

On Mon, 3 Jul 2006, Jeffrey Law wrote:
> It looks like a reasonable approach to me.

Cool!


> I think we already have some code which performs the equivalent of
> redirection_block_p.  You might peek at cfghooks and the tree CFG
> routines to see if the code we've already got will work for your
> needs.  Not absolutely sure on this, but the code you wrote looks
> awful familiar.

I did look though the CFG hooks, and all of the *_block_p functions
but alas nothing did exactly what what was needed.  In my initial
version this was called tree_forwarder_block_p [the exisiting
function forwarder_block_p applies only to RTL.  However, whilst
making the "forwarder" block predicate a CFG hook, I noticed our
definition of a forwarder block is that it have a single sucessor.
Hence I came up with the term redirection block, and added a comment
that redirection_block_p (bb) && EDGE_COUNT (bb->succs) == 1 can be
used to detected such forwarder blocks.

The reason the code looks familiar is that its based heavily on
tree-ssa-phiopt.c's empty_block_p fused with test from
remove_ctrl_stmt_and_useless_edges.  I even checked whether it
was possible to use is_ctrl_stmt and/or is_ctrl_altering_stmt to
implement this check.

One thing that made me think for a while was the skipping of
IS_EMPTY_STMT in this predicate.  It looks like tree_duplicate_bb
actually preserves empty statements (but drops LABEL_EXPRs)!  I'm
not sure if this is something that can be improved, or if anything
relies on this, but that probably isn't suitable stage3.  Anyway,
there seems no harm allowing blocks full on empty statements to
continue to be duplicated, as this'll get cleaned up eventually.



> The test you're using to determine if threading through a block will
> create duplicates is a little over-conservative.  More importantly,
> there's a simple and direct way to write that test.

Actually, I'm fairly certain this test is exact.  Grossly speaking
create_duplicates will call create_block_for_threading which calls
duplicate_block on the first edge it encounters in the hash table.
Given that we know thread is only ever called for blocks with atleast
one threaded edge (thanks to the threaded_block bitmap), we'd usually
always duplicate the block.  The one exception is the "all" case, where
for the first threaded edge, we can save a duplication, but if this
block has a more then edge, we'll duplicate the block for the second
third, etc.. edges [they must all need to be threaded because we set
the all flag].

Hence the only time we won't call duplicate_block is precisely the
case when EDGE_COUNT (bb->preds) == 1.

Not only is testing EDGE_COUNT early much more efficient, and
clearer, than testing local_info.template_block after we've allocated
a hash_table, inserted all of the relevant edges, and traversed it
calling calling create_duplicates.  For one thing, we'll actually
have physically duplicated one or more basic blocks, allocating their
stmts etc... before we decide that we've increased code size, and leave
it to cfgcleanup and the garbage collector to recover things.  But
more importantly, by this point we've commited to threading each
edge and have called update_bb_profile_for_threading, corrupting
the profile/count information should we decide not to commit changes.

I also expect that as tree-ssa-threadupdate improves we may be
able to avoid setting local_info.template_block more often.  The
original intention of the RTL jump bypassing pass was to update
the jump label in the predecessor block to the label of the
successor block, i.e. moving edges without introducing any new
basic blocks.  The same could be done in thread_block where for
redirection_block_p, we don't need to duplicate an empty block
if we can update an existing GOTO to point to an existing LABEL.
Currently, we don't even attempt such direct CFG manipulation
and instead rely on cfgcleanup and other passes to fix things up
for us.


> ps.  If you wanted to do a *really* good job, then you'd allow
> threading through blocks where all the statements in BB merely
> feed the control statement at the end of BB.

I agree that the decision to thread or not to thread could/should
be made smarter about tolerable code growth when not optimizing
for size.  However, this PR concerns just the -Os regression,
in which even duplicating executable statements that only feed the
conditional jump is a bad decision.  Hence this new test is near
ideal for optimize_size, and the inliner heuristics aren't directly
applicable.  When not optimizing for size, I agree issues such as
the size of the block being duplicated, and may be the profile
frequency become much more relevant.



Given that you're generally happy with the approach taken by my
patch and the counter arguments above, I'll go with the patch as
currently written and tested, and commit it to mainline later
today.

Thanks again for your feedback.

Roger
--


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]