This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Ping: PATCH: New optimization, partitioning hot/cold basic blocks


Ping! Could some please review this patch. Thanks!

-- Caroline Tice
ctice@apple.com

On Mar 16, 2004, at 4:42 PM, Caroline Tice wrote:

Once again, here is the latest version of the optimization for partitioning hot and cold basic blocks into separate
sections of the .o and executable files. I have attempted to either answer or fix all the things Richard
Henderson mentioned (see below). The changelog and patch are at the end of this email.

I have tested this on both an Apple G4 running apple-darwin, and on an i386 running Linux. On both machines
it has:
- passed a small specific partitioning test I have
- passed all the SPEC tests (using partitioning) that 3.5 passes (without partitioning)

I am in the process of bootstrapping, and will then run the DejaGnu tests.

Assuming it passes the bootstraps and DejaGnu, is this okay to commit to the main 3.5 compiler?

-- Caroline Tice
ctice@apple.com



From: Richard Henderson <rth@redhat.com>
Date: March 3, 2004 1:50:45 PM PST
To: Caroline Tice <ctice@apple.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: PATCH: New optimization: Partitioning hot/cold basic blocks



Here are the comments you made that required written responses:


On Fri, Feb 27, 2004 at 09:47:31AM -0800, Caroline Tice wrote:

+ if (cond_jump)

Don't we already know that both edges already exist?


No, at this point we don't really know if it exists or not.


+ if (fall_thru && (fall_thru->dest->index >= 0))

rtl_verify_flow_info assures us that we never fallthru to exit.

Maybe it does later on, but at this point in the compiler there are fallthru edges
into the exit block. I know this because at first I took you at your word, and things
broke, and I discovered that you were incorrect here.


*************** try_crossjump_to_edge (int mode, edge e1
*** 1367,1372 ****
--- 1418,1430 ----
rtx newpos1, newpos2;
edge s;

+ /* If we have partitioned hot/cold basic blocks, it is a bad idea
+ to try this optimization. */
+
+ if (flag_reorder_blocks_and_partition && no_new_pseudos)
+ return false;

Hum? I don't see why not. You just want to avoid crossjumping
between hot/cold blocks.

This is not the optimization that cross jumps between basic blocks. In that case I do just
test to make sure we're not cross jumping between blocks. The case here is attempting to
crossjump "to an edge". The problem is that the basic blocks it finally decides to do the
optimization on are not easy to identify from the parameters coming in to the function. By
the time the blocks *are* identifiable, enough of the optimization has been performed that
it's hard to back out. So I thought it would be far less messy all the way around to just not
attempt crossjumping to an edge after the partitioning has occurred.

I'm also quite certain that you don't
want to insert NOTE_INSN_UNLIKELY_EXECUTED_CODE or REG_CROSSING_JUMP
until late in the compilation process. They'll just cause you
problems trying to keep them up-to-date.


NO KIDDING!! It's REALLY HARD for me to maintain this information because lots of cfg optimizations
keep wanting to undo my work. Unfortunately, because of the need to convert some things into
indirect jumps, I have to do the partitioning work before register allocation has occurred. Therefore I
really don't have a choice. I do the partitioning as late as I can (before register allocation), and add
the REG_CROSSING_JUMP note to all appropriate jumps at that point. I don't add the NOTE_INSN_UNLIKELY_EXECUTED_CODE note until we get to the bb-reordering code. (In case you
hadn't noticed, the partitioning now happens well before the actual feedback based reordering stuff.).


case NOTE_INSN_BASIC_BLOCK:
+
+ /* If we are performing the optimization that paritions
+ basic blocks into hot & cold sections of the .o file,
+ then at the start of each new basic block, before
+ beginning to write code for the basic block, we need to
+ check to see whether the basic block belongs in the hot
+ or cold section of the .o file, and change the section we
+ are writing to appropriately. */
+
+ if (flag_reorder_blocks_and_partition)
+ if (in_unlikely_text_section())
+ if (!(scan_ahead_for_unlikely_executed_note (insn)))
+ text_section ();

NOTE_INSN_BASIC_BLOCK occurs after the CODE_LABEL.


Yes I know. I don't understand that point of your comment here.


! if (!unlikely_section_label_printed)
! {
! fprintf (asm_out_file, "_%s_unlikely_section:\n",
! current_function_name ());
! unlikely_section_label_printed = true;
! }

This should be some debug callback, since dwarf2 is going to want
to handle this in some other way. And it would have to be two
leading underscores to get it out of the user's namespace.


The string above does not go into debug information at all. It is purely a label I write into
the assembly file, to help me find where the first cold section occurs for each function. I
have currently not added *any* debug info relating to the hot/cold partitioning. It is on my
"to do" list.


+ #define SECTION_FORMAT_STRING ".section\t\"%s\"\n\t.align 2\n"

What is this for?


To make sure the correct alignments, "dots", etc. end up being output for whichever architecture
we are on, when writing out the assembly directive for changing between hot/cold sections. Yes, this
*is* necessary. Without it I get bugs on both architectures I've tried this on.



Below are all the comments/suggestions you made that I have attempted to fix in
my code. I have tried to the very best of my ability to fix everything you mention below.


+ ret_val = (scan_ahead_for_unlikely_executed_note (BB_HEAD (e->src)) ==
+ scan_ahead_for_unlikely_executed_note (BB_HEAD (e->dest)));

The number of times you're scanning instructions for this note
is bad. You should just keep a bitmap or sbitmap, depending
on the expected density of the set bits.

{
+
...
+
+ }
+
+ }

Don't add silly whitespace. Lots of ocurrences.

! if (round < last_round - 1
+ && (round == last_round - 1)
+ && (round <= last_round - 1)

I think these tests are confusing. It would be much better to have
a predicate or nicely named boolean variable to accurately describe
what's being tested here.

Fold flag_reorder_blocks_and_partition into said predicates.

+ static
+ void color_basic_blocks (int *bb_colors, int *bb_has_label,

Bad formatting. Several ocurrences.

+ /* Find all the edges that cross between hot and cold sections. */
...
+ /* Check to see if cur_bb ends in an unconditional jump. If
+ so, there is nothing that needs to be done for it. */

Well that would seem to deny the function block comment.
What exactly is this function supposed to be collecting?

+ if ((GET_CODE (last_insn) == JUMP_INSN)
+ && (! any_condjump_p (last_insn)))
+ continue;

Redundant check form jump_insn.
Redundant parenthesis. Lots of ocurrences.

+ if ((GET_CODE (last_insn) != JUMP_INSN)
+ && (succ1->dest != cur_bb->rbi->next))
+ continue;

Another check for jump_insn?

I'm not sure what you're not just doing

FOR_EACH_BB(bb)
for (e = bb->succ; e ; e = e->succ_next)

Of course, that means you're not necessarily capped at 2*n_basic_blocks,
which means crossing_edges needs to be allocated and reallocated within
this function.

Alternately, I might guess that you could use e->aux to mark those that
cross sections, so you don't have to allocate memory at all. You'd have
to double-check that, since I don't know which aux bits are currently
used by bb-reorder. You might well be able to add your bits to those
datastructures, however.

A final alternative is to allocate a bit in e->flags.

+ /* If any destination of a crossing edge does not have a label, add label;
+ Convert any fall-through crossing edges (for blocks that do not contain
+ a jump) to unconditional jumps. */
+
+ static
+ void add_labels_and_missing_jumps (edge *crossing_edges,
+ int n_crossing_edges,
+ int *bb_has_label, int *bb_has_jump)

The creating label part of this is wrong -- we have block_label that
will create or find labels for blocks on demand. I see that there's
the no-jump fallthrough part remaining; perhaps this function becomes
clearer without the label bits.

+ if (dest && (dest->index != -2)) /* Not EXIT BLOCK */

Never ever compare vs -2/-1 directly.

+ /* Find the jump instruction. */
+
+ for (cur_insn = BB_HEAD (cur_bb);
+ cur_insn != NEXT_INSN (BB_END (cur_bb));
+ cur_insn = NEXT_INSN (cur_insn))
+ if (GET_CODE (cur_insn) == JUMP_INSN)

The jump insn is always at the end. Always.

+ /* Find label in fall_thru block. We've already added
+ any missing labels, so there must be one. */

block_label.

+ /* This is the case where both edges out of the basic
+ block are crossing edges. Here we will fix up the
+ fall through edge. The jump edge will be taken care
+ of later. */
+
+ new_bb = force_nonfallthru (fall_thru);

Seems like this new block should be reused for other crossings,
but I can't see how it would get found.

+ REG_NOTES (BB_END (new_bb)) = gen_rtx_EXPR_LIST
+ (REG_CROSSING_JUMP,
+ NULL_RTX,
+ NULL_RTX);

Don't all of these get added at the end?

+ static basic_block
+ find_jump_block (rtx old_label, basic_block *block_list, int max_idx)

Again this seems like something that we shouldn't be searching for.
Either we can process all crossing edges incomming to each block and
create just the one bounce block (and never have to record it) or we
should have this placed in some lookaside data structure (like RBI).

+ /* Find last, last hot, and last cold basic blocks. These will be
+ used as insertion points for new bb's containing unconditional
+ jumps (to cross section boundaries). */

Why wouldn't this be taken care of with the bb-reordering block
ordering data structures?

+ /* Find the jump insn in cur_bb. */
+
+ found = 0;
+ old_jump = NULL;
+ for (cur_insn = BB_HEAD (cur_bb);
+ (cur_insn != NEXT_INSN (BB_END (cur_bb))) && !found;
+ cur_insn = NEXT_INSN (cur_insn))

Again, it's last.

+ /* Check to make sure the jump instruction is a
+ conditional jump. */

We have predicates in jump.c for this sort of thing.

+ /* Check to see if bb ends in an unconditional jump. */
+
+ if ((GET_CODE (last_insn) == JUMP_INSN)
+ && (! any_condjump_p (last_insn)))
+ {
+ succ = cur_bb->succ;
+
+ if (succ->succ_next)
+ {
+ rtx set_src;
+ if (GET_CODE (PATTERN (last_insn)) == SET)
+ set_src = SET_SRC (PATTERN (last_insn));
+ else if (GET_CODE (PATTERN (last_insn)) == PARALLEL)
+ {
+ set_src = XVECEXP (PATTERN (last_insn), 0, 0);
+ if (GET_CODE (set_src) == SET)
+ set_src = SET_SRC (set_src);
+ else
+ set_src = NULL_RTX;
+
+ if (set_src && (GET_CODE (set_src) == REG))
+ continue;
+ else
+ abort ();
+ }
+ else
+ abort ();
+ }
+

What kind of unconditional direct jump has two edges? Are you
looking to exclude tablejump and computed jump here or what?
Again, there's predicates for this sort of thing in jump.c.

*************** try_simplify_condjump (basic_block cbran
*** 148,153 ****
--- 149,163 ----
return false;
jump_dest_block = jump_block->succ->dest;

+ /* If we are partitioning hot/cold basic blocks, we don't want to
+ mess up unconditional or indirect jumps that cross between hot
+ and cold sections. */
+
+ if (flag_reorder_blocks_and_partition
+ && (scan_ahead_for_unlikely_executed_note (BB_HEAD (jump_block)) !=
+ scan_ahead_for_unlikely_executed_note (BB_HEAD (jump_dest_block))))
+ return false;

Ok, you use this data outside your own pass. This *definitely*
wants some sort of basic_block/edge annotation.


Also, you'll want to have verify_flow_info changes that validate
things like fallthru only between same sections.

+ if (current_function_decl->decl.section_name)
+ fprintf (asmfile, SECTION_FORMAT_STRING,
+ TREE_STRING_POINTER (current_function_decl->decl.section_name));
+ else
+ fprintf (asmfile, SECTION_FORMAT_STRING, HOT_TEXT_SECTION_NAME);

Hmm? Are you absolutely certain you don't want function_section()?

#ifndef HOT_TEXT_SECTION_NAME
- #define HOT_TEXT_SECTION_NAME "text.hot"
+ #define HOT_TEXT_SECTION_NAME ".text"
#endif

This conflicts with existing usage of HOT_TEXT_SECTION_NAME.

#ifndef UNLIKELY_EXECUTED_TEXT_SECTION_NAME
- #define UNLIKELY_EXECUTED_TEXT_SECTION_NAME "text.unlikely"
+ #define UNLIKELY_EXECUTED_TEXT_SECTION_NAME ".textunlikely"
#endif

Why?

+ static bool
+ is_jump_table_basic_block (rtx insn)

You mean something like tablejump_p?

+ if (!in_unlikely_text_section())
+ unlikely_text_section ();

Section changing should already take care of the if.




2004-03-16 Caroline Tice <ctice@apple.com>

* basic-block.h (struct edge_def): Add new field, crossing_edge.
(struct basic_block_def): Add new field, partition.
(UNPARTITIONED, HOT_PARTITION, COLD_PARTITION): New constant macro
definitions.
(partition_hot_cold_basic_blocks): Add extern function
declaration.
* bb-reorder.c (function.h, obstack.h, expr.h, regs.h): Add four new
include statements.
(N_ROUNDS): Increase the maximum number of rounds by 1.
(branch_threshold): Add array value for new round.
(exec_threshold): Add array value for new round.
(push_to_next_round_p): New function.
(add_unlikely_executed_notes): New function.
(find_rarely_executed_basic_blocks_and_crossing_edges): New function.
(mark_bb_for_unlikely_executed_section): New function.
(add_labels_and_missing_jumps): New function.
(add_reg_crossing_jump_notes): New function.
(fix_up_fall_thru_edges): New function.
(find_jump_block): New function.
(fix_crossing_conditional_branches): New function.
(fix_crossing_unconditional_branches): New function.
(fix_edges_for_rarely_executed_code): New function.
(partition_hot_cold_basic_blocks): New function.
(find_traces): Add an extra round for partitioning hot/cold
basic blocks.
(find_traces_1_round): Add a parameter. Modify to push all cold blocks,
and only cold blocks, into the last (extra) round of collecting traces.
(better_edge_p): Add a parameter. Modify to favor non-crossing edges
over crossing edges.
(bb_to_key): Add code to correctly identify cold blocks when
doing partitioning.
(connect_traces): Modify to connect all the non-cold traces first, then
go back and connect up all the cold traces.
(reorder_basic_blocks): Add call to add_unlikely_executed_notes.
* cfg.c (entry_exit_blocks): Add initialization for partition field in
entry and exit blocks.
* cfgbuild.c (make_edges): Update current_function_has_computed_jump
if we are doing hot/cold partitioning.
* cfgcleanup.c (cfglayout.h): Add new include statement.
(try_simplify_condjump): Modify to not attempt on blocks with jumps
that cross section boundaries.
(try_forward_edges): Likewise.
(merge_blocks_move_predecessor_nojumps): Likewise.
(merge_blocks_move_successor_nojumps): Likewise.
(merge_blocks_move): Likewise.
(try_crossjump_to_edge): Modify to not attempt after we have done
the block partitioning.
(try_crossjump_bb): Modify to not attempt on blocks with jumps that
cross section boundaries.
(try_optimize_cfg): Likewise.
* cfghooks.c (tidy_fallthru_edges): Modify to not remove indirect
jumps that cross section boundaries.
* cfglayout.c (flags.h): Add new include statement.
(update_unlikely_executed_notes): New function.
(fixup_reorder_chain): Add code so when a new jumping basic block is
added, it's UNLIKELY_EXECUTED_CODE and REG_CROSSING_JUMP notes are
updated appropriately.
(duplicate_insn_chain): Add code to duplicate the new NOTE insn
introduced by this optimization.
* cfglayout.h (scan_ahead_for_unlikely_executed_note): Add new
extern function declaration.
* cfgrtl.c (can_delete_note_p): Add NOTE_INSN_UNLIKELY_EXECUTED_CODE to
list of notes that can be deleted.
(create_basic_block_structure): Add initialization for partition field.
(rtl_can_merge_blocks): Modify to test blocks for jumps that cross
section boundaries.
(try_redirect_by_replacing_jump): Modify to not attempt on jumps that
cross section boundaries.
(commit_one_edge_insertion): Add code so newly created basic block
ends up in correct (hot or cold) section. Modify to disallow
insertions before NOTE_INSN_UNLIKELY_EXECUTED_CODE notes.
(rtl_verify_flow_info_1): Add code to verify that no fall_thru edge
crosses section boundaries.
(cfg_layout_can_merge_blocks_p): Modify to test blocks for jumps that
cross section boundaries.
(force_nonfallthru_and_redirect): Modify to make sure new basic block
ends up in correct section, with correct notes attached.
* common.opt (freorder-blocks-and-partition): Add new flag for this
optimization.
* dbxout.c (dbx_function_end): Add code to make sure scope labels at
the end of functions are written into the correct (hot or cold)
section.
(dbx_source_file): Add code so writing debug file information
doesn't incorrectly change sections.
* defaults.h (NORMAL_TEXT_SECTION_NAME): New constant macro, for use
in partitioning hot/cold basic blocks into separate sections.
(SECTION_FORMAT_STRING): New constant macro, for linux/i386 hot/cold
section partitioning.
(HAS_LONG_COND_BRANCH): New constant macro, indicating whether or not
conditional branches can span all of memory.
(HAS_LONG_UNCOND_BRANCH): New constant macro, indicationg whether or not
unconditional branches can span all of memory.
* final.c (scan_ahead_for_unlikely_executed_note): New function.
(final_scan_insn): Add code to check for NOTE instruction indicating
whether basic block belongs in hot or cold section, and to make sure
the current basic block is being written to the appropriate section.
Also added code to ensure that jump table basic blocks end up in the
correct section.
* flags.h (flag_reorder_blocks_and_partition): New flag.
* ifcvt.c (find_if_case_1): Modify to not attempt if conversion if
one of the branches has a jump that crosses between sections.
(find_if_case_2): Likewise.
(ifcvt): Modify to not attempt to mark loop exit edges after
hot/cold partitioning has occurred.
* opts.c (decode_options): Code to handle new flag,
flag_reorder_blocks_and_partition; also to turn it off if
flag_exceptions is on.
(common_handle_option): Code to handle new flag,
flag_reorder_blocks_and_partition.
* output.h (unlikely_text_section): New extern function declaration.
(in_unlikely_text_section): New extern function declaration.
* passes.c (rest_of_handle_stack_regs): Add
flag_reorder_blocks_and_partition as an 'or' condition for calling
reorder_basic_blocks.
(rest_of_handle_reorder_blocks): Add flag_reorder_blocks_and_partition
as an 'or' condition for calling reorder_basic_blocks.
(rest_of_compilation): Add call to partition_hot_cold_basic_blocks.
* print-rtl.c (print_rtx): Add code for handling new note,
NOTE_INSN_UNLIKELY_EXECUTED_CODE
* rtl.c (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note insn (see below).
(REG_CROSSING_JUMP): New kind of reg_note, to mark jumps that
cross between section boundaries.
* rtl.h (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note instruction,
indicating the basic block containing it belongs in the cold section.
(REG_CROSSING_JUMP): New type of reg_note, to mark jumps that cross
between hot and cold sections.
* toplev.c (flag_reorder_blocks_and_partition): Add code to
initialize this flag, and to tie it to the command-line option
freorder-blocks-and-partition.
* varasm.c (cfglayout.h): Add new include statement.
(unlikely_section_label_printed): New global variable, used for
determining when to output section name labels for cold sections.
(in_section): Add in_unlikely_executed_text to enum data structure.
(text_section): Modify code to use SECTION_FORMAT_STRING and
NORMAL_TEXT_SECTION_NAME macros.
(unlikely_text_section): New function.
(in_unlikely_text_section): New function.
(function_section): Add code to make sure beginning of function is
written into correct section (hot or cold).
(assemble_start_function): Add code to make sure stuff is written to
the correct section.
(assemble_zeros): Add in_unlikely_text_section as an 'or' condition
to an if statement that was checking 'in_text_section'.
(assemble_variable): Add 'in_unlikely_text_section' as an 'or'
condition to an if statement that was checking 'in_text_section'.
(default_section_type_flags_1): Add check: if in cold section
flags = SECTION_CODE.
* config/darwin.c (darwin_asm_named_section): Modify to use
SECTION_FORMAT_STRING if we are partitioning hot/cold blocks.
* config/i386/i386.h (HAS_LONG_COND_BRANCH): Defined this macro
specifically for the i386.
(HAS_LONG_UNCOND_BRANCH): Defined this macro specifically for the i386.
* config/rs6000/darwin.h (UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Change
text string to something more informative.
(NORMAL_TEXT_SECTION_NAME): Add new definition.
(SECTION_FORMAT_STRING): Add new definition.
* config/rs6000/rs6000.c (rs6000_assemble_integer): Add
'!in_unlikely_text_section' as an 'and' condition to an if statement
that was already checking '!in_text_section'.
* config/rs6000/sysv4.h (HOT_TEXT_SECTION_NAME,NORMAL_TEXT_SECTION_NAME,
UNLIKELY_EXECUTED_TEXT_SECTION_NAME,SECTION_FORMAT_STRING): Make
sure these are properly defined for linux on ppc.
* doc/invoke.texi (freorder-blocks-and-partition): Add documentation
for this new flag.
* doc/rtl.texi (REG_CROSSING_JUMP): Add documentation for new
reg_note.
* doc/tm.texi (NORMAL_TEXT_SECTION_NAME, SECTION_FORMAT_STRING,
HAS_LONG_COND_BRANCH, HAS_LONG_UNCOND_BRANCH): Add documentation for
these new macros.


<gcc5-hot-cold4c.txt>

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]