This is the mail archive of the
mailing list for the GCC project.
Re: PATCH: New optimization: Partitioning hot/cold basic blocks
- From: Caroline Tice <ctice at apple dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Caroline Tice <ctice at apple dot com>
- Date: Fri, 27 Feb 2004 09:47:31 -0800
- Subject: Re: PATCH: New optimization: Partitioning hot/cold basic blocks
- References: <0C8212C5-66FE-11D8-B230-000393BB90B6@apple.com>
In running my tests on the i386 (under Linux), I discovered a few minor
bugs in my original patch which I have
now fixed. The new patch (below) has been completely tested on both an
Apple G4, running apple-darwin, and
an i386 running Linux. It has bootstrapped, passed DejaGnu, and
successfully completed all the SPECInt 2000
tests (using hot/cold partitioning) which the 3.5 compiler passes
Below is the corrected ChangeLog and patch. Is this okay to commit to
-- Caroline Tice
2004-02-27 Caroline Tice <firstname.lastname@example.org>
* basic-block.h (partition_hot_cold_basic_blocks): Add extern
* bb-reorder.c (Copyright): Add 2004 to the year list.
(function.h, obstack.h, expr.h, regs.h): Add four new include
(N_ROUNDS): Increase the maximum number of rounds by 1.
(branch_threshold): Add array value for new round.
(exec_threshold): Add array value for new round.
(find_rarely_executed_basic_blocks): New function.
(mark_bb_for_unlikely_executed_section): New function.
(color_basic_blocks): New function.
(find_all_crossing_edges): New function.
(add_labels_and_missing_jumps): New function.
(fix_up_fall_thru_edges): New function.
(find_jump_block): New function.
(fix_crossing_conditional_branches): New function.
(fix_crossing_unconditional_branches): New function.
(fix_edges_for_rarely_executed_code): New function.
(in_same_section): New function.
(partition_hot_cold_basic_blocks): New function.
(find_traces): Add an extra round for partitioning hot/cold
(find_traces_1_round): Modify to push all cold blocks (and only
cold blocks) into the last (extra) round of collecting traces.
(bb_to_key): Add code to correctly identify cold blocks when
doing partitioning (can't rely on block count at this point).
(connect_traces): Modify to connect all the non-cold traces
go back and connect up all the cold traces.
* cfgbuild.c (make_edges): Update
if we are doing hot/cold partitioning.
* cfgcleanup.c (cfglayout.h): Add new include statement.
(try_simplify_condjump): Modify to not attempt on blocks with
that cross section boundaries.
(try_crossjump_to_edge): Modify to not attempt after we have
the block partitioning.
(try_crossjump_bb): Modify to not attempt on blocks with jumps
cross section boundaries.
* cfghooks.c (tidy_fallthru_edges): Modify to not remove
jumps that cross section boundaries.
* cfglayout.c (flags.h): Add new include statement.
(update_unlikely_executed_notes): New function.
(fixup_reorder_chain): Add code so when a new jumping basic
added, it's UNLIKELY_EXECUTED_CODE note is updated
(duplicate_insn_chain): Add code to duplicate the new NOTE insn
introduced by this optimization.
* cfglayout.h (scan_ahead_for_unlikely_executed_note): Add new
extern function declaration.
* cfgrtl.c (can_delete_note_p): Add
list of notes that can be deleted.
(rtl_merge_blocks): Modify to not attempt on blocks with
jumps that cross section boundaries.
(rtl_can_merge_blocks_p): Modify to test blocks for jumps that
(try_redirect_by_replacing_jump): Modify to not attempt on
cross section boundaries.
(commit_one_edge_insertion): Add code so newly created basic
ends up in correct (hot or cold) section.
(cfg_layout_can_merge_blocks_p): Modify to test blocks for
cross section boundaries.
(cfg_layout_merge_blocks): Modify to not attempt on blocks that
(force_nonfallthru_and_redirect): Modify to make sure new basic
ends up in correct section.
(commit_one_edge_insertion): Modify to disallow insertions
* common.opt (freorder-blocks-and-partition): Add new flag for
* dbxout.c (dbx_function_end): Add code to make sure scope
the end of functions are written into the correct (hot or cold)
(dbx_source_file): Add code to so writing debug file information
doesn't incorrectly change sections.
* defaults.h (HOT_TEXT_SECTION_NAME): Modify value to work for
(UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Modify value to work for
(SECTION_FORMAT_STRING): New global variable, for linux/i386
(HAS_LONG_COND_BRANCH): New global variable, indicating whether
conditional branches can span all of memory.
(HAS_LONG_UNCOND_BRANCH): New global variable, indicationg
whether or not
unconditional branches can span all of memory.
* final.c (scan_ahead_for_unlikely_executed_note): New
(is_jump_table_basic_block): New function.
(final_scan_insn): Add code to check for NOTE instruction
whether basic block belongs in hot or cold section, and to make
the current basic block is being written to the appropriate
Also added code to ensure that jump table basic blocks end up
* flags.h (flag_reorder_blocks_and_partition): New flag.
* ifcvt.c (find_if_case_1): Modify to not attempt if
one of the branches has a jump that crosses between sections.
* opts.c (decode_options): Code to handle new flag,
flag_reorder_blocks_and_partition; also to turn it off if
flag_exceptions is on.
(common_handle_option): Code to handle new flag,
* output.h (unlikely_text_section): New extern function
(in_unlikely_text_section): New extern function declaration.
* print-rtl.c (print_rtx): Add code for handling new note,
* rtl.c (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note (see
(REG_CROSSING_JUMP): New kind of reg_note, to mark jumps that
cross between section boundaries.
* rtl.h (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note
indicating the basic block containing it belongs in the cold
(REG_CROSSING_JUMP): New type of reg_note, to mark jumps that
between hot and cold sections.
* toplev.c (flag_reorder_blocks_and_partition): Add code to
initialize this flag, and to tie it to the command-line option
as an 'or' condition for calling reorder_basic_blocks.
flag_reorder_blocks_and_partition as an 'or' condition for
(rest_of_compilation): Add call to
* varasm.c (cfglayout.h): Add new include statement.
(unlikely_section_label_printed): New global variable, used for
determining when to output section name labels for cold
(in_section): Add in_unlikely_executed_text to enum data
(text_section): Modify code to use SECTION_FORMAT_STRING and
(unlikely_text_section): New function.
(in_unlikely_text_section): New function.
(function_section): Add code to make sure beginning of
written into correct section (hot or cold).
(assemble_start_function): Add code to make sure stuff is
the correct section.
(assemble_zeros): Add in_unlikely_text_section as an 'or'
to an if statement that was checking 'in_text_section'.
(assemble_variable): Add 'in_unlikely_text_section' as an 'or'
condition to an if statement that was checking
(default_section_type_flags_1): Add check: if in cold section
flags = SECTION_CODE.
* config/i386/i386.h (HAS_LONG_COND_BRANCH): Defined this macro
specifically for the i386.
(HAS_LONG_UNCOND_BRANCH): Defined this macro specifically for
* config/rs6000/darwin.h (UNLIKELY_EXECUTED_TEXT_SECTION_NAME):
text string to something more informative.
(SECTION_FORMAT_STRING): Add new definition.
* config/rs6000/rs6000.c (rs6000_assemble_integer): Add
'!in_unlikely_text_section' as an 'and' condition to an if
that was already checking '!in_text_section'.
* config/rs6000/sysv4.h (HOT_TEXT_SECTION_NAME,
sure these are properly defined for linux on ppc.
* doc/invoke.texi (freorder-blocks-and-partition): Add
for this new flag.
* doc/rtl.texi (REG_CROSSING_JUMP): Add documentation for new
* doc/tm.texi (SECTION_FORMAT_STRING, HAS_LONG_COND_BRANCH,
HAS_LONG_UNCOND_BRANCH): Add documentation for these new macros.
Description: Text document
On Feb 24, 2004, at 11:17 AM, Caroline Tice wrote:
Okay, here is the latest version of this patch. The basic idea behind
this optimization is to use the
feedback information to identify basic blocks as either "hot" or
"cold", and to put the cold basic blocks into
a different section of the executable than the hot blocks. This is
designed to improve paging and cache
locality. In order to implement this, I have introduced an new
INSN_NOTE, which identifies cold blocks,
and is added to each cold basic block. These notes are then used,
when writing out the assembly code,
to change between hot and cold sections as appropriate.
The real difficulty in implementing this optimization is that hot and
cold sections can be arbitrarily far apart in
the final executable, which means that, for some architectures,
conditional or unconditional branches may not
be able to span the distance between the sections. Therefore, in
addition to identifying hot/cold basic blocks
and writing them to separate sections, this optimization also needs to
fix up the CFG in the following manner:
1). Any edges that "fall through" from one section to another must be
converted to an explicit branch
instruction. 2). For those architectures where conditional branches
cannot reach all of memory, any
conditional branches that cross between sections need to be converted
to unconditional branches (i.e.
conditional branch to another block in same section, and the second
block contains an unconditional branch that
crosses between sections). 3). Finally, if the architecture does not
have unconditional branches that can span
all of memory, convert any unconditional branches that cross between
sections into indirect jumps.
The patch I have written does all of that. I have implemented two new
and HAVE_LONG_UNCOND_BRANCH, which are defined in defaults.h and which
can be redefined for
each target architecture, which are used to determine whether or not
steps 2 & 3 above need to be done or
The remaining tricky part is that converting unconditional branches
into indirect jumps requires using
an extra register, which means this transformation needs to be done
before register allocation has occurred.
So I put in the code that marks basic blocks and fixes up the edges
before register allocation. However there
are a lot of optimizations that keep trying to collapse the indirect
jump back into an unconditional branch, or
even back into the conditional branch. So I also had to find all of
those optimizations and make sure they
didn't attempt to do that to jumps that cross between sections. To
help with this, I have introduced a new
reg_note, REG_CROSSING_JUMP, which gets attached to branch
instructions created in steps 2 & 3 above.
Finally, I modifed the code slightly in the basic block reordering
optimization so that if we are doing the
partitioning optimization it tries to put all the hot blocks together
and all the cold blocks together.
Because this optimization does not currently work with exception
handling, I have added code to
decode_options to turn it off if flag_exceptions is on. This patch
does not do anything about fixing debug
information but I hope to do so in the future.
This patch has been bootstrapped on an Apple G4 running darwin, and
has passed all the SPECInt
tests that 3.5 passes; I'm in the process of running the DejaGnu tests
on the Apple, and also bootstrapping and
running the DejaGnu and Spec tests on an i386 running Linux. Assuming
it passes all the tests, is this
okay to commit to 3.5?
-- Caroline Tice