This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: PATCH: New optimization: Partitioning hot/cold basic blocks
- From: Caroline Tice <ctice at apple dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Caroline Tice <ctice at apple dot com>
- Date: Fri, 27 Feb 2004 09:47:31 -0800
- Subject: Re: PATCH: New optimization: Partitioning hot/cold basic blocks
- References: <0C8212C5-66FE-11D8-B230-000393BB90B6@apple.com>
In running my tests on the i386 (under Linux), I discovered a few minor
bugs in my original patch which I have
now fixed. The new patch (below) has been completely tested on both an
Apple G4, running apple-darwin, and
an i386 running Linux. It has bootstrapped, passed DejaGnu, and
successfully completed all the SPECInt 2000
tests (using hot/cold partitioning) which the 3.5 compiler passes
(without partitioning).
Below is the corrected ChangeLog and patch. Is this okay to commit to
3.5?
-- Caroline Tice
ctice@apple.com
2004-02-27 Caroline Tice <ctice@apple.com>
* basic-block.h (partition_hot_cold_basic_blocks): Add extern
function
declaration.
* bb-reorder.c (Copyright): Add 2004 to the year list.
(function.h, obstack.h, expr.h, regs.h): Add four new include
statements.
(N_ROUNDS): Increase the maximum number of rounds by 1.
(branch_threshold): Add array value for new round.
(exec_threshold): Add array value for new round.
(find_rarely_executed_basic_blocks): New function.
(mark_bb_for_unlikely_executed_section): New function.
(color_basic_blocks): New function.
(find_all_crossing_edges): New function.
(add_labels_and_missing_jumps): New function.
(fix_up_fall_thru_edges): New function.
(find_jump_block): New function.
(fix_crossing_conditional_branches): New function.
(fix_crossing_unconditional_branches): New function.
(fix_edges_for_rarely_executed_code): New function.
(in_same_section): New function.
(partition_hot_cold_basic_blocks): New function.
(find_traces): Add an extra round for partitioning hot/cold
basic blocks.
(find_traces_1_round): Modify to push all cold blocks (and only
cold blocks) into the last (extra) round of collecting traces.
(bb_to_key): Add code to correctly identify cold blocks when
doing partitioning (can't rely on block count at this point).
(connect_traces): Modify to connect all the non-cold traces
first, then
go back and connect up all the cold traces.
* cfgbuild.c (make_edges): Update
current_function_has_computed_jump
if we are doing hot/cold partitioning.
* cfgcleanup.c (cfglayout.h): Add new include statement.
(try_simplify_condjump): Modify to not attempt on blocks with
jumps
that cross section boundaries.
(try_forward_edges): Likewise.
(merge_blocks_move_predecessor_nojumps): Likewise.
(merge_blocks_move_successor_nojumps): Likewise.
(merge_blocks_move): Likewise.
(try_crossjump_to_edge): Modify to not attempt after we have
done
the block partitioning.
(try_crossjump_bb): Modify to not attempt on blocks with jumps
that
cross section boundaries.
(try_optimize_cfg): Likewise.
* cfghooks.c (tidy_fallthru_edges): Modify to not remove
indirect
jumps that cross section boundaries.
* cfglayout.c (flags.h): Add new include statement.
(update_unlikely_executed_notes): New function.
(fixup_reorder_chain): Add code so when a new jumping basic
block is
added, it's UNLIKELY_EXECUTED_CODE note is updated
appropriately.
(duplicate_insn_chain): Add code to duplicate the new NOTE insn
introduced by this optimization.
* cfglayout.h (scan_ahead_for_unlikely_executed_note): Add new
extern function declaration.
* cfgrtl.c (can_delete_note_p): Add
NOTE_INSN_UNLIKELY_EXECUTED_CODE to
list of notes that can be deleted.
(rtl_merge_blocks): Modify to not attempt on blocks with
jumps that cross section boundaries.
(rtl_can_merge_blocks_p): Modify to test blocks for jumps that
cross
section boundaries.
(try_redirect_by_replacing_jump): Modify to not attempt on
jumps that
cross section boundaries.
(commit_one_edge_insertion): Add code so newly created basic
block
ends up in correct (hot or cold) section.
(cfg_layout_can_merge_blocks_p): Modify to test blocks for
jumps that
cross section boundaries.
(cfg_layout_merge_blocks): Modify to not attempt on blocks that
cross
seciton boundaries.
(force_nonfallthru_and_redirect): Modify to make sure new basic
block
ends up in correct section.
(commit_one_edge_insertion): Modify to disallow insertions
before
NOTE_INSN_UNLIKELY_EXECUTED_CODE notes.
* common.opt (freorder-blocks-and-partition): Add new flag for
this
optimization.
* dbxout.c (dbx_function_end): Add code to make sure scope
labels at
the end of functions are written into the correct (hot or cold)
section.
(dbx_source_file): Add code to so writing debug file information
doesn't incorrectly change sections.
* defaults.h (HOT_TEXT_SECTION_NAME): Modify value to work for
linux/i386.
(UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Modify value to work for
linux/i386.
(SECTION_FORMAT_STRING): New global variable, for linux/i386
hot/cold
section partitioning.
(HAS_LONG_COND_BRANCH): New global variable, indicating whether
or not
conditional branches can span all of memory.
(HAS_LONG_UNCOND_BRANCH): New global variable, indicationg
whether or not
unconditional branches can span all of memory.
* final.c (scan_ahead_for_unlikely_executed_note): New
function.
(is_jump_table_basic_block): New function.
(final_scan_insn): Add code to check for NOTE instruction
indicating
whether basic block belongs in hot or cold section, and to make
sure
the current basic block is being written to the appropriate
section.
Also added code to ensure that jump table basic blocks end up
in the
correct section.
* flags.h (flag_reorder_blocks_and_partition): New flag.
* ifcvt.c (find_if_case_1): Modify to not attempt if
conversion if
one of the branches has a jump that crosses between sections.
(find_if_case_2): Likewise.
* opts.c (decode_options): Code to handle new flag,
flag_reorder_blocks_and_partition; also to turn it off if
flag_exceptions is on.
(common_handle_option): Code to handle new flag,
flag_reorder_blocks_and_partition.
* output.h (unlikely_text_section): New extern function
declaration.
(in_unlikely_text_section): New extern function declaration.
* print-rtl.c (print_rtx): Add code for handling new note,
NOTE_INSN_UNLIKELY_EXECUTED_CODE
* rtl.c (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note (see
below).
(REG_CROSSING_JUMP): New kind of reg_note, to mark jumps that
cross between section boundaries.
* rtl.h (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note
instruction,
indicating the basic block containing it belongs in the cold
section.
(REG_CROSSING_JUMP): New type of reg_note, to mark jumps that
cross
between hot and cold sections.
* toplev.c (flag_reorder_blocks_and_partition): Add code to
initialize this flag, and to tie it to the command-line option
freorder-blocks-and-partition.
(rest_of_handle_stack_regs): Add
flag_reorder_blocks_and_partition
as an 'or' condition for calling reorder_basic_blocks.
(rest_of_handle_reorder_blocks): Add
flag_reorder_blocks_and_partition as an 'or' condition for
calling
reorder_basic_blocks.
(rest_of_compilation): Add call to
partition_hot_cold_basic_blocks.
* varasm.c (cfglayout.h): Add new include statement.
(unlikely_section_label_printed): New global variable, used for
determining when to output section name labels for cold
sections.
(in_section): Add in_unlikely_executed_text to enum data
structure.
(text_section): Modify code to use SECTION_FORMAT_STRING and
HOT_TEXT_SECTION_NAME macros.
(unlikely_text_section): New function.
(in_unlikely_text_section): New function.
(function_section): Add code to make sure beginning of
function is
written into correct section (hot or cold).
(assemble_start_function): Add code to make sure stuff is
written to
the correct section.
(assemble_zeros): Add in_unlikely_text_section as an 'or'
condition
to an if statement that was checking 'in_text_section'.
(assemble_variable): Add 'in_unlikely_text_section' as an 'or'
condition to an if statement that was checking
'in_text_section'.
(default_section_type_flags_1): Add check: if in cold section
flags = SECTION_CODE.
* config/i386/i386.h (HAS_LONG_COND_BRANCH): Defined this macro
specifically for the i386.
(HAS_LONG_UNCOND_BRANCH): Defined this macro specifically for
the i386.
* config/rs6000/darwin.h (UNLIKELY_EXECUTED_TEXT_SECTION_NAME):
Change
text string to something more informative.
(SECTION_FORMAT_STRING): Add new definition.
* config/rs6000/rs6000.c (rs6000_assemble_integer): Add
'!in_unlikely_text_section' as an 'and' condition to an if
statement
that was already checking '!in_text_section'.
* config/rs6000/sysv4.h (HOT_TEXT_SECTION_NAME,
UNLIKELY_EXECUTED_TEXT_SECTION_NAME,SECTION_FORMAT_STRING): Make
sure these are properly defined for linux on ppc.
* doc/invoke.texi (freorder-blocks-and-partition): Add
documentation
for this new flag.
* doc/rtl.texi (REG_CROSSING_JUMP): Add documentation for new
reg_note.
* doc/tm.texi (SECTION_FORMAT_STRING, HAS_LONG_COND_BRANCH,
HAS_LONG_UNCOND_BRANCH): Add documentation for these new macros.
Attachment:
gcc5-hot-cold3d.txt
Description: Text document
On Feb 24, 2004, at 11:17 AM, Caroline Tice wrote:
Okay, here is the latest version of this patch. The basic idea behind
this optimization is to use the
feedback information to identify basic blocks as either "hot" or
"cold", and to put the cold basic blocks into
a different section of the executable than the hot blocks. This is
designed to improve paging and cache
locality. In order to implement this, I have introduced an new
INSN_NOTE, which identifies cold blocks,
and is added to each cold basic block. These notes are then used,
when writing out the assembly code,
to change between hot and cold sections as appropriate.
The real difficulty in implementing this optimization is that hot and
cold sections can be arbitrarily far apart in
the final executable, which means that, for some architectures,
conditional or unconditional branches may not
be able to span the distance between the sections. Therefore, in
addition to identifying hot/cold basic blocks
and writing them to separate sections, this optimization also needs to
fix up the CFG in the following manner:
1). Any edges that "fall through" from one section to another must be
converted to an explicit branch
instruction. 2). For those architectures where conditional branches
cannot reach all of memory, any
conditional branches that cross between sections need to be converted
to unconditional branches (i.e.
conditional branch to another block in same section, and the second
block contains an unconditional branch that
crosses between sections). 3). Finally, if the architecture does not
have unconditional branches that can span
all of memory, convert any unconditional branches that cross between
sections into indirect jumps.
The patch I have written does all of that. I have implemented two new
macros, HAVE_LONG_COND_BRANCH
and HAVE_LONG_UNCOND_BRANCH, which are defined in defaults.h and which
can be redefined for
each target architecture, which are used to determine whether or not
steps 2 & 3 above need to be done or
not.
The remaining tricky part is that converting unconditional branches
into indirect jumps requires using
an extra register, which means this transformation needs to be done
before register allocation has occurred.
So I put in the code that marks basic blocks and fixes up the edges
before register allocation. However there
are a lot of optimizations that keep trying to collapse the indirect
jump back into an unconditional branch, or
even back into the conditional branch. So I also had to find all of
those optimizations and make sure they
didn't attempt to do that to jumps that cross between sections. To
help with this, I have introduced a new
reg_note, REG_CROSSING_JUMP, which gets attached to branch
instructions created in steps 2 & 3 above.
Finally, I modifed the code slightly in the basic block reordering
optimization so that if we are doing the
partitioning optimization it tries to put all the hot blocks together
and all the cold blocks together.
Because this optimization does not currently work with exception
handling, I have added code to
decode_options to turn it off if flag_exceptions is on. This patch
does not do anything about fixing debug
information but I hope to do so in the future.
This patch has been bootstrapped on an Apple G4 running darwin, and
has passed all the SPECInt
tests that 3.5 passes; I'm in the process of running the DejaGnu tests
on the Apple, and also bootstrapping and
running the DejaGnu and Spec tests on an i386 running Linux. Assuming
it passes all the tests, is this
okay to commit to 3.5?
-- Caroline Tice
ctice@apple.com