This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH: New optimization: Partitioning hot/cold basic blocks



In running my tests on the i386 (under Linux), I discovered a few minor bugs in my original patch which I have
now fixed. The new patch (below) has been completely tested on both an Apple G4, running apple-darwin, and
an i386 running Linux. It has bootstrapped, passed DejaGnu, and successfully completed all the SPECInt 2000
tests (using hot/cold partitioning) which the 3.5 compiler passes (without partitioning).


Below is the corrected ChangeLog and patch. Is this okay to commit to 3.5?

-- Caroline Tice
ctice@apple.com


2004-02-27 Caroline Tice <ctice@apple.com>


* basic-block.h (partition_hot_cold_basic_blocks): Add extern function
declaration.
* bb-reorder.c (Copyright): Add 2004 to the year list.
(function.h, obstack.h, expr.h, regs.h): Add four new include
statements.
(N_ROUNDS): Increase the maximum number of rounds by 1.
(branch_threshold): Add array value for new round.
(exec_threshold): Add array value for new round.
(find_rarely_executed_basic_blocks): New function.
(mark_bb_for_unlikely_executed_section): New function.
(color_basic_blocks): New function.
(find_all_crossing_edges): New function.
(add_labels_and_missing_jumps): New function.
(fix_up_fall_thru_edges): New function.
(find_jump_block): New function.
(fix_crossing_conditional_branches): New function.
(fix_crossing_unconditional_branches): New function.
(fix_edges_for_rarely_executed_code): New function.
(in_same_section): New function.
(partition_hot_cold_basic_blocks): New function.
(find_traces): Add an extra round for partitioning hot/cold
basic blocks.
(find_traces_1_round): Modify to push all cold blocks (and only
cold blocks) into the last (extra) round of collecting traces.
(bb_to_key): Add code to correctly identify cold blocks when
doing partitioning (can't rely on block count at this point).
(connect_traces): Modify to connect all the non-cold traces first, then
go back and connect up all the cold traces.
* cfgbuild.c (make_edges): Update current_function_has_computed_jump
if we are doing hot/cold partitioning.
* cfgcleanup.c (cfglayout.h): Add new include statement.
(try_simplify_condjump): Modify to not attempt on blocks with jumps
that cross section boundaries.
(try_forward_edges): Likewise.
(merge_blocks_move_predecessor_nojumps): Likewise.
(merge_blocks_move_successor_nojumps): Likewise.
(merge_blocks_move): Likewise.
(try_crossjump_to_edge): Modify to not attempt after we have done
the block partitioning.
(try_crossjump_bb): Modify to not attempt on blocks with jumps that
cross section boundaries.
(try_optimize_cfg): Likewise.
* cfghooks.c (tidy_fallthru_edges): Modify to not remove indirect
jumps that cross section boundaries.
* cfglayout.c (flags.h): Add new include statement.
(update_unlikely_executed_notes): New function.
(fixup_reorder_chain): Add code so when a new jumping basic block is
added, it's UNLIKELY_EXECUTED_CODE note is updated appropriately.
(duplicate_insn_chain): Add code to duplicate the new NOTE insn
introduced by this optimization.
* cfglayout.h (scan_ahead_for_unlikely_executed_note): Add new
extern function declaration.
* cfgrtl.c (can_delete_note_p): Add NOTE_INSN_UNLIKELY_EXECUTED_CODE to
list of notes that can be deleted.
(rtl_merge_blocks): Modify to not attempt on blocks with
jumps that cross section boundaries.
(rtl_can_merge_blocks_p): Modify to test blocks for jumps that cross
section boundaries.
(try_redirect_by_replacing_jump): Modify to not attempt on jumps that
cross section boundaries.
(commit_one_edge_insertion): Add code so newly created basic block
ends up in correct (hot or cold) section.
(cfg_layout_can_merge_blocks_p): Modify to test blocks for jumps that
cross section boundaries.
(cfg_layout_merge_blocks): Modify to not attempt on blocks that cross
seciton boundaries.
(force_nonfallthru_and_redirect): Modify to make sure new basic block
ends up in correct section.
(commit_one_edge_insertion): Modify to disallow insertions before
NOTE_INSN_UNLIKELY_EXECUTED_CODE notes.
* common.opt (freorder-blocks-and-partition): Add new flag for this
optimization.
* dbxout.c (dbx_function_end): Add code to make sure scope labels at
the end of functions are written into the correct (hot or cold)
section.
(dbx_source_file): Add code to so writing debug file information
doesn't incorrectly change sections.
* defaults.h (HOT_TEXT_SECTION_NAME): Modify value to work for
linux/i386.
(UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Modify value to work for
linux/i386.
(SECTION_FORMAT_STRING): New global variable, for linux/i386 hot/cold
section partitioning.
(HAS_LONG_COND_BRANCH): New global variable, indicating whether or not
conditional branches can span all of memory.
(HAS_LONG_UNCOND_BRANCH): New global variable, indicationg whether or not
unconditional branches can span all of memory.
* final.c (scan_ahead_for_unlikely_executed_note): New function.
(is_jump_table_basic_block): New function.
(final_scan_insn): Add code to check for NOTE instruction indicating
whether basic block belongs in hot or cold section, and to make sure
the current basic block is being written to the appropriate section.
Also added code to ensure that jump table basic blocks end up in the
correct section.
* flags.h (flag_reorder_blocks_and_partition): New flag.
* ifcvt.c (find_if_case_1): Modify to not attempt if conversion if
one of the branches has a jump that crosses between sections.
(find_if_case_2): Likewise.
* opts.c (decode_options): Code to handle new flag,
flag_reorder_blocks_and_partition; also to turn it off if
flag_exceptions is on.
(common_handle_option): Code to handle new flag,
flag_reorder_blocks_and_partition.
* output.h (unlikely_text_section): New extern function declaration.
(in_unlikely_text_section): New extern function declaration.
* print-rtl.c (print_rtx): Add code for handling new note,
NOTE_INSN_UNLIKELY_EXECUTED_CODE
* rtl.c (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note (see below).
(REG_CROSSING_JUMP): New kind of reg_note, to mark jumps that
cross between section boundaries.
* rtl.h (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note instruction,
indicating the basic block containing it belongs in the cold section.
(REG_CROSSING_JUMP): New type of reg_note, to mark jumps that cross
between hot and cold sections.
* toplev.c (flag_reorder_blocks_and_partition): Add code to
initialize this flag, and to tie it to the command-line option
freorder-blocks-and-partition.
(rest_of_handle_stack_regs): Add flag_reorder_blocks_and_partition
as an 'or' condition for calling reorder_basic_blocks.
(rest_of_handle_reorder_blocks): Add
flag_reorder_blocks_and_partition as an 'or' condition for calling
reorder_basic_blocks.
(rest_of_compilation): Add call to partition_hot_cold_basic_blocks.
* varasm.c (cfglayout.h): Add new include statement.
(unlikely_section_label_printed): New global variable, used for
determining when to output section name labels for cold sections.
(in_section): Add in_unlikely_executed_text to enum data structure.
(text_section): Modify code to use SECTION_FORMAT_STRING and
HOT_TEXT_SECTION_NAME macros.
(unlikely_text_section): New function.
(in_unlikely_text_section): New function.
(function_section): Add code to make sure beginning of function is
written into correct section (hot or cold).
(assemble_start_function): Add code to make sure stuff is written to
the correct section.
(assemble_zeros): Add in_unlikely_text_section as an 'or' condition
to an if statement that was checking 'in_text_section'.
(assemble_variable): Add 'in_unlikely_text_section' as an 'or'
condition to an if statement that was checking 'in_text_section'.
(default_section_type_flags_1): Add check: if in cold section
flags = SECTION_CODE.
* config/i386/i386.h (HAS_LONG_COND_BRANCH): Defined this macro
specifically for the i386.
(HAS_LONG_UNCOND_BRANCH): Defined this macro specifically for the i386.
* config/rs6000/darwin.h (UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Change
text string to something more informative.
(SECTION_FORMAT_STRING): Add new definition.
* config/rs6000/rs6000.c (rs6000_assemble_integer): Add
'!in_unlikely_text_section' as an 'and' condition to an if statement
that was already checking '!in_text_section'.
* config/rs6000/sysv4.h (HOT_TEXT_SECTION_NAME,
UNLIKELY_EXECUTED_TEXT_SECTION_NAME,SECTION_FORMAT_STRING): Make
sure these are properly defined for linux on ppc.
* doc/invoke.texi (freorder-blocks-and-partition): Add documentation
for this new flag.
* doc/rtl.texi (REG_CROSSING_JUMP): Add documentation for new
reg_note.
* doc/tm.texi (SECTION_FORMAT_STRING, HAS_LONG_COND_BRANCH,
HAS_LONG_UNCOND_BRANCH): Add documentation for these new macros.



Attachment: gcc5-hot-cold3d.txt
Description: Text document




On Feb 24, 2004, at 11:17 AM, Caroline Tice wrote:


Okay, here is the latest version of this patch. The basic idea behind this optimization is to use the
feedback information to identify basic blocks as either "hot" or "cold", and to put the cold basic blocks into
a different section of the executable than the hot blocks. This is designed to improve paging and cache
locality. In order to implement this, I have introduced an new INSN_NOTE, which identifies cold blocks,
and is added to each cold basic block. These notes are then used, when writing out the assembly code,
to change between hot and cold sections as appropriate.


The real difficulty in implementing this optimization is that hot and cold sections can be arbitrarily far apart in
the final executable, which means that, for some architectures, conditional or unconditional branches may not
be able to span the distance between the sections. Therefore, in addition to identifying hot/cold basic blocks
and writing them to separate sections, this optimization also needs to fix up the CFG in the following manner:
1). Any edges that "fall through" from one section to another must be converted to an explicit branch
instruction. 2). For those architectures where conditional branches cannot reach all of memory, any
conditional branches that cross between sections need to be converted to unconditional branches (i.e.
conditional branch to another block in same section, and the second block contains an unconditional branch that
crosses between sections). 3). Finally, if the architecture does not have unconditional branches that can span
all of memory, convert any unconditional branches that cross between sections into indirect jumps.


The patch I have written does all of that. I have implemented two new macros, HAVE_LONG_COND_BRANCH
and HAVE_LONG_UNCOND_BRANCH, which are defined in defaults.h and which can be redefined for
each target architecture, which are used to determine whether or not steps 2 & 3 above need to be done or
not.


The remaining tricky part is that converting unconditional branches into indirect jumps requires using
an extra register, which means this transformation needs to be done before register allocation has occurred.
So I put in the code that marks basic blocks and fixes up the edges before register allocation. However there
are a lot of optimizations that keep trying to collapse the indirect jump back into an unconditional branch, or
even back into the conditional branch. So I also had to find all of those optimizations and make sure they
didn't attempt to do that to jumps that cross between sections. To help with this, I have introduced a new
reg_note, REG_CROSSING_JUMP, which gets attached to branch instructions created in steps 2 & 3 above.


Finally, I modifed the code slightly in the basic block reordering optimization so that if we are doing the
partitioning optimization it tries to put all the hot blocks together and all the cold blocks together.


Because this optimization does not currently work with exception handling, I have added code to
decode_options to turn it off if flag_exceptions is on. This patch does not do anything about fixing debug
information but I hope to do so in the future.


This patch has been bootstrapped on an Apple G4 running darwin, and has passed all the SPECInt
tests that 3.5 passes; I'm in the process of running the DejaGnu tests on the Apple, and also bootstrapping and
running the DejaGnu and Spec tests on an i386 running Linux. Assuming it passes all the tests, is this
okay to commit to 3.5?


-- Caroline Tice
ctice@apple.com



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]