This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH - revised] New Optimization: Partitioning hot & cold basic blocks
- From: Caroline Tice <ctice at apple dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Caroline Tice <ctice at apple dot com>
- Date: Thu, 16 Oct 2003 15:45:33 -0700
- Subject: [PATCH - revised] New Optimization: Partitioning hot & cold basic blocks
I have modified my patch to address all the comments I have received to
date. The original
summary of the patch was:
The following patch implements an optimization we have had in the
Apple version of gcc for
the past 6 months, and which we would like the FSF gcc community to
adopt. This
optimization builds on the basic block reordering optimization. As
with the basic block
reordering optimization, it uses feedback profile information. With
this information it tags
every basic block as either 'hot' or 'cold'. When the assembly and/or
.o files are written,
the hot and cold basic blocks are written into separate sections. The
idea behind this
optimization is to improve paging and cache locality performance. In
order to deal with
basic blocks that appear to be close together in the CFG potentially
being written far
apart in the assembly and .o files, there is some code for cleaning up
edges that cross
between hot and cold sections.
Testing Information:
----------------------------
HW, OS: - Apple G4 (Dual 1.25 GHz PowerPC), running Mac OS X (Jaguar,
Panther)
- Apple G5 (2.0 GHz 970), running Max OS X
(Panther)
- Apple G4, running PPC Linux
Tests Run (and passed):
- test case specifically designed to test the hot/cold
partitioning. Verified that it compiled
and ran correctly and did the partitioning.
- SPECInt 2000 test cases that FSF gcc 3.4 passes (gzip, vpr, mcf,
parser, twolf)
- Bootstrapping
- DejaGnu test suite
Performance Improvement Results:
------------------------------------------------
In my unofficial SPEC runs on my machine, I get an improvement in the
SPEC ratio
of anywhere from 0.2% to 1.7% for gzip, mcf, parser, and twolf. vpr
got worse by 1.2%.
The compile time differences were negligible.
Binary File Size Changes:
------------------------------------
The binary sizes show a size increase ranging from 3% to 7%. Out of my
five test cases,
the largest percentage increase I saw was in parser, which increased
from 203,256 blocks to
218,408 blocks (up by 7.5%).
Other questions/comments/issues raised:
--------------------------------------------------------
From Zdenek Dvorak:
2) What are the differences between results using profile feedback and
static estimates?
Using this new optimization with profile feedback showed an
improvement over using
the new optimization with static estimates in all cases but one (for
mcf the two cases were
identical). In the cases where feedback gave an improvement, the
improvements ranged
from 0.2% (vpr) to 2.8% (gzip).
4) If possible, some oprofile data, especially impact on instruction
cache and branch prediction.
I don't have this information at this time.
From Gerald Pfeifer:
Note that several lines there (and also in the code, in some places)
are
too long; I believe the limit is 77 columns or something like that.
I have made sure the lines are all short enough now.
From Joseph Myers:
This patch doesn't include documentation for the new command-line
options
or target macros.
It does now.
From Jan Hubicka
This is something I was thinking about for a while. How do you deal
with debug info and unwind tables?
1). We don't attempt to perform this optimization in the presence of
exception handling at
the moment. (I put in an explicit test for this).
2). I have not attempted to ensure that all the debug information is
correct. I *have*
added, at the beginning of the cold section for each function, a global
unique label/tag
of the form "_<funciton_name>_unexecuted_section", which helps gdb tell
the user more
accurately where the user is when debugging. I have run gdb on an
executable that has
been partitioned into hot and cold sections. I remember being able to
do my debugging, but
I wasn't looking closely at the debugging information, and I can't
remember what particularly
did or did not work well.
I would much preffer to store this in cfg datastructure and emit this
note just before we destroy the CFG. In general I would like to remove
use for INSN_NOTES from any cfg aware code.
I have modified the original patch to eliminate two of the three notes,
and I have
tried to remove the NOTES as much as possible from the cfg aware code.
The long answer (gory details): There are two things I do: 1). I go
through the CFG and
find all fall-thru edges that cross between hot and cold sections. I
then insert a new
basic block for the fall-thru edge to fall into (making sure the
source
and destination of
the fall thru edge are in the same (hot or cold) section). The new
I see this is done in fix_up_fallthru_edges. There is force_nonfallthru
function that does precisely that. It would be better to avoid code
duplication.
I have modified fix_up_fallthru_edges to use force_nonfallthru instead
of
duplicating the code.
I was also thinking a bit about your changes to cfg-cleanup. With my
changes to hookize cfg manipulation you can do cfg-cleanup before BB
reorder in cfg_layout mode (I believe it is done now). so you should no
longer need to run cfg_cleanup after BB reordering and thus you should
not need the changes... That would simplify the patch a bit :)
Done. :-)
From Andreas Jaeger:
+ bool has_section_boundary_note (basic_block);
+ bool has_dont_shorten_note (basic_block);
Either make these static or move the out of the file in a header file.
Done.
+ bool
+ has_dont_shorten_note (bb)
+ basic_block bb;
Please use ISO C90 prototypes and do not introduce again K&R ones.
Done.
+
+ static void update_unlikely_executed_notes (basic_block);
+ extern bool has_dont_shorten_branch (basic_block);
Please put the extern in an appropriate header file, it should not
live here.
Done.
extern void cfg_layout_initialize_rbi (basic_block);
+ extern bool has_section_boundary_note (basic_block);
+ extern bool has_dont_shorten_note (basic_block);
+ extern bool scan_ahead_for_unlikely_executed_note (rtx);
You've got the extern declarations here already, so no need for them
above - or why do you duplicate them?
Redundancy/duplication has been removed.
+ Reorder basic blocks and parition into hot and cold sections
Typo: partition
Fixed.
I believe this addresses everyone's comments and requested changes.
After making all of these changes I did a cvs update and re-ran all the
tests. It still passes. The latest
ChangeLog and patch are below. May I please commit this to gcc 3.4?
Below is the ChangeLog entry for this patch:
2003-10-16 Caroline Tice <ctice@apple.com>
* basic-block.h (struct basic_block): Add new field,
"section_boundary" .
(partition_hot_cold_basic_blocks): Add extern function
declaration.
* bb-reorder.c (function.h, obstack.h): Add two new include
statements.
(find_rarely_executed_basic_blocks): New function.
(mark_bb_for_unlikely_executed_section): New function.
(color_basic_blocks): New function.
(find_all_crossing_edges): New function.
(add_labels_and_missing_jumps): New function.
(add_section_boundary_info): New function.
(fix_up_fall_thru_edges): New function.
(fix_edges_for_rarely_executed_code): New function.
(insn_on_section_boundary): New function.
(partition_hot_cold_basic_blocks): New function.
* cfg.c (struct basic_block_def entry_exit_blocks): Add
initialization value for new
"section_boundary" field).
* cfglayout.c (update_unlikely_executed_notes): New function.
(fixup_reorder_chain):
Added code so that when a new jumping basic block is added, it's
"section_boundary"
and UNLIKELY_EXECUTED_CODE note are updated appropriately.
notes and tags for this new optimization.
Added code to update basic block indices correctly for the new NOTE
insns introduced by
this optimization.
(duplicate_insn_chain): Added code to correctly duplicate the new
NOTE insn
introduced by this optimization.
* cfglayout.h (scan_ahead_for_unlikely_executed_note): Added new
extern function
declaration.
* common.opt (freorder-blocks-and-partition): Added new flag for this
optimization.
* dbxout.c (dbx_function_end): Added code to make sure scope labels
at the end of
functions are written into the correct (hot or cold) section.
* final.c (shorten_branches): Added #ifdef code that checks for the
definition of
LONG_COND_BRANCH_SIZE, which is to be defined in the machine-specific
code for
the compiler (for those architectures which use "short" conditional
branches, which may
not be able to span the distance between hot and cold sections in the
.s or .o file). If this
size is defined, and if we are performing the partitioning
optimization, and if the current
instruction is a jump instruction that crosses between hot and cold
sections, the size of
the jump insn is updated to be the size defined by
LONG_COND_BRANCH_SIZE. This
size is then used later, in machine-specific code to convert
conditional branches that are
too short into appropriate unconditional branches.
(scan_ahead_for_unlikely_executed_note): New function.
(is_jump_table_basic_block): New function.
(final_scan_insn): Added code to check for NOTE instruction
indicating whether basic
block belongs in hot or cold section, and to make sure the current
basic block is
being written to the appropriate section. Also added code to ensure
that jump table
basic blocks end up in the correct section.
* flags.h (flag_reorder_blocks_and_partition): New flag.
* opts.c (decode_options): Code to handle new flag,
flag_reorder_blocks_and_partition.
(common_handle_option): Code to handle new flag,
flag_reorder_blocks_and_partition.
* output.h (unlikely_text_section): New extern function declaration.
(in_unlikely_text_section): New extern function declaration.
(HOT_TEXT_SECTION_NAME): Make sure this macro is defined.
(UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Make sure this macro is defined.
(SECTION_FORMAT_STRING): Make sure this macro is defined.
* print-rtl.c (print_rtx): Added code for handling new note,
NOTE_INSN_UNLIKELY_EXECUTED_CODE
* rtl.c (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note (see below).
* rtl.h (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note instruction,
indicating
the basic block containing it belongs in the cold section.
(insn_on_section_boundary) : New extern function declaration.
* toplev.c (flag_reorder_blocks_and_partition): Added code to
initialize this flag, and to
tie it to the command-line option freorder-blocks-and-partition.
(rest_of_handle_stack_regs): Added flag_reorder_blocks_and_partition
as an 'or'
condition for calling reorder_basic_blocks.
(rest_of_handle_reorder_blocks): Added
flag_reorder_blocks_and_partition as an 'or'
condition for calling reorder_basic_blocks.
(rest_of_compilation): Added call to partition_hot_cold_basic_blocks.
* varasm.c (cfglayout.h): Added new include statement.
(unlikely_section_label_printed): New global variable, used for
determining when to
output section name labels for cold sections.
(in_section): Added in_unlikely_executed_text to enum data structure.
(text_section): Modified code to use SECTION_FORMAT_STRING and
HOT_TEXT_SECTION_NAME macros.
(unlikely_text_section): New function.
(in_unlikely_text_section): New function.
(function_section): Added code to make sure beginning of function is
written into
correct section (hot or cold).
(assemble_start_function): Added code to make sure stuff is written
to the correct
section.
(assemble_zeros): Added in_unlikely_text_section as an 'or' condition
to an if
statement that was checking 'in_text_section'.
(assemble_variable): Added 'in_unlikely_text_section' as an 'or'
condition to an if
statement that was checking 'in_text_section'.
(default_section_type_flags_1): Added check: if in cold
section flags = SECTION_CODE.
* config/rs6000/darwin.h (UNLIKELY_EXECUTED_TEXT_SECTION_NAME): changed
text string to something more informative.
(SECTION_FORMAT_STRING): Added new definition.
* config/rs6000/rs6000.c (rs6000_assemble_integer): Added
'!in_unlikely_text_section'
as an 'and' condition to an if statement that was already checking
'!in_text_section'.
(output_cbranch): Modified 'need_longbranch' to be true if an insn
size is
LONG_COND_BRANCH_SIZE.
* config/rs6000/rs6000.h (LONG_COND_BRANCH_SIZE): Added new
definition.
* config/rs6000/sysv4.h (HOT_TEXT_SECTION_NAME,
UNLIKELY_EXECUTED_TEXT_SECTION_NAME,SECTION_FORMAT_STRING):
Made
sure these are properly defined for linux on ppc.
* doc/invoke.texi (freorder-blocks-and-partition): Added
documentation for this new flag.
* doc/tm.text (SECTION_FORMAT_STRING, LONG_COND_BRANCH_SIZE): Added
documentation for these new macros.
Attachment:
hot-cold-patch.v4.txt
Description: Text document