This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH - revised] New Optimization: Partitioning hot & cold basic blocks


I have modified my patch to address all the comments I have received to date. The original
summary of the patch was:


The following patch implements an optimization we have had in the Apple version of gcc for
the past 6 months, and which we would like the FSF gcc community to adopt. This
optimization builds on the basic block reordering optimization. As with the basic block
reordering optimization, it uses feedback profile information. With this information it tags
every basic block as either 'hot' or 'cold'. When the assembly and/or .o files are written,
the hot and cold basic blocks are written into separate sections. The idea behind this
optimization is to improve paging and cache locality performance. In order to deal with
basic blocks that appear to be close together in the CFG potentially being written far
apart in the assembly and .o files, there is some code for cleaning up edges that cross
between hot and cold sections.



Testing Information: ----------------------------

HW, OS: - Apple G4 (Dual 1.25 GHz PowerPC), running Mac OS X (Jaguar, Panther)
- Apple G5 (2.0 GHz 970), running Max OS X (Panther)
- Apple G4, running PPC Linux


Tests Run (and passed):
- test case specifically designed to test the hot/cold partitioning. Verified that it compiled
and ran correctly and did the partitioning.
- SPECInt 2000 test cases that FSF gcc 3.4 passes (gzip, vpr, mcf, parser, twolf)
- Bootstrapping
- DejaGnu test suite



Performance Improvement Results: ------------------------------------------------

In my unofficial SPEC runs on my machine, I get an improvement in the SPEC ratio
of anywhere from 0.2% to 1.7% for gzip, mcf, parser, and twolf. vpr got worse by 1.2%.
The compile time differences were negligible.


Binary File Size Changes:
------------------------------------

The binary sizes show a size increase ranging from 3% to 7%. Out of my five test cases,
the largest percentage increase I saw was in parser, which increased from 203,256 blocks to
218,408 blocks (up by 7.5%).



Other questions/comments/issues raised: --------------------------------------------------------

From Zdenek Dvorak:

2) What are the differences between results using profile feedback and
   static estimates?

Using this new optimization with profile feedback showed an improvement over using
the new optimization with static estimates in all cases but one (for mcf the two cases were
identical). In the cases where feedback gave an improvement, the improvements ranged
from 0.2% (vpr) to 2.8% (gzip).


4) If possible, some oprofile data, especially impact on instruction
   cache and branch prediction.

I don't have this information at this time.



From Gerald Pfeifer:


Note that several lines there (and also in the code, in some places) are
too long; I believe the limit is 77 columns or something like that.

I have made sure the lines are all short enough now.


From Joseph Myers:

This patch doesn't include documentation for the new command-line options
or target macros.



It does now.


From Jan Hubicka


This is something I was thinking about for a while. How do you deal with debug info and unwind tables?

1). We don't attempt to perform this optimization in the presence of exception handling at
the moment. (I put in an explicit test for this).


2). I have not attempted to ensure that all the debug information is correct. I *have*
added, at the beginning of the cold section for each function, a global unique label/tag
of the form "_<funciton_name>_unexecuted_section", which helps gdb tell the user more
accurately where the user is when debugging. I have run gdb on an executable that has
been partitioned into hot and cold sections. I remember being able to do my debugging, but
I wasn't looking closely at the debugging information, and I can't remember what particularly
did or did not work well.



I would much preffer to store this in cfg datastructure and emit this note just before we destroy the CFG. In general I would like to remove use for INSN_NOTES from any cfg aware code.

I have modified the original patch to eliminate two of the three notes, and I have
tried to remove the NOTES as much as possible from the cfg aware code.



The long answer (gory details): There are two things I do: 1). I go
through the CFG and
find all fall-thru edges that cross between hot and cold sections. I
then insert a new
basic block for the fall-thru edge to fall into (making sure the source
and destination of
the fall thru edge are in the same (hot or cold) section). The new

I see this is done in fix_up_fallthru_edges. There is force_nonfallthru function that does precisely that. It would be better to avoid code duplication.


I have modified fix_up_fallthru_edges to use force_nonfallthru instead of
duplicating the code.



I was also thinking a bit about your changes to cfg-cleanup. With my changes to hookize cfg manipulation you can do cfg-cleanup before BB reorder in cfg_layout mode (I believe it is done now). so you should no longer need to run cfg_cleanup after BB reordering and thus you should not need the changes... That would simplify the patch a bit :)

Done. :-)


From Andreas Jaeger:

+ bool has_section_boundary_note (basic_block);
+ bool has_dont_shorten_note (basic_block);

Either make these static or move the out of the file in a header file.

Done.


+ bool
+ has_dont_shorten_note (bb)
+      basic_block bb;

Please use ISO C90 prototypes and do not introduce again K&R ones.

Done.


+
+ static void update_unlikely_executed_notes (basic_block);
+ extern bool has_dont_shorten_branch (basic_block);

Please put the extern in an appropriate header file, it should not live here.

Done.


  extern void cfg_layout_initialize_rbi	(basic_block);
+ extern bool has_section_boundary_note (basic_block);
+ extern bool has_dont_shorten_note (basic_block);
+ extern bool scan_ahead_for_unlikely_executed_note (rtx);

You've got the extern declarations here already, so no need for them above - or why do you duplicate them?


Redundancy/duplication has been removed.


+ Reorder basic blocks and parition into hot and cold sections

Typo: partition

Fixed.



I believe this addresses everyone's comments and requested changes. After making all of these changes I did a cvs update and re-ran all the tests. It still passes. The latest
ChangeLog and patch are below. May I please commit this to gcc 3.4?





Below is the ChangeLog entry for this patch:



2003-10-16 Caroline Tice <ctice@apple.com>


* basic-block.h (struct basic_block): Add new field, "section_boundary" .
(partition_hot_cold_basic_blocks): Add extern function declaration.


* bb-reorder.c (function.h, obstack.h): Add two new include statements.
(find_rarely_executed_basic_blocks): New function.
(mark_bb_for_unlikely_executed_section): New function.
(color_basic_blocks): New function.
(find_all_crossing_edges): New function.
(add_labels_and_missing_jumps): New function.
(add_section_boundary_info): New function.
(fix_up_fall_thru_edges): New function.
(fix_edges_for_rarely_executed_code): New function.
(insn_on_section_boundary): New function.
(partition_hot_cold_basic_blocks): New function.


* cfg.c (struct basic_block_def entry_exit_blocks): Add initialization value for new
"section_boundary" field).


* cfglayout.c (update_unlikely_executed_notes): New function.
(fixup_reorder_chain):
Added code so that when a new jumping basic block is added, it's "section_boundary"
and UNLIKELY_EXECUTED_CODE note are updated appropriately.
notes and tags for this new optimization.
Added code to update basic block indices correctly for the new NOTE insns introduced by
this optimization.
(duplicate_insn_chain): Added code to correctly duplicate the new NOTE insn
introduced by this optimization.


* cfglayout.h (scan_ahead_for_unlikely_executed_note): Added new extern function
declaration.


* common.opt (freorder-blocks-and-partition): Added new flag for this optimization.

* dbxout.c (dbx_function_end): Added code to make sure scope labels at the end of
functions are written into the correct (hot or cold) section.


* final.c (shorten_branches): Added #ifdef code that checks for the definition of
LONG_COND_BRANCH_SIZE, which is to be defined in the machine-specific code for
the compiler (for those architectures which use "short" conditional branches, which may
not be able to span the distance between hot and cold sections in the .s or .o file). If this
size is defined, and if we are performing the partitioning optimization, and if the current
instruction is a jump instruction that crosses between hot and cold sections, the size of
the jump insn is updated to be the size defined by LONG_COND_BRANCH_SIZE. This
size is then used later, in machine-specific code to convert conditional branches that are
too short into appropriate unconditional branches.
(scan_ahead_for_unlikely_executed_note): New function.
(is_jump_table_basic_block): New function.
(final_scan_insn): Added code to check for NOTE instruction indicating whether basic
block belongs in hot or cold section, and to make sure the current basic block is
being written to the appropriate section. Also added code to ensure that jump table
basic blocks end up in the correct section.


* flags.h (flag_reorder_blocks_and_partition): New flag.

* opts.c (decode_options): Code to handle new flag, flag_reorder_blocks_and_partition.
(common_handle_option): Code to handle new flag, flag_reorder_blocks_and_partition.


	* output.h (unlikely_text_section): New extern function declaration.
	(in_unlikely_text_section): New extern function declaration.
	(HOT_TEXT_SECTION_NAME): Make sure this macro is defined.
	(UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Make sure this macro is defined.
	(SECTION_FORMAT_STRING):  Make sure this macro is defined.

	* print-rtl.c (print_rtx): Added code for handling new note,
         NOTE_INSN_UNLIKELY_EXECUTED_CODE

* rtl.c (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note (see below).

* rtl.h (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note instruction, indicating
the basic block containing it belongs in the cold section.
(insn_on_section_boundary) : New extern function declaration.


* toplev.c (flag_reorder_blocks_and_partition): Added code to initialize this flag, and to
tie it to the command-line option freorder-blocks-and-partition.
(rest_of_handle_stack_regs): Added flag_reorder_blocks_and_partition as an 'or'
condition for calling reorder_basic_blocks.
(rest_of_handle_reorder_blocks): Added flag_reorder_blocks_and_partition as an 'or'
condition for calling reorder_basic_blocks.
(rest_of_compilation): Added call to partition_hot_cold_basic_blocks.


* varasm.c (cfglayout.h): Added new include statement.
(unlikely_section_label_printed): New global variable, used for determining when to
output section name labels for cold sections.
(in_section): Added in_unlikely_executed_text to enum data structure.
(text_section): Modified code to use SECTION_FORMAT_STRING and
HOT_TEXT_SECTION_NAME macros.
(unlikely_text_section): New function.
(in_unlikely_text_section): New function.
(function_section): Added code to make sure beginning of function is written into
correct section (hot or cold).
(assemble_start_function): Added code to make sure stuff is written to the correct
section.
(assemble_zeros): Added in_unlikely_text_section as an 'or' condition to an if
statement that was checking 'in_text_section'.
(assemble_variable): Added 'in_unlikely_text_section' as an 'or' condition to an if
statement that was checking 'in_text_section'.
(default_section_type_flags_1): Added check: if in cold section flags = SECTION_CODE.


	* config/rs6000/darwin.h (UNLIKELY_EXECUTED_TEXT_SECTION_NAME): changed
	text string to something more informative.
	(SECTION_FORMAT_STRING):  Added new definition.

* config/rs6000/rs6000.c (rs6000_assemble_integer): Added '!in_unlikely_text_section'
as an 'and' condition to an if statement that was already checking '!in_text_section'.
(output_cbranch): Modified 'need_longbranch' to be true if an insn size is
LONG_COND_BRANCH_SIZE.


* config/rs6000/rs6000.h (LONG_COND_BRANCH_SIZE): Added new definition.

* config/rs6000/sysv4.h (HOT_TEXT_SECTION_NAME,
UNLIKELY_EXECUTED_TEXT_SECTION_NAME,SECTION_FORMAT_STRING): Made
sure these are properly defined for linux on ppc.


* doc/invoke.texi (freorder-blocks-and-partition): Added documentation for this new flag.

	* doc/tm.text (SECTION_FORMAT_STRING, LONG_COND_BRANCH_SIZE): Added
          documentation for these new macros.



Attachment: hot-cold-patch.v4.txt
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]