This is the mail archive of the
mailing list for the GCC project.
PATCH: New Optimization: Partitioning hot & cold basic blocks
- From: Caroline Tice <ctice at apple dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Caroline Tice <ctice at apple dot com>
- Date: Wed, 28 Jan 2004 15:01:14 -0800
- Subject: PATCH: New Optimization: Partitioning hot & cold basic blocks
Okay, here we go again. Last October I submitted this patch for a new optimization that partitions
hot and cold basic blocks into separate sections of the .o file, based on feedback information and
the existing bb-reorder stuff. I received many suggestions for improvements and queries about
performance, etc. I have incorporated all the suggestions into the patch, and updated it to work with
the current 3.5 mainline gcc. Below is a recap of my answers to the questions (and my original post).
There was also a question raised about whether or not there was an existing patent for this method
which my patch violated. I have since located the patent in question and after reading it through
carefully I have come to the conclusion that my patch does not violate the patent. So once again
I am asking permission to commit this new optimization to the 3.5 mainline compiler.
Okay to commit?
-- Caroline Tice
HW, OS: - Apple G4 (Dual 1.25 GHz PowerPC), running Mac OS X (Jaguar, Panther)
- Apple G5 (2.0 GHz 970), running Max OS X (Panther)
- Apple G4, running PPC Linux
- Pentium 4, running linux
Tests Run (and passed):
- test case specifically designed to test the hot/cold partitioning. Verified that it compiled
and ran correctly and did the partitioning.
- SPECInt 2000 test cases that FSF gcc 3.5 passes
- DejaGnu test suite
Performance Improvement Results:
In my unofficial SPEC runs on my machine, I get an improvement in the SPEC ratio
of anywhere from 0.2% to 1.7% for gzip, mcf, parser, and twolf. vpr got worse by 1.2%.
The compile time differences were negligible.
Binary File Size Changes:
The binary sizes show a size increase ranging from 3% to 7%. Out of my five test cases,
the largest percentage increase I saw was in parser, which increased from 203,256 blocks to
218,408 blocks (up by 7.5%).
Other questions/comments/issues raised:
>From Zdenek Dvorak:
2) What are the differences between results using profile feedback and
Using this new optimization with profile feedback showed an improvement over using
the new optimization with static estimates in all cases but one (for mcf the two cases were
identical). In the cases where feedback gave an improvement, the improvements ranged
from 0.2% (vpr) to 2.8% (gzip).
4) If possible, some oprofile data, especially impact on instruction
cache and branch prediction.
I don't have this information at this time.
>From Jan Hubicka
This is something I was thinking about for a while. How do you deal
with debug info and unwind tables?
1). We don't attempt to perform this optimization in the presence of exception handling at
the moment. (I put in an explicit test for this).
2). I have not attempted to ensure that all the debug information is correct. I *have*
added, at the beginning of the cold section for each function, a global unique label/tag
of the form "_<function_name>_unexecuted_section", which helps gdb tell the user more
accurately where the user is when debugging. I have run gdb on an executable that has
been partitioned into hot and cold sections. I remember being able to do my debugging, but
I wasn't looking closely at the debugging information, and I can't remember what particularly
did or did not work well.
2004-01-28 Caroline Tice <firstname.lastname@example.org>
* basic-block.h (struct basic_block): Add new field,
(partition_hot_cold_basic_blocks): Add extern function declaration.
* bb-reorder.c (function.h, obstack.h): Add two new include
(find_rarely_executed_basic_blocks): New function.
(mark_bb_for_unlikely_executed_section): New function.
(color_basic_blocks): New function.
(find_all_crossing_edges): New function.
(add_labels_and_missing_jumps): New function.
(add_section_boundary_info): New function.
(fix_up_fall_thru_edges): New function.
(fix_edges_for_rarely_executed_code): New function.
(insn_on_section_boundary): New function.
(partition_hot_cold_basic_blocks): New function.
* cfg.c (struct basic_block_def entry_exit_blocks): Add
initialization value for new "section_boundary" field.
* cfglayout.c (update_unlikely_executed_notes): New function.
(fixup_reorder_chain): Add code so when a new jumping basic block is
added, it's "section_boundary" and UNLIKELY_EXECUTED_CODE note are
(duplicate_insn_chain): Add code to duplicate the new NOTE insn
introduced by this optimization.
* cfglayout.h (scan_ahead_for_unlikely_executed_note): Add new
extern function declaration.
* common.opt (freorder-blocks-and-partition): Add new flag for this
* dbxout.c (dbx_function_end): Add code to make sure scope labels at
the end of functions are written into the correct (hot or cold)
(dbx_source_file): Add code to so writing debug file information
doesn't incorrectly change sections.
* defaults.h (HOT_TEXT_SECTION_NAME): Modify value to work for
(UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Modify value to work for
(SECTION_FORMAT_STRING): New macro, for linux/i386 hot/cold section
* final.c (shorten_branches): Add code (in an #ifdef) for
architectures with short conditional branches to mark them for
modification, to span distance between hot & cold sections.
(scan_ahead_for_unlikely_executed_note): New function.
(is_jump_table_basic_block): New function.
(final_scan_insn): Add code to check for NOTE instruction indicating
whether basic block belongs in hot or cold section, and to make sure
the current basic block is being written to the appropriate section.
Also added code to ensure that jump table basic blocks end up in the
* flags.h (flag_reorder_blocks_and_partition): New flag.
* opts.c (decode_options): Code to handle new flag,
(common_handle_option): Code to handle new flag,
* output.h (unlikely_text_section): New extern function declaration.
(in_unlikely_text_section): New extern function declaration.
* print-rtl.c (print_rtx): Add code for handling new note,
* rtl.c (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note (see below).
* rtl.h (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note instruction,
indicating the basic block containing it belongs in the cold section.
(insn_on_section_boundary) : New extern function declaration.
* toplev.c (flag_reorder_blocks_and_partition): Add code to
initialize this flag, and to tie it to the command-line option
(rest_of_handle_stack_regs): Add flag_reorder_blocks_and_partition
as an 'or' condition for calling reorder_basic_blocks.
flag_reorder_blocks_and_partition as an 'or' condition for calling
(rest_of_compilation): Add call to partition_hot_cold_basic_blocks.
* varasm.c (cfglayout.h): Add new include statement.
(unlikely_section_label_printed): New global variable, used for
determining when to output section name labels for cold sections.
(in_section): Add in_unlikely_executed_text to enum data structure.
(text_section): Modify code to use SECTION_FORMAT_STRING and
(unlikely_text_section): New function.
(in_unlikely_text_section): New function.
(function_section): Add code to make sure beginning of function is
written into correct section (hot or cold).
(assemble_start_function): Add code to make sure stuff is written to
the correct section.
(assemble_zeros): Add in_unlikely_text_section as an 'or' condition
to an if statement that was checking 'in_text_section'.
(assemble_variable): Add 'in_unlikely_text_section' as an 'or'
condition to an if statement that was checking 'in_text_section'.
(default_section_type_flags_1): Add check: if in cold section
flags = SECTION_CODE.
* config/rs6000/darwin.h (UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Change
text string to something more informative.
(SECTION_FORMAT_STRING): Add new definition.
* config/rs6000/rs6000.c (rs6000_assemble_integer): Add
'!in_unlikely_text_section' as an 'and' condition to an if statement
that was already checking '!in_text_section'.
(output_cbranch): Modify 'need_longbranch' to be true if an insn size
* config/rs6000/rs6000.h (LONG_COND_BRANCH_SIZE): Add new definition.
* config/rs6000/sysv4.h (HOT_TEXT_SECTION_NAME,
sure these are properly defined for linux on ppc.
* doc/invoke.texi (freorder-blocks-and-partition): Add documentation
for this new flag.
* doc/tm.texi (SECTION_FORMAT_STRING, LONG_COND_BRANCH_SIZE): Add
documentation for these new macros.
Description: Text document