PATCH: New Optimization: Partitioning hot & cold basic blocks
Caroline Tice
ctice@apple.com
Wed Jan 28 23:03:00 GMT 2004
Okay, here we go again. Last October I submitted this patch for a new
optimization that partitions hot and cold basic blocks into separate
sections of the .o file, based on feedback information and the existing
bb-reorder stuff. I received many suggestions for improvements and
queries about performance, etc. I have incorporated all the suggestions
into the patch, and updated it to work with the current 3.5 mainline
gcc. Below is a recap of my answers to the questions (and my original
post).
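
As a rough illustration of the idea (not code from the patch), the partitioning step can be pictured as tagging each basic block hot or cold from its profile count. Everything here is hypothetical: `struct toy_block`, `toy_partition`, and the `threshold` parameter are stand-ins, and this message does not state the criterion the patch actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a basic block; this is NOT GCC's
   struct basic_block.  */
struct toy_block {
    unsigned long count;   /* execution count from profile feedback */
    int in_cold_section;   /* 1 if the block goes in the cold section */
};

/* Tag every block whose count is at or below `threshold` as cold and
   every other block as hot.  Returns the number of cold blocks.
   The threshold is an assumption made for this sketch.  */
size_t toy_partition(struct toy_block *blocks, size_t n,
                     unsigned long threshold)
{
    size_t cold = 0;

    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        blocks[i].in_cold_section = (blocks[i].count <= threshold);
        cold += (size_t) blocks[i].in_cold_section;
    }
    return cold;
}
```

With a threshold of 0, only blocks that the profile never saw executed are tagged cold.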
There was also a question raised about whether there was an existing
patent for this method which my patch violated. I have since located
the patent in question, and after reading it through carefully I have
come to the conclusion that my patch does not violate it. So once again
I am asking permission to commit this new optimization to the 3.5
mainline compiler.
Okay to commit?
-- Caroline Tice
ctice@apple.com
Testing Information:
----------------------------
HW, OS: - Apple G4 (Dual 1.25 GHz PowerPC), running Mac OS X (Jaguar, Panther)
- Apple G5 (2.0 GHz 970), running Mac OS X (Panther)
- Apple G4, running PPC Linux
- Pentium 4, running Linux
Tests Run (and passed):
- test case specifically designed to test the hot/cold partitioning.
Verified that it compiled and ran correctly and did the partitioning.
- SPECint 2000 test cases that FSF gcc 3.5 passes
- Bootstrapping
- DejaGnu test suite
Performance Improvement Results:
------------------------------------------------
In my unofficial SPEC runs on my machine, I get an improvement in the
SPEC ratio of anywhere from 0.2% to 1.7% for gzip, mcf, parser, and
twolf. vpr got worse by 1.2%.
The compile time differences were negligible.
Binary File Size Changes:
------------------------------------
The binary sizes show an increase ranging from roughly 3% to 7.5%. Out
of my five test cases, the largest percentage increase I saw was in
parser, which grew from 203,256 blocks to 218,408 blocks (up by about
7.5%).
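
For anyone checking the arithmetic, the parser figure can be reproduced with a few lines of C. The helper name `percent_increase` is made up for this note. It gives about 7.45% for the two block counts quoted above, which rounds to the 7.5% reported.

```c
#include <assert.h>

/* Percentage growth from old_size to new_size.  Hypothetical helper,
   used only to check the figures quoted in this message.  */
double percent_increase(double old_size, double new_size)
{
    assert(old_size > 0.0);
    return (new_size - old_size) / old_size * 100.0;
}
```

Calling `percent_increase(203256.0, 218408.0)` covers the parser case.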
Other questions/comments/issues raised:
--------------------------------------------------------
From Zdenek Dvorak:
> 2) What are the differences between results using profile feedback and
> static estimates?
Using this new optimization with profile feedback showed an improvement
over using it with static estimates in all cases but one (for mcf the
two cases were identical). In the cases where feedback gave an
improvement, the improvements ranged from 0.2% (vpr) to 2.8% (gzip).
> 4) If possible, some oprofile data, especially impact on instruction
> cache and branch prediction.
I don't have this information at this time.
From Jan Hubicka:
>
> This is something I was thinking about for a while. How do you deal
> with debug info and unwind tables?
1). We don't attempt to perform this optimization in the presence of
exception handling at the moment. (I put in an explicit test for this.)
2). I have not attempted to ensure that all the debug information is
correct. I *have* added, at the beginning of the cold section for each
function, a globally unique label/tag of the form
"_<function_name>_unexecuted_section", which helps gdb tell the user
more accurately where the user is when debugging. I have run gdb on an
executable that has been partitioned into hot and cold sections. I
remember being able to do my debugging, but I wasn't looking closely at
the debugging information, and I can't remember what in particular did
or did not work well.
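
To make the label format concrete, here is a minimal sketch that builds a name of the form "_<function_name>_unexecuted_section". The helper `make_cold_label` is hypothetical and is not how the patch emits the label; it only shows the naming scheme described above.

```c
#include <assert.h>
#include <stdio.h>

/* Write "_<function_name>_unexecuted_section" into buf.
   Returns 0 on success, or -1 if the result did not fit in buflen
   bytes (including the terminating NUL).  Hypothetical helper.  */
int make_cold_label(char *buf, size_t buflen, const char *function_name)
{
    int n;

    assert(buf != NULL && function_name != NULL);
    n = snprintf(buf, buflen, "_%s_unexecuted_section", function_name);
    return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

For a function named `foo`, the buffer ends up holding `_foo_unexecuted_section`.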
2004-01-28 Caroline Tice <ctice@apple.com>
* basic-block.h (struct basic_block): Add new field,
"section_boundary".
(partition_hot_cold_basic_blocks): Add extern function
declaration.
* bb-reorder.c (function.h, obstack.h): Add two new include
statements.
(find_rarely_executed_basic_blocks): New function.
(mark_bb_for_unlikely_executed_section): New function.
(color_basic_blocks): New function.
(find_all_crossing_edges): New function.
(add_labels_and_missing_jumps): New function.
(add_section_boundary_info): New function.
(fix_up_fall_thru_edges): New function.
(fix_edges_for_rarely_executed_code): New function.
(insn_on_section_boundary): New function.
(partition_hot_cold_basic_blocks): New function.
* cfg.c (struct basic_block_def entry_exit_blocks): Add
initialization value for new "section_boundary" field.
* cfglayout.c (update_unlikely_executed_notes): New function.
(fixup_reorder_chain): Add code so when a new jumping basic block is
added, its "section_boundary" and UNLIKELY_EXECUTED_CODE note are
updated appropriately.
(duplicate_insn_chain): Add code to duplicate the new NOTE insn
introduced by this optimization.
* cfglayout.h (scan_ahead_for_unlikely_executed_note): Add new
extern function declaration.
* common.opt (freorder-blocks-and-partition): Add new flag for
this
optimization.
* dbxout.c (dbx_function_end): Add code to make sure scope
labels at
the end of functions are written into the correct (hot or cold)
section.
(dbx_source_file): Add code so writing debug file information
doesn't incorrectly change sections.
* defaults.h (HOT_TEXT_SECTION_NAME): Modify value to work for
linux/i386.
(UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Modify value to work for
linux/i386.
(SECTION_FORMAT_STRING): New macro, for linux/i386 hot/cold
section
partitioning.
* final.c (shorten_branches): Add code (in an #ifdef) for
architectures with short conditional branches to mark them for
modification, to span distance between hot & cold sections.
(scan_ahead_for_unlikely_executed_note): New function.
(is_jump_table_basic_block): New function.
(final_scan_insn): Add code to check for NOTE instruction
indicating
whether basic block belongs in hot or cold section, and to make
sure
the current basic block is being written to the appropriate
section.
Also added code to ensure that jump table basic blocks end up
in the
correct section.
* flags.h (flag_reorder_blocks_and_partition): New flag.
* opts.c (decode_options): Code to handle new flag,
flag_reorder_blocks_and_partition.
(common_handle_option): Code to handle new flag,
flag_reorder_blocks_and_partition.
* output.h (unlikely_text_section): New extern function
declaration.
(in_unlikely_text_section): New extern function declaration.
* print-rtl.c (print_rtx): Add code for handling new note,
NOTE_INSN_UNLIKELY_EXECUTED_CODE.
* rtl.c (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note (see
below).
* rtl.h (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note
instruction,
indicating the basic block containing it belongs in the cold
section.
(insn_on_section_boundary): New extern function declaration.
* toplev.c (flag_reorder_blocks_and_partition): Add code to
initialize this flag, and to tie it to the command-line option
freorder-blocks-and-partition.
(rest_of_handle_stack_regs): Add
flag_reorder_blocks_and_partition
as an 'or' condition for calling reorder_basic_blocks.
(rest_of_handle_reorder_blocks): Add
flag_reorder_blocks_and_partition as an 'or' condition for
calling
reorder_basic_blocks.
(rest_of_compilation): Add call to
partition_hot_cold_basic_blocks.
* varasm.c (cfglayout.h): Add new include statement.
(unlikely_section_label_printed): New global variable, used for
determining when to output section name labels for cold
sections.
(in_section): Add in_unlikely_executed_text to enum data
structure.
(text_section): Modify code to use SECTION_FORMAT_STRING and
HOT_TEXT_SECTION_NAME macros.
(unlikely_text_section): New function.
(in_unlikely_text_section): New function.
(function_section): Add code to make sure beginning of
function is
written into correct section (hot or cold).
(assemble_start_function): Add code to make sure stuff is
written to
the correct section.
(assemble_zeros): Add in_unlikely_text_section as an 'or'
condition
to an if statement that was checking 'in_text_section'.
(assemble_variable): Add 'in_unlikely_text_section' as an 'or'
condition to an if statement that was checking
'in_text_section'.
(default_section_type_flags_1): Add check: if in cold section,
flags = SECTION_CODE.
* config/rs6000/darwin.h (UNLIKELY_EXECUTED_TEXT_SECTION_NAME):
Change
text string to something more informative.
(SECTION_FORMAT_STRING): Add new definition.
* config/rs6000/rs6000.c (rs6000_assemble_integer): Add
'!in_unlikely_text_section' as an 'and' condition to an if
statement
that was already checking '!in_text_section'.
(output_cbranch): Modify 'need_longbranch' to be true if an
insn size
is LONG_COND_BRANCH_SIZE.
* config/rs6000/rs6000.h (LONG_COND_BRANCH_SIZE): Add new
definition.
* config/rs6000/sysv4.h (HOT_TEXT_SECTION_NAME,
UNLIKELY_EXECUTED_TEXT_SECTION_NAME, SECTION_FORMAT_STRING): Make
sure these are properly defined for linux on ppc.
* doc/invoke.texi (freorder-blocks-and-partition): Add
documentation
for this new flag.
* doc/tm.texi (SECTION_FORMAT_STRING, LONG_COND_BRANCH_SIZE):
Add
documentation for these new macros.
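
Several entries above (find_all_crossing_edges, fix_up_fall_thru_edges, shorten_branches) deal with control-flow edges that start in one section and end in the other. A toy sketch of finding such edges follows. The types and the function `toy_mark_crossing_edges` are hypothetical and do not reflect the patch's data structures; the sketch only illustrates why these edges are singled out: a fall-through across the boundary needs an explicit jump, and a short conditional branch may not reach the other section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical edge between two blocks, identified by index.  */
struct toy_edge {
    size_t src;     /* index of the source block */
    size_t dest;    /* index of the destination block */
    int crossing;   /* set to 1 if src and dest are in different sections */
};

/* Flag every edge whose endpoints lie in different sections.
   block_is_cold[i] is nonzero when block i is in the cold section.
   Returns the number of crossing edges.  */
size_t toy_mark_crossing_edges(struct toy_edge *edges, size_t n_edges,
                               const int *block_is_cold, size_t n_blocks)
{
    size_t crossing = 0;

    for (size_t i = 0; i < n_edges; i++) {
        assert(edges[i].src < n_blocks && edges[i].dest < n_blocks);
        edges[i].crossing = (!block_is_cold[edges[i].src]
                             != !block_is_cold[edges[i].dest]);
        crossing += (size_t) edges[i].crossing;
    }
    return crossing;
}
```

In a three-block function where only the middle block is cold, the edges into and out of that block are flagged, while an edge between the two hot blocks is not.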
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: gcc5-hot-cold2.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20040128/ed941022/attachment.txt>