PATCH: New Optimization: Partitioning hot & cold basic blocks

Caroline Tice ctice@apple.com
Wed Jan 28 23:03:00 GMT 2004


Okay, here we go again.  Last October I submitted this patch for a new 
optimization that partitions
hot and cold basic blocks into separate sections of the .o file, based 
on feedback information and
the existing bb-reorder stuff.  I received many suggestions for 
improvements and queries about
performance, etc.  I have incorporated all the suggestions into the 
patch, and updated it to work with
the current 3.5 mainline gcc.  Below is a recap of my answers to the 
questions (and my original post).

There was also a question raised about whether or not there was an 
existing patent for this method
which my patch violated.  I have since located the patent in question 
and after reading it through
carefully I have come to the conclusion that my patch does not violate 
the patent. So once again
I am asking permission to commit this new optimization to the 3.5 
mainline compiler.

Okay to commit?

-- Caroline Tice
ctice@apple.com

Testing Information:
----------------------------

	HW, OS:  - Apple G4 (Dual 1.25 GHz PowerPC), running Mac OS X (Jaguar, 
Panther)
                           - Apple G5 (2.0 GHz 970), running Max OS X 
(Panther)
	                 - Apple G4, running PPC Linux
		       -  Pentium 4, running linux

	Tests Run (and passed):
               - test case specifically designed to test the hot/cold 
partitioning. Verified that it compiled
                 and ran correctly and did the partitioning.
	    - SPECInt 2000 test cases that FSF gcc 3.5 passes
              - Bootstrapping
             -  DejaGnu test suite


Performance Improvement Results:
------------------------------------------------

In my unofficial SPEC runs on my machine,  I get an improvement in the 
SPEC ratio
of anywhere from 0.2% to 1.7% for gzip, mcf, parser, and twolf.   vpr 
got worse by 1.2%.
The compile time differences were negligible.

Binary File Size Changes:
------------------------------------

The binary sizes show a size increase ranging from 3% to 7%.  Out of my 
five test cases,
the largest percentage increase I saw was in parser, which increased 
from 203,256 blocks to
218,408 blocks (up by 7.5%).


Other questions/comments/issues raised:
--------------------------------------------------------

 From Zdenek Dvorak:

> 2) What are the differences between results using profile feedback and
>    static estimates?

Using this new optimization with  profile feedback showed an 
improvement over using
the new optimization with static estimates in all cases but one (for 
mcf the two cases were
identical).  In the cases where feedback gave an improvement, the 
improvements ranged
from 0.2% (vpr) to 2.8% (gzip).

> 4) If possible, some oprofile data, especially impact on instruction
>    cache and branch prediction.

I don't have this information at this time.


 From Jan Hubicka

>
> This is something I was thinking about for a while.  How do you deal
> with debug info and unwind tables?

1).  We don't attempt to perform this optimization in the presence of 
exception handling at
the moment.  (I put in an explicit test for this).

2).  I have not attempted to ensure that all the debug information is 
correct.  I *have*
added, at the beginning of the cold section for each function, a global 
unique label/tag
of the form "_<function_name>_unexecuted_section", which helps gdb tell 
the user more
accurately where the user is when debugging.  I have run gdb on an 
executable that has
been partitioned into hot and cold sections.  I remember being able to 
do my debugging, but
I wasn't looking closely at the debugging information, and I can't 
remember what particularly
did or did not work well.


2004-01-28  Caroline Tice  <ctice@apple.com>

         * basic-block.h (struct basic_block):   Add new field,
         "section_boundary" .
         (partition_hot_cold_basic_blocks): Add extern function 
declaration.
         * bb-reorder.c (function.h, obstack.h):  Add two new include
         statements.
         (find_rarely_executed_basic_blocks): New function.
         (mark_bb_for_unlikely_executed_section): New function.
         (color_basic_blocks): New function.
         (find_all_crossing_edges): New function.
         (add_labels_and_missing_jumps): New function.
         (add_section_boundary_info): New function.
         (fix_up_fall_thru_edges): New function.
         (fix_edges_for_rarely_executed_code): New function.
         (insn_on_section_boundary):  New function.
         (partition_hot_cold_basic_blocks): New function.
         * cfg.c (struct basic_block_def entry_exit_blocks): Add
         initialization value for new "section_boundary" field.
         * cfglayout.c (update_unlikely_executed_notes):  New function.
         (fixup_reorder_chain):  Add code so when a new jumping basic 
block is
         added, it's "section_boundary" and UNLIKELY_EXECUTED_CODE note 
are
         updated appropriately.
         (duplicate_insn_chain):  Add code to duplicate the new NOTE insn
         introduced by this optimization.
         * cfglayout.h (scan_ahead_for_unlikely_executed_note):  Add new
         extern function declaration.
         * common.opt (freorder-blocks-and-partition):  Add new flag for 
this
         optimization.
         * dbxout.c (dbx_function_end):  Add code to make sure scope 
labels at
         the end of functions are written into the correct (hot or cold)
         section.
         (dbx_source_file): Add code to so writing debug file information
         doesn't incorrectly change sections.
         * defaults.h (HOT_TEXT_SECTION_NAME): Modify value to work for
         linux/i386.
         (UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Modify value to work for
         linux/i386.
         (SECTION_FORMAT_STRING): New macro, for linux/i386 hot/cold 
section
         partitioning.
         * final.c (shorten_branches): Add code (in an #ifdef) for
         architectures with short conditional branches to mark them for
         modification, to span distance between hot & cold sections.
         (scan_ahead_for_unlikely_executed_note):  New function.
         (is_jump_table_basic_block):  New function.
         (final_scan_insn):  Add code to check for NOTE instruction 
indicating
         whether basic block belongs in hot or cold section, and to make 
sure
         the current basic block is being written to the appropriate 
section.
         Also added code to ensure that jump table basic blocks end up 
in the
         correct section.
         * flags.h (flag_reorder_blocks_and_partition):  New flag.
         * opts.c (decode_options): Code to handle new flag,
         flag_reorder_blocks_and_partition.
         (common_handle_option): Code to handle new flag,
         flag_reorder_blocks_and_partition.
         * output.h (unlikely_text_section): New extern function 
declaration.
         (in_unlikely_text_section): New extern function declaration.
         * print-rtl.c (print_rtx): Add code for handling new note,
         NOTE_INSN_UNLIKELY_EXECUTED_CODE
         * rtl.c  (NOTE_INSN_UNLIKELY_EXECUTED_CODE): New note (see 
below).
         * rtl.h (NOTE_INSN_UNLIKELY_EXECUTED_CODE):  New note 
instruction,
         indicating the basic block containing it belongs in the cold 
section.
         (insn_on_section_boundary) : New extern function declaration.
         * toplev.c (flag_reorder_blocks_and_partition):  Add code to
         initialize this flag, and to tie it to the command-line option
         freorder-blocks-and-partition.
         (rest_of_handle_stack_regs):  Add 
flag_reorder_blocks_and_partition
         as an 'or' condition for calling reorder_basic_blocks.
         (rest_of_handle_reorder_blocks):  Add
         flag_reorder_blocks_and_partition as an 'or' condition for 
calling
         reorder_basic_blocks.
         (rest_of_compilation):  Add call to 
partition_hot_cold_basic_blocks.
         * varasm.c (cfglayout.h):  Add new include statement.
         (unlikely_section_label_printed):  New global variable, used for
         determining when to output section name labels for cold 
sections.
         (in_section):  Add in_unlikely_executed_text to enum data 
structure.
         (text_section):  Modify code to use SECTION_FORMAT_STRING and
         HOT_TEXT_SECTION_NAME macros.
         (unlikely_text_section):  New function.
         (in_unlikely_text_section):  New function.
         (function_section):  Add code to make sure beginning of 
function is
         written into correct section (hot or cold).
         (assemble_start_function):  Add code to make sure stuff is 
written to
         the correct section.
         (assemble_zeros):  Add in_unlikely_text_section as an 'or' 
condition
         to an if statement that was checking 'in_text_section'.
         (assemble_variable):  Add 'in_unlikely_text_section' as an 'or'
         condition to an if statement that was checking 
'in_text_section'.
         (default_section_type_flags_1):  Add check: if in cold section
         flags = SECTION_CODE.
         * config/rs6000/darwin.h (UNLIKELY_EXECUTED_TEXT_SECTION_NAME): 
Change
         text string to something more informative.
         (SECTION_FORMAT_STRING):  Add new definition.
         * config/rs6000/rs6000.c (rs6000_assemble_integer):  Add
         '!in_unlikely_text_section' as an 'and' condition to an if 
statement
         that was already checking '!in_text_section'.
         (output_cbranch): Modify 'need_longbranch' to be true if an 
insn size
         is LONG_COND_BRANCH_SIZE.
         * config/rs6000/rs6000.h (LONG_COND_BRANCH_SIZE): Add new 
definition.
         * config/rs6000/sysv4.h (HOT_TEXT_SECTION_NAME,
         UNLIKELY_EXECUTED_TEXT_SECTION_NAME,SECTION_FORMAT_STRING): Make
         sure these are properly defined for linux on ppc.
         * doc/invoke.texi  (freorder-blocks-and-partition): Add 
documentation
         for this new flag.
         * doc/tm.texi (SECTION_FORMAT_STRING, LONG_COND_BRANCH_SIZE): 
Add
         documentation for these new macros.



-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 10152 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20040128/ed941022/attachment.bin>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: gcc5-hot-cold2.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20040128/ed941022/attachment.txt>


More information about the Gcc-patches mailing list