more hot-cold partitioning problems

Caroline Tice ctice@apple.com
Wed Apr 14 05:31:00 GMT 2004


On Apr 13, 2004, at 4:25 PM, Richard Henderson wrote:

> I must say, first of all, that I'm somewhat upset this patch got
> committed without further discussion.  While I admit that I dropped
> the ball in my review, I'm annoyed that whoever came along after
> (presumably geoffk) didn't require you to address the outstanding
> issues that I had raised in any way.
>

I am sorry that you did not have the chance to review the patch as much
as you wished before it got committed.  However I do not think that is 
my
fault.  There have been versions of this floating around for anyone to 
look
at or comment on since Oct. 8 last year.  My most recent patch (the one 
that
got committed) was offered for review on March 16.  It sat there 
without comment
for over two weeks.  Finally, on April 3, Zack Weinberg said that lots 
of concerns
had been raised and he wasn't sure they had all been addressed.  I 
responded
that I had done my best to address them all (I did), and could he (or 
anybody else)
point out what I had missed.  No response from anyone.  On April 6, 
Geoff
Keating approved the patch.  Bearing in mind what Zack had said, **I 
waited three
days to commit the patch**, in case somebody wanted to say "Wait a 
minute!  What
about blah?".  No one said anything, so on April 9 I committed the 
patch.

I honestly did my best to address all of the concerns that everybody 
raised as best
I could; I gave everybody ample opportunity to raise more objections or 
point out
concerns that I missed.  At this point I do not feel it is fair for you 
to complain too
loudly about the patch going in without all your concerns being 
addressed.


> I'm also quite annoyed that this patch was CLEARLY never tested
> on anything besides darwin.  Not even x86 linux.  I consider this
> sloppiness of the first order.
>

You are wrong here.  As I stated, I *did* test this patch on an x86 box 
running
Linux.  I bootstrapped it, I ran DejaGnu tests (neither of which were 
likely to
pick up problems, since they don't invoke partitioning), and I ran 6 
SPECInt
tests, using partitioning.

What I did *not* do (and in retrospect I can see that I should have) 
was to run
the SPEC tests with "-g".  I tend not to do that as it slows down 
performance, which
is what SPEC is usually about.  The errors you encountered with the 
labels at the
end of the function being in the wrong section are the result of using a
"-g" flag.  If you want any proof that I really did the testing, I 
submitted one or two
updates to the main patches  *because of problems I found when testing
on ix86 Linux*.  (see the update I submitted on February 27, for 
example).

I will work on fixing this problem as quickly as I can.  Until then I 
suggest
not using the "-g" flag when doing partitioning.


> However, what's done is done.  Let's now get to the business of
> fixing the outstanding problems:
>
>  (1) The biggest problem is that the end of the function is not
>      in the same section as the start of the function.  Which
>      means that the symbolic arithmetic associated with the .size
>      directive crosses sections, which is a hard assembler error.
>
>      That you never noticed this is how I'm absolutely certain
>      that you didn't test on any ELF-based target.


  I repeat, this seems to only be a problem if you compile with "-g".  
I'm
working on fixing it for non-stabs debugging formats.

>
>      Exactly how to resolve this, I'm not sure.  It's sure to
>      be fairly invasive.
>
> 	(a) We need to switch back to the section that was in
> 	    effect at the start of the function at the end.
> 	    This is easily done at the end of final().
>

I had already done this for stabs (and then forgot that I needed to do 
it
for other debugging formats).  I will certainly take care of this.

> 	(b) Every instance of text_section() in your patch is
> 	    wrong.  They should all be (afaics)
>
> 		function_section(current_function_decl)
>
> 	    This will choose ".text" or ".text.hot", since
> 	    anytime your patch is enabled, our previous per
> 	    function hot-path selection is also enabled.
>
I will look at this.


> 	(c) For functions which have been assigned a section by
> 	    the user, via __attribute__((section("foo"))), it
> 	    seems wrong to split the function at all.
>

I will look at this.

> 	(d) For functions which have been assigned to linkonce
> 	    sections, I'm not sure this optimization makes sense
> 	    as-is.  Certainly we could apply this, but we'd want
> 	    to not place all of the cold paths into .text.unlikely,
> 	    but rather into a linkonce section directly associated
> 	    with the one that the main body of the function resides.
>

I have to confess ignorance here.  I know nothing about linkonce
sections.  I could use some more detailed guidance here.


>  (2) How do you expect
>
>       else
>         {
>           in_section = in_unlikely_executed_text;
>           fprintf (asm_out_file, "%s\n", TEXT_SECTION_ASM_OP);
>         }
>
>      to work on systems that don't support named sections?  I mean,
>      I guess it doesn't *actively* fail, but it's definitely silly.
>      If you're not actually changing sections, then this does nothing.
>      I think you should simply disable partitioning if named sections
>      are not available.
>

I don't expect it to work.  Until this weekend I was unaware that there
were systems that didn't support named sections.  Since then I have
been working on  an update patch that  fixes this.  My approach is not
to disable the optimization entirely in the absence of named sections, 
but to
disable anything to do with actually writing separate sections.  This 
will leave
active the small bit that causes reorder_basic_blocks to put all the hot
blocks together before all the cold blocks.

>  (3) The unconditional branch fixup code that you wrote was never
>      tested.  It's got at least one trivial error that causes an ICE.
>      I can't tell if the code actually works at runtime, since I've
>      not yet been able to compile a whole application.
>
>      The patch for this problem follows.
>

Again, you are wrong, I tested it thoroughly on the two architectures I 
have access
to, and I have never encountered an ICE with it.   However I will be 
happy to incorporate your
suggested change into my update patch.

>  (4) You left off leading dots for HOT_TEXT_SECTION_NAME and
>      UNLIKELY_EXECUTED_TEXT_SECTION_NAME as is the norm for extant
>      systems.  This means that the sections won't be sorted properly
>      for existing versions of the gnu linker for all targets.
>      If you really need to dot-less version for darwin, you'll need
>      to redefine these macros there.
>

I object here!!  I DID NOT define those values;  they were already 
defined in
the system without the dots long before my patch was created!  As a 
matter
of fact, trying to deal with the absence of dots and not changing the 
existing
definitions had a lot to do with why I originally created the 
SECTION_FORMAT_STRING
(which got removed over the weekend).

>  (5) The alignment thing that I told you in no uncertain terms was
>      wrong, and was not-quite-but-much-closer fixed by pinskia,
>      is fixed for real below.  Note that "2" was a completely
>      meaningless target-specific value.
>

Thank you.  I will incorporate this into my update patch.


>  (6) Repeat after me: "Instruction stream searching is wrong, m'kay?"
>

Could you explain why?

>      The NOTE_INSN_UNLIKELY_EXECUTED_CODE note (or a suitably renamed
>      equivalent) should contain all of the information you need to
>      switch between sections.  You should not have to search forward
>      from NOTE_INSN_BASIC_BLOCK to discover if a subsequent note
>      exists.
>

The only way I could do this would be to make the
NOTE_INSN_UNLIKELY_EXECUTED_CODE note be the very first thing in any
basic block to which it belongs.  I believe this violates assumptions 
made
throughout the compiler that the first thing must either be a code label
or a NOTE_INSN_BASIC_BLOCK note.   I may be mistaken about that though,
I would have to double check...


>      Plus, as I noted in my previous review, triggering a section
>      change on NOTE_INSN_BASIC_BLOCK means that the label that came
>      before the NOTE_INSN_BASIC_BLOCK note would have been emitted
>      to the wrong section.
>

No, this does not happen.  Have you actually looked at the
assembly code generated and verified that this is happening for you?  In
all the partitioned assembly code I have examined, the labels are coming
out in the correct sections.

If you examine the patch I committed, the code in final_scan_insn,
where dealing with a label instruction *also* scans ahead for the
NOTE_INSN_UNLIKELY_EXECUTED_CODE
note, and makes sure to change to the correct section before outputting 
the label.
(lines 1918-1931 in final.c).


>      That you never encountered a problem with this suggests that
>      all cold blocks are always clustered at the end of the function
>      (which is not unexpected from the way things are constructed),
>      which means that you should be able to assert that this property
>      is always true, which means that you should be able to arrange
>      for there to be exactly one NOTE_INSN_UNLIKELY_EXECUTED_CODE
>      emitted, and thus maximally one call to unlikely_text_section.
>      Which would simplify logic all over the place.
>

No, I never encountered the problem because I explicitly fixed the
code (long before you ever mentioned this) to make sure the labels
came out in the correct section.

With regards to the clustering of hot/cold blocks,  I have found (on
examining the assembly files from the SPEC tests) that
*most of the time* all the hot blocks come before all the cold blocks.
But this is not true all the time.

>  (7) Dwarf2 debugging doesn't assemble, at all.  Again, this is a
>      cross-section symbolic arithmetic problem.  Someone is going
>      to have to either implement support for cross-section debug info
>      (which shouldn't be *that* hard), or arrange for debug info to
>      be suppressed for code in the .text.unlikely section.
>
>      I suspect that there are similar problems for other non-stabs
>      debug formats.
>
>

I will take a look at this.

> r~
>
>
>
> 	* bb-reorder.c (fix_crossing_unconditional_branches): Use Pmode
> 	for LABEL_REFs.
> 	* doc/invoke.texi: Update to match.
>
> Index: bb-reorder.c
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/bb-reorder.c,v
> retrieving revision 1.68
> diff -c -p -d -r1.68 bb-reorder.c
> *** bb-reorder.c	11 Apr 2004 08:20:42 -0000	1.68
> --- bb-reorder.c	13 Apr 2004 22:34:31 -0000
> *************** fix_crossing_unconditional_branches (voi
> *** 1771,1777 ****
>   		 reference of label, as target for jump.  */
>   	
>   	      label = JUMP_LABEL (last_insn);
> ! 	      label_addr = gen_rtx_LABEL_REF (VOIDmode, label);
>   	      LABEL_NUSES (label) += 1;
>   	
>   	      /* Get a register to use for the indirect jump.  */
> --- 1771,1777 ----
>   		 reference of label, as target for jump.  */
>   	
>   	      label = JUMP_LABEL (last_insn);
> ! 	      label_addr = gen_rtx_LABEL_REF (Pmode, label);
>   	      LABEL_NUSES (label) += 1;
>   	
>   	      /* Get a register to use for the indirect jump.  */
>
>
>
> 	* defaults.h (HOT_TEXT_SECTION_NAME): Add leading dot.
> 	(UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Likewise.
> 	* doc/
>
> Index: defaults.h
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/defaults.h,v
> retrieving revision 1.135
> diff -c -p -d -r1.135 defaults.h
> *** defaults.h	11 Apr 2004 06:21:03 -0000	1.135
> --- defaults.h	13 Apr 2004 22:34:31 -0000
> *************** You Lose!  You must define PREFERRED_DEB
> *** 620,626 ****
>   #endif
>
>   #ifndef HOT_TEXT_SECTION_NAME
> ! #define HOT_TEXT_SECTION_NAME "text.hot"
>   #endif
>
>   #ifndef NORMAL_TEXT_SECTION_NAME
> --- 620,626 ----
>   #endif
>
>   #ifndef HOT_TEXT_SECTION_NAME
> ! #define HOT_TEXT_SECTION_NAME ".text.hot"
>   #endif
>
>   #ifndef NORMAL_TEXT_SECTION_NAME
> *************** You Lose!  You must define PREFERRED_DEB
> *** 628,634 ****
>   #endif
>
>   #ifndef UNLIKELY_EXECUTED_TEXT_SECTION_NAME
> ! #define UNLIKELY_EXECUTED_TEXT_SECTION_NAME "text.unlikely"
>   #endif
>
>   #ifndef HAS_LONG_COND_BRANCH
> --- 628,634 ----
>   #endif
>
>   #ifndef UNLIKELY_EXECUTED_TEXT_SECTION_NAME
> ! #define UNLIKELY_EXECUTED_TEXT_SECTION_NAME ".text.unlikely"
>   #endif
>
>   #ifndef HAS_LONG_COND_BRANCH
> Index: doc/invoke.texi
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
> retrieving revision 1.443
> diff -c -p -d -r1.443 invoke.texi
> *** doc/invoke.texi	9 Apr 2004 19:57:47 -0000	1.443
> --- doc/invoke.texi	13 Apr 2004 22:34:32 -0000
> *************** paging and cache locality performance.
> *** 4217,4224 ****
>   @opindex freorder-functions
>   Reorder basic blocks in the compiled function in order to reduce 
> number of
>   taken branches and improve code locality. This is implemented by 
> using special
> ! subsections @code{text.hot} for most frequently executed functions 
> and
> ! @code{text.unlikely} for unlikely executed functions.  Reordering is 
> done by
>   the linker so object file format must support named sections and 
> linker must
>   place them in a reasonable way.
>
> --- 4217,4224 ----
>   @opindex freorder-functions
>   Reorder basic blocks in the compiled function in order to reduce 
> number of
>   taken branches and improve code locality. This is implemented by 
> using special
> ! subsections @code{.text.hot} for most frequently executed functions 
> and
> ! @code{.text.unlikely} for unlikely executed functions.  Reordering 
> is done by
>   the linker so object file format must support named sections and 
> linker must
>   place them in a reasonable way.
>
>
>
>
>
> 	* varasm.c (unlikely_text_section): Use assemble_align instead of
> 	ASM_OUTPUT_ALIGN.  Use it in the correct place with an approximately
> 	correct alignment argument.
>
> Index: varasm.c
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/varasm.c,v
> retrieving revision 1.420
> diff -c -p -d -r1.420 varasm.c
> *** varasm.c	11 Apr 2004 06:21:04 -0000	1.420
> --- varasm.c	13 Apr 2004 22:34:32 -0000
> *************** unlikely_text_section (void)
> *** 228,234 ****
>   	{
>   	  in_section = in_unlikely_executed_text;
>   	  fprintf (asm_out_file, "%s\n", TEXT_SECTION_ASM_OP);
> - 	  ASM_OUTPUT_ALIGN (asm_out_file, 2);
>   	}
>
>         if (!unlikely_section_label_printed)
> --- 228,233 ----
> *************** unlikely_text_section (void)
> *** 236,241 ****
> --- 235,244 ----
>   	  fprintf (asm_out_file, "__%s_unlikely_section:\n",
>   		   current_function_name ());
>   	  unlikely_section_label_printed = true;
> +
> + 	  /* Make sure that we have approprate alignment for instructions
> + 	     in this section.  */
> + 	  assemble_align (FUNCTION_BOUNDARY);
>   	}
>       }
>   }
>



More information about the Gcc-patches mailing list