This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

more hot-cold partitioning problems


I must say, first of all, that I'm somewhat upset this patch got
committed without further discussion.  While I admit that I dropped
the ball in my review, I'm annoyed that whoever came along after
(presumably geoffk) didn't require you to address the outstanding
issues that I had raised in any way.

I'm also quite annoyed that this patch was CLEARLY never tested
on anything besides darwin.  Not even x86 linux.  I consider this
sloppiness of the first order.

However, what's done is done.  Let's now get to the business of
fixing the outstanding problems:

 (1) The biggest problem is that the end of the function is not
     in the same section as the start of the function.  Which
     means that the symbolic arithmetic associated with the .size
     directive crosses sections, which is a hard assembler error.

     That you never noticed this is how I'm absolutely certain
     that you didn't test on any ELF-based target.

     Exactly how to resolve this, I'm not sure.  It's sure to
     be fairly invasive.  

	(a) We need to switch back to the section that was in
	    effect at the start of the function at the end.
	    This is easily done at the end of final().

	(b) Every instance of text_section() in your patch is
	    wrong.  They should all be (afaics)

		function_section(current_function_decl)

	    This will choose ".text" or ".text.hot", since
	    anytime your patch is enabled, our previous per
	    function hot-path selection is also enabled.

	(c) For functions which have been assigned a section by
	    the user, via __attribute__((section("foo"))), it
	    seems wrong to split the function at all.

	(d) For functions which have been assigned to linkonce
	    sections, I'm not sure this optimization makes sense
	    as-is.  Certainly we could apply this, but we'd want
	    to not place all of the cold paths into .text.unlikely,
	    but rather into a linkonce section directly associated
	    with the one that the main body of the function resides.

 (2) How do you expect

      else
        {
          in_section = in_unlikely_executed_text;
          fprintf (asm_out_file, "%s\n", TEXT_SECTION_ASM_OP);
        }

     to work on systems that don't support named sections?  I mean,
     I guess it doesn't *actively* fail, but it's definitely silly.
     If you're not actually changing sections, then this does nothing.
     I think you should simply disable partitioning if named sections
     are not available.

 (3) The unconditional branch fixup code that you wrote was never
     tested.  It's got at least one trivial error that causes an ICE.
     I can't tell if the code actually works at runtime, since I've
     not yet been able to compile a whole application.

     The patch for this problem follows.

 (4) You left off leading dots for HOT_TEXT_SECTION_NAME and
     UNLIKELY_EXECUTED_TEXT_SECTION_NAME as is the norm for extant
     systems.  This means that the sections won't be sorted properly
     for existing versions of the gnu linker for all targets.

     If you really need to dot-less version for darwin, you'll need
     to redefine these macros there.

 (5) The alignment thing that I told you in no uncertain terms was
     wrong, and was not-quite-but-much-closer fixed by pinskia, 
     is fixed for real below.  Note that "2" was a completely 
     meaningless target-specific value.

 (6) Repeat after me: "Instruction stream searching is wrong, m'kay?"

     The NOTE_INSN_UNLIKELY_EXECUTED_CODE note (or a suitably renamed
     equivalent) should contain all of the information you need to
     switch between sections.  You should not have to search forward
     from NOTE_INSN_BASIC_BLOCK to discover if a subsequent note
     exists.

     Plus, as I noted in my previous review, triggering a section
     change on NOTE_INSN_BASIC_BLOCK means that the label that came
     before the NOTE_INSN_BASIC_BLOCK note would have been emitted
     to the wrong section.

     That you never encountered a problem with this suggests that
     all cold blocks are always clustered at the end of the function
     (which is not unexpected from the way things are constructed),
     which means that you should be able to assert that this property
     is always true, which means that you should be able to arrange
     for there to be exactly one NOTE_INSN_UNLIKELY_EXECUTED_CODE
     emitted, and thus maximally one call to unlikely_text_section.
     Which would simplify logic all over the place.

 (7) Dwarf2 debugging doesn't assemble, at all.  Again, this is a
     cross-section symbolic arithmetic problem.  Someone is going
     to have to either implement support for cross-section debug info
     (which shouldn't be *that* hard), or arrange for debug info to
     be suppressed for code in the .text.unlikely section.

     I suspect that there are similar problems for other non-stabs
     debug formats.


r~



	* bb-reorder.c (fix_crossing_unconditional_branches): Use Pmode
	for LABEL_REFs.
	* doc/invoke.texi: Update to match.

Index: bb-reorder.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/bb-reorder.c,v
retrieving revision 1.68
diff -c -p -d -r1.68 bb-reorder.c
*** bb-reorder.c	11 Apr 2004 08:20:42 -0000	1.68
--- bb-reorder.c	13 Apr 2004 22:34:31 -0000
*************** fix_crossing_unconditional_branches (voi
*** 1771,1777 ****
  		 reference of label, as target for jump.  */
  	      
  	      label = JUMP_LABEL (last_insn);
! 	      label_addr = gen_rtx_LABEL_REF (VOIDmode, label);
  	      LABEL_NUSES (label) += 1;
  	      
  	      /* Get a register to use for the indirect jump.  */
--- 1771,1777 ----
  		 reference of label, as target for jump.  */
  	      
  	      label = JUMP_LABEL (last_insn);
! 	      label_addr = gen_rtx_LABEL_REF (Pmode, label);
  	      LABEL_NUSES (label) += 1;
  	      
  	      /* Get a register to use for the indirect jump.  */



	* defaults.h (HOT_TEXT_SECTION_NAME): Add leading dot.
	(UNLIKELY_EXECUTED_TEXT_SECTION_NAME): Likewise.
	* doc/

Index: defaults.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/defaults.h,v
retrieving revision 1.135
diff -c -p -d -r1.135 defaults.h
*** defaults.h	11 Apr 2004 06:21:03 -0000	1.135
--- defaults.h	13 Apr 2004 22:34:31 -0000
*************** You Lose!  You must define PREFERRED_DEB
*** 620,626 ****
  #endif
  
  #ifndef HOT_TEXT_SECTION_NAME
! #define HOT_TEXT_SECTION_NAME "text.hot"
  #endif
  
  #ifndef NORMAL_TEXT_SECTION_NAME
--- 620,626 ----
  #endif
  
  #ifndef HOT_TEXT_SECTION_NAME
! #define HOT_TEXT_SECTION_NAME ".text.hot"
  #endif
  
  #ifndef NORMAL_TEXT_SECTION_NAME
*************** You Lose!  You must define PREFERRED_DEB
*** 628,634 ****
  #endif
  
  #ifndef UNLIKELY_EXECUTED_TEXT_SECTION_NAME
! #define UNLIKELY_EXECUTED_TEXT_SECTION_NAME "text.unlikely"
  #endif
  
  #ifndef HAS_LONG_COND_BRANCH
--- 628,634 ----
  #endif
  
  #ifndef UNLIKELY_EXECUTED_TEXT_SECTION_NAME
! #define UNLIKELY_EXECUTED_TEXT_SECTION_NAME ".text.unlikely"
  #endif
  
  #ifndef HAS_LONG_COND_BRANCH
Index: doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
retrieving revision 1.443
diff -c -p -d -r1.443 invoke.texi
*** doc/invoke.texi	9 Apr 2004 19:57:47 -0000	1.443
--- doc/invoke.texi	13 Apr 2004 22:34:32 -0000
*************** paging and cache locality performance.
*** 4217,4224 ****
  @opindex freorder-functions
  Reorder basic blocks in the compiled function in order to reduce number of
  taken branches and improve code locality. This is implemented by using special
! subsections @code{text.hot} for most frequently executed functions and
! @code{text.unlikely} for unlikely executed functions.  Reordering is done by
  the linker so object file format must support named sections and linker must
  place them in a reasonable way.
  
--- 4217,4224 ----
  @opindex freorder-functions
  Reorder basic blocks in the compiled function in order to reduce number of
  taken branches and improve code locality. This is implemented by using special
! subsections @code{.text.hot} for most frequently executed functions and
! @code{.text.unlikely} for unlikely executed functions.  Reordering is done by
  the linker so object file format must support named sections and linker must
  place them in a reasonable way.
  




	* varasm.c (unlikely_text_section): Use assemble_align instead of
	ASM_OUTPUT_ALIGN.  Use it in the correct place with an approximately
	correct alignment argument.

Index: varasm.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/varasm.c,v
retrieving revision 1.420
diff -c -p -d -r1.420 varasm.c
*** varasm.c	11 Apr 2004 06:21:04 -0000	1.420
--- varasm.c	13 Apr 2004 22:34:32 -0000
*************** unlikely_text_section (void)
*** 228,234 ****
  	{
  	  in_section = in_unlikely_executed_text;
  	  fprintf (asm_out_file, "%s\n", TEXT_SECTION_ASM_OP);
- 	  ASM_OUTPUT_ALIGN (asm_out_file, 2);
  	}
        
        if (!unlikely_section_label_printed)
--- 228,233 ----
*************** unlikely_text_section (void)
*** 236,241 ****
--- 235,244 ----
  	  fprintf (asm_out_file, "__%s_unlikely_section:\n", 
  		   current_function_name ());
  	  unlikely_section_label_printed = true;
+ 
+ 	  /* Make sure that we have approprate alignment for instructions
+ 	     in this section.  */
+ 	  assemble_align (FUNCTION_BOUNDARY);
  	}
      }
  }


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]