This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Fwd: [PATCH] Hot/cold partitioning fixes



This is the second of three messages.


Begin forwarded message:

From: Caroline M Tice <ctice@apple.com>
Date: Sat Mar 12, 2005  9:24:08 PM US/Pacific
To: Zack Weinberg <zack@codesourcery.com>
Cc: Caroline Tice <ctice@apple.com>
Subject: Re: [PATCH] Hot/cold partitioning fixes


On Saturday, March 12, 2005, at 03:10 PM, Zack Weinberg wrote:



My most important concern: Could you talk a little about why you now impose the constraint that all the 'hot' blocks come before all the 'cold' blocks? The need is not obvious to me, and it seems to cause a substantial number of complications (esp. in bb-reorder.c).


I'm not sure whether your question is: 1) Why do I want all the hot blocks together and all the cold blocks together, with a single transition between them? or 2) Assuming we agree that we want only a single transition between the group of hot blocks and the group of cold blocks, why do I always try to make the hot blocks come first?

The answer to question 1 is: It allows me to remove the scanning ahead
in the instruction stream for determining when to switch between hot
and cold sections, and also seems necessary for making partitioning
work with DWARF debugging output.  In the current implementation (the
one committed in the FSF branch) there could be multiple transitions
between hot and cold sections in a function, as I don't absolutely
force all the hot blocks together and all the cold blocks together.
This in turn means that in final_scan_insn (where the assembly code,
and consequently the section switching directives, are written out) I
need to check, at the start of EACH basic block, to see which section
the block belongs to (which involves scanning ahead two or three
instructions), then see if we are already in the appropriate section
or not, and switch if we're not in the correct section.  Richard
Henderson complained vociferously about this and definitely wanted me
to eliminate the scanning ahead in the instruction stream.  This is
ALSO what caused you the problems a few months back, when you had to
modify function_section, because it depended on this scanning ahead to
determine the correct section, and that bit you.  By absolutely
forcing all the hot blocks together and all the cold blocks together,
and thus ensuring there will be at most one transition point in the
rtl/insn stream, I can remove all the forward scanning, and when I hit
the single transition note, change to the cold section.
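The effect of grouping can be illustrated with a small sketch. It is not taken from the patch: `struct block`, `count_section_switches` and `group_hot_first` are hypothetical names, and GCC's real basic-block representation is considerably richer. The sketch assumes emission starts in the hot section and only shows that a stable hot-first grouping leaves at most one section switch, so no per-block lookahead is needed.

```c
#include <stddef.h>

/* Hypothetical partition tag; not GCC's actual data structure.  */
enum partition { HOT, COLD };

struct block
{
  int id;               /* original position in the layout */
  enum partition part;  /* section the block belongs to */
};

/* Count how many section switches are needed to emit BLOCKS in
   layout order, assuming emission starts in the hot section.  */
size_t
count_section_switches (const struct block *blocks, size_t n)
{
  enum partition current = HOT;
  size_t switches = 0;

  for (size_t i = 0; i < n; i++)
    if (blocks[i].part != current)
      {
        current = blocks[i].part;
        switches++;
      }
  return switches;
}

/* Move all hot blocks ahead of all cold blocks, keeping the relative
   order inside each group.  Returns the index of the first cold
   block, i.e. the single transition point.  */
size_t
group_hot_first (struct block *blocks, size_t n)
{
  size_t next_hot = 0;

  for (size_t i = 0; i < n; i++)
    if (blocks[i].part == HOT)
      {
        struct block hot = blocks[i];

        /* Shift the cold blocks in [next_hot, i) right by one.  */
        for (size_t j = i; j > next_hot; j--)
          blocks[j] = blocks[j - 1];
        blocks[next_hot++] = hot;
      }
  return next_hot;
}
```

With an interleaved layout such as hot, cold, hot, cold, hot, the first function reports four switches; after `group_hot_first` it reports one, at the index the second function returns.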

In addition to allowing me to eliminate the forward scanning, I
discovered when I was modifying the dwarf2 code that having at most
one transition point in the assembly code is really necessary to be
able to output any kind of meaningful DWARF information.  In particular,
DWARF requires putting out some "deltas" in the debug information,
which are calculated by subtracting a label inserted at the beginning
of a section from a label inserted at the end of a section.  If there
is at most one transition point, then I can just add an extra label
marking the end of the hot section and an extra label marking the
beginning of the cold section, update the delta functions to use the
appropriate new labels, and it works (this is in fact what I have
done).  If there were multiple transition points within a function,
the whole DWARF business would become practically unsolvable.
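The label arithmetic can be modelled with a short sketch. It is an illustration only: `struct label` and `label_delta` are hypothetical and do not correspond to anything in dwarf2out.c. The assumption built into the model is that a difference between two labels is only a meaningful constant when both labels sit in the same section, which is why each of the two sections needs its own begin/end pair.

```c
#include <stdint.h>

/* Hypothetical model of an assembler label: the section it lives in
   and its offset from the start of that section.  */
struct label
{
  int section;
  uint64_t offset;
};

/* Compute END - BEGIN.  Returns 1 and stores the result in *DELTA
   when both labels are in the same section and END is not before
   BEGIN; returns 0 otherwise, because in this model the difference
   has no fixed value.  */
int
label_delta (struct label begin, struct label end, uint64_t *delta)
{
  if (begin.section != end.section || end.offset < begin.offset)
    return 0;
  *delta = end.offset - begin.offset;
  return 1;
}
```

With one pair of labels around the hot part and one around the cold part, both deltas are well defined. A delta from the start of the hot part to the end of the cold part is rejected by the model.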

So, that's my answer to question 1.

If you were asking question 2 (why hot-cold and not cold-hot): There
is no absolute reason why all the hot blocks have to come first
rather than making all the cold blocks come first; however a lot of
the back end of the compiler seems to have tiny assumptions that the
(first real) block pointed to by ENTRY_BLOCK_POINTER will actually be
the first code written to the assembly file.  Breaking this assumption
breaks lots of code all over the back end.  Since MOST of the time
that block is hot, it made sense to me to just always put the hot
blocks first.

Anyway, those were my reasons.  If they seem inadequate or mistaken to
you, or if my explanations are unclear in any way, let me know.

I would also like to point out that this construct

+ 	  {
+ 	    len = strlen (UNLIKELY_EXECUTED_TEXT_SECTION_NAME);
+ 	    unlikely_text_section_name = xmalloc (len+1 * sizeof (char));
+ 	    strcpy (unlikely_text_section_name,
+ 		    UNLIKELY_EXECUTED_TEXT_SECTION_NAME);
+ 	  }

would be better spelled

          unlikely_text_section_name
            = xstrdup (UNLIKELY_EXECUTED_TEXT_SECTION_NAME);


Yes, that is definitely shorter/more compact. I was unaware of this particular idiom, but will be happy to use it.
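For readers who have not met the idiom: libiberty's xstrdup copies a string into freshly allocated memory and does not return NULL on allocation failure. The sketch below is a stand-in written for illustration, not libiberty's source; `sketch_xstrdup` is a hypothetical name, and calling abort on failure is an assumption of the sketch. Note also that `len+1 * sizeof (char)` in the quoted hunk parses as `len + (1 * sizeof (char))`, which yields the intended size only because sizeof (char) is 1.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an xstrdup-style helper: duplicate S on
   the heap and never return NULL.  This sketch aborts on allocation
   failure; a real implementation may report the error differently.  */
char *
sketch_xstrdup (const char *s)
{
  size_t len = strlen (s) + 1;  /* include the terminating NUL */
  char *copy = malloc (len);

  if (copy == NULL)
    {
      fputs ("out of memory\n", stderr);
      abort ();
    }
  memcpy (copy, s, len);
  return copy;
}
```

The caller owns the returned buffer and releases it with free when it is no longer needed.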

Furthermore, never write "sizeof(char)" -- it is 1 by definition.


Is this absolutely guaranteed by the language definition, or is it just standard practice? The reason I ask is that I know that on some platforms, bool is defined to be 4 bytes instead of 1 byte, which is what *I* would have thought it always was, no matter what. Having been mistaken once tends to make me cautious about making such assumptions...
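On the language question: ISO C specifies that sizeof applied to char, signed char or unsigned char yields 1, so this is part of the language definition rather than common practice. What may differ between platforms is the number of bits in a char (CHAR_BIT, at least 8) and the size of other types, bool included. The sketch below uses hypothetical helper names and checks the guaranteed part at compile time while leaving the size of bool as a value to be queried.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Compile-time check: the array size would be negative, and the
   translation unit rejected, if sizeof (char) were not 1.  */
typedef char sizeof_char_is_one[(sizeof (char) == 1) ? 1 : -1];

/* Always 1, by the language definition.  */
size_t
size_of_char (void)
{
  return sizeof (char);
}

/* Implementation-defined: at least 1, possibly larger.  */
size_t
size_of_bool (void)
{
  return sizeof (bool);
}

/* Number of bits in a char: at least 8, but not necessarily 8.  */
int
bits_per_char (void)
{
  return CHAR_BIT;
}
```

The practical upshot is that multiplying an allocation size by sizeof (char) is always a no-op, whereas code that depends on the size of bool should ask for it with sizeof.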

I will provide more detailed comments in a day or two.


Thank you very much! I truly appreciate your looking at my patch so promptly, and will be happy to do anything in my power to make this easier for you.

-- Caroline Tice
ctice@apple.com


