This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

using scratchpads to enhance RTL-level if-conversion: current scratchpad allocation algorithm, alternatives


[Abe wrote:]

+ if (size_of_MEM_operand > size_of_biggest_scratchpad)
+ {
+ size_of_biggest_scratchpad = size_of_MEM_operand;
+ biggest_scratchpad_rtx = assign_stack_local
+ (GET_MODE (orig_x), size_of_MEM_operand, 0);
+ /* 0: align acc. to the machine mode indicated by
+ "GET_MODE (orig_x)" */
+ gcc_assert (biggest_scratchpad_rtx);
+ scratchpads.safe_push (std::make_pair (biggest_scratchpad_rtx,
+ size_of_MEM_operand));
+ }

[Bernd Schmidt wrote:]

It looks like you're allocating extra stack slots,
and later code can decide not to use them.


Yes, sir.  That is correct.  The current allocation algorithm is what I would call "optimistic":
it first allocates a slot of the size it currently needs, and then gets a new one --
ignoring all the old ones from that point on -- if-and-when it needs a bigger one.
That probably causes a small amount of waste in the average case, but can be forced to waste a
larger amount of space by adversarial code that is written with knowledge of the allocation algorithm.

The simplest relevant algorithm of which I know is to find the largest supported scalar type in the current
target and to use that size straight away upon allocating the scratchpad the first time, and then never
allocating it again for the function being compiled. ["option 2" in the below.]  I think this is likely to be
wasteful of stack space in common cases, e.g. a 64-byte scratchpad being allocated when a 4-byte one would
have sufficed.  I am concerned that excessive stack-space waste could cause problems with recursive code.


The following are the scratchpad allocation algorithm options of which I am currently aware.


[my term: "RUC": "Routine Under Compilation", by analogy to "UUT" for "Unit Under Test"]


[option 1: already implemented]
  Create a scratchpad of currently-needed size if no scratchpad yet in this RUC or the biggest one is not big enough

[option 2]
  Searching the machine definition and allocating the largest-useful size as the size of the first-&-only per-RUC scratchpad

[option 3]
  Use the stack-alignment value for the size: probably available as "MAX_SUPPORTED_STACK_ALIGNMENT"

[option 4]
  Search the RUC for scratchpad opportunities _before_ doing the transformations, find the
  biggest-useful-to-this-RUC scratchpad size, use _that_ as the size of the first-&-only per-RUC scratchpad

[option 5]
  As with option 1, but round up the size to an integer multiple of some granularity,
  e.g. 4 bytes / 8 bytes / {stack-alignment value} bytes.  However, this may
  [at least sometimes] eliminate the opportunity to fit the scratchpad into an unused piece of
  the stack frame which is otherwise wasted due to the alignment needs of another stack slot.
  The current algo., i.e. option 1, should allow GCC to take maximal advantage of such "slack"
  in the stack frame when choosing where to place the scratchpad in question.


Option 4 is probably the hardest-by-far to implement, but would result in the least run-time stack overhead from scratchpads.

In a best-case scenario, my optimistic algo. [option 1] results in zero expansion of the stack frame.

Regards,

Abe


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]