29336 – shorten_branches and machine-dependent constant pool placement should be integrated

Bug 29336 - shorten_branches and machine-dependent constant pool placement should be integrated

Summary: shorten_branches and machine-dependent constant pool placement should be inte...

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	4.2.0

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	code-size, missed-optimization

Depends on:
Blocks:	29842 16996 38570
	Show dependency tree / graph

Reported:	2006-10-03 20:11 UTC by Jorn Wolfgang Rennecke
Modified:	2024-10-24 23:36 UTC (History)
CC List:	4 users (show)

See Also:
Host:
Target:	sh-elf, arm-elf, arm-eabi
Build:
Known to work:
Known to fail:
Last reconfirmed:	2009-04-30 10:26:42

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jorn Wolfgang Rennecke 2006-10-03 20:11:17 UTC

See also the thread started with:
http://gcc.gnu.org/ml/gcc/2006-05/msg00304.html

shorten_branches and, where applicable branch splitting (like in the sh_reorg),
should be combined with machine-dependent constant pool placement
and small-scale hot/cold partitioning.  constant pool placement should take
account of branch probabilities.

I'll use neg_pool_range / pool_range in the following to describe the
ranges for constants as they are used in the arm machine description.

Place taken up by constants falls into two categories:
- per-constant space, which is easily calculable from each individual
  constant.  That space is known to be needed at most once per constant
  reference (it might be less if constants are shared)
- per-pool space.  Some constants need alignments, but the worst cost in any
  one pool from this requirement should be the maximum alignment minus the
  minimum size alignment.
  If a pool can't be placed after an existing barrier, a new one has to be
  artifically created, i.e. a jump around the pool needs to be insterted.
  If pools are placed such as to have the minimum number of pools,
  they must be spaced by at least the sum of the minumum neg_pool_range
  and the minimum pool_range of constants inside.
  If other considerations are used for constant pool placement, e.g.
  cost lengthened branches and their execution frequency, more, smaller,
  pools can result.  The worst case would be to have one pool for every
  constant reference, but no more than one per basic block plus number
  of ranges crossed.
  Some simple rules for minimum pool size can also help to put a more
  sensible upper bound for pool overhead.  I.e. you could say to put
  additional constants within range into a pool till the overhead is
  at most, say, 25%, or no more constants are in range.

While some improvements to the estimates can be made by observing when the
same constant loaded several times in short sucession, I wouldn't expect
this to be a significant factor, since cse would reduce the amount of such
loads.  Thus, for size estimates, we can pretend constants can't be commoned.

After doing one insn address calculating pass with optimistic assumptions,
i.e. assume all branches are short and no constant pools need to be placed,
for each constant, we can specify a maximum range of instructions within which
it has to be placed, starting at the address of the reference minus
neg_pool_range, and extending up to the address of the reference plus
pool_range.
Thus, to calculate an upper bound of the size of constant pools within
an offset range (e.g. branch site to branch destination), we take the
size of all constants that might have been placed up to the larger address,
subtract the size of all the constants that are known to have been placed
up to the lower address, and add the pool overhead.
Assuming we have adopted the 25% rule above, there is no need to keep track
of details of alignments and individual constant sizes; all we need is a
tally of the constant sizes.  We calculate the difference of the tallies,
add a quarter (truncated), and calculate a preliminary range; then we take
this range, divide it by the pool spacing PS (1) (rounded up), and multiply
it by the maximum pool overhead (alignment plus jump); this product is then
added to the preliminary range to get the actual upper bound for the range.
This calculation can be integrated into the align_fuzz comuptation.

 (1) PS is calculated here as minimum neg_pool_range plus minimum pool_range
 minus maximum pool overhead.

In order to integrate constants used for branches, each insn has to keep
three addresses: the overly optimistic one calculated at the start, which is
used to get bounds for the range in which a constant might be placed, the
one from the last relaxation pass, which is used for branch offset
calculations, and the one that is comupted at some time during the current
relaxation pass.  The latter two can be kept in arrays that are swapped
around to avoid unnecessary copying.  Or there might be an array of structs
describing each insn, and the struct contains an array with the addresses,
and which to use as current and previous changes with each pass.
A branch is considered needing a constant according to the offset calculation
using the addresses from the previous relaxation pass (that is no constant
if the offset can be encoded into the instruction).

For the previous relaxation pass (if any), there are two tally entries for
each insn: the total constant size known to have been placed so, and the
total constant size that might have been placed so far.
For the current relaxation pass, there are corresponding entries, but
they give the differential (first derivative) of these values; they area
zero-initialized at the start of the pass, and at the end consecutevely
summed up to give the total tallies which are referenced in the next pass
as the two tallies described before.
When an insn is encountered that references a constant (that might also
be a branch which its offset, AFAICS these are the only ones that are
not pass-invariant), the size of the constant is added to the two
differential tallies; for the one for values that might have been placed,
it is added at the (constant placement) address smaller then the current
one by neg_pool_offset, and for the one for constants that are known to be
placed, it is added at the address larger than the current one by pool_offset.
last-used-pointers to the various (neg_)pool_offsets can be kept to ensure
O(n) time complexity (with n being the number of instructions) for each pass.

A further compilation time optimization is possible by keeping the
differential values from the previous pass, and adjusting them only when
a branch changes offset requirements; thus, there is no need to scan
every insn in every pass.

Comment 1 Andrew Pinski 2006-10-03 20:30:20 UTC

Looks related to PR 9702 and maybe PR 9703