This is the mail archive of the mailing list for the GCC project.

Re: RFA: reload infrastructure to fix PR target/21623

Kaz Kojima wrote:

Joern RENNECKE <> wrote:

Yes. This patch is OK for mainline and any open release branches, as it makes
cost calculations more accurate. However, the PR should stay open till we can
express the required reloads.


Could you give me a hint about how the cost should be estimated?
I'm not able to find pointers for it except a brief description
of REGISTER_MOVE_COST in doc/tm.texi.

There are a lot of heuristics and estimates involved, so slightly different numbers
can be equally justifiable. That being said, I'll have a go at it:

In general, you start with the sum of the costs of the individual instructions involved.
movt rn / lds rn,fpul / fsts fpul,fr12 are three instructions with a cost of two each,
giving a total cost of 6. For SH2e and SH3e, these three instructions are issued
in three cycles, which is as fast as any three instructions can be executed.
For the SH4, it gets more complex. At -Os, we consider sizes foremost, so
we get the size of three basic instructions, again giving us a cost of 6. When
optimizing for speed, however, we have to consider that register-register moves
among general purpose registers are group MT and can dual-issue with anything.
The groups of the three insns in question are EX / LS / LS, and the latencies 1 / 1 / 0.
Thus, they need three cycles to execute. On the plus side, the use of fr12 can be paired
with the fsts if it has a group other than LS and CO. The other open issue slots
might also be paired, although the probability seems lower. On average, I estimate
that the cost is comparable to 4.5 general purpose reg-reg moves, giving an SH4 cost
of 9 for the execution time.

A further factor to consider is the register usage. We need fpul as a spill register -
making scheduling awkward, and decreasing the likelihood of reusing another value
left there - and a general purpose register, thus increasing the register pressure.
Scheduling cost adjustments don't matter for -Os, but they do for speed optimization.
When individual instructions have high latencies that give them a scheduling
penalty, a group of instructions might have a lower cost than the sum of the costs
of the individual instructions when the group schedules exceptionally well.

fpul does not often hold values useful to re-use, and a single scratch register is
quite often needed anyway. Given the way the fpul use is placed, we are not likely
to see any significant scheduling penalty - some opportunities are lost, but others
gained. So I think +1 - i.e. 1/2 the cost of a reg-reg move - is a suitable modifier
for the added register usage, both at -Os and for speed optimization.
This gives us a cost of 10 for (TARGET_HARD_SH4 && !optimize_size), 7
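
The arithmetic above can be written out as a small C sketch. This is only an
illustration of the estimate in this message, not the actual sh.c code: the
function name, its parameters and the GENERAL_REG_MOVE_COST constant are
hypothetical, and the flags stand in for TARGET_HARD_SH4 and optimize_size.
It assumes the scale used above, on which a general purpose reg-reg move
costs 2.

```c
#include <stdbool.h>

/* Assumed scale: one general purpose reg-reg move costs 2.  */
enum { GENERAL_REG_MOVE_COST = 2 };

/* Hypothetical helper: estimated cost of the three-insn sequence
   movt rn / lds rn,fpul / fsts fpul,fr12.  */
int
movt_lds_fsts_cost_sketch (bool hard_sh4, bool optimize_size)
{
  /* Three basic instructions at 2 each.  This covers SH2e / SH3e
     and any size-optimized build.  */
  int cost = 3 * GENERAL_REG_MOVE_COST;

  /* SH4 when optimizing for speed: estimated as comparable to
     4.5 general purpose reg-reg moves, i.e. 9.  */
  if (hard_sh4 && !optimize_size)
    cost = 9 * GENERAL_REG_MOVE_COST / 2;

  /* Modifier for the added register usage (fpul plus one general
     purpose scratch): half a reg-reg move, i.e. +1.  */
  return cost + GENERAL_REG_MOVE_COST / 2;
}
```

With these assumptions the function returns 10 when hard_sh4 is true and
optimize_size is false, and 7 in every other case.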
