This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Account for prologue spills in reg_pressure scheduling
- From: Vladimir Makarov <vmakarov at redhat dot com>
- To: Maxim Kuvyrkov <maxim dot kuvyrkov at linaro dot org>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Sandiford <rdsandiford at googlemail dot com>, Sebastian Pop <sebpop at gmail dot com>
- Date: Wed, 22 Oct 2014 10:45:55 -0400
- Subject: Re: [PATCH] Account for prologue spills in reg_pressure scheduling
- Authentication-results: sourceware.org; auth=none
- References: <FE7CEF4D-A27B-46D0-96B3-3569C1558DF9 at linaro dot org> <54467AC2 dot 6040801 at redhat dot com> <2F0212D5-C20E-4794-89DC-859A24D7D4AD at linaro dot org>
On 2014-10-22 2:17 AM, Maxim Kuvyrkov wrote:
On Oct 22, 2014, at 4:24 AM, Vladimir Makarov <firstname.lastname@example.org> wrote:
On 10/20/2014 02:57 AM, Maxim Kuvyrkov wrote:
This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number of available registers.
At the moment the scheduler does not account for spills in the prologues and restores in the epilogue, which occur from use of call-used registers. The current state is, essentially, optimized for case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue. However, on the opposite end, we have a case when the function is just a single non-cyclic basic block, which executes just as often as prologue / epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself. In such a case the scheduler should throttle-down on the number of available registers and try to not go beyond call-clobbered registers.
The patch uses basic block frequencies to balance the cost of using call-used registers for intermediate cases between the two above extremes.
The motivation for this patch was a floating-point testcase on arm-linux-gnueabihf (ARM is one of the few targets that use register pressure scheduling by default).
A "thanks" goes to Richard good discussion of the problem and suggestions on the approach to fix it.
The patch was bootstrapped on x86_64-linux-gnu (which doesn't really exercises the patch), and cross-tested on arm-linux-gnueabihf and aarch64-linux-gnu.
OK to apply?
It is a pretty interesting idea for heuristic, Maxim.
But I don't understand the following loop:
+ for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
+ if (call_used_regs[i])
+ for (c = 0; c < ira_pressure_classes_num; ++c)
+ enum reg_class cl = ira_pressure_classes[c];
+ if (ira_class_hard_regs[cl][i])
ira_class_hard_regs[cl] is array containing hard registers belonging to
class CL. So if GENERAL_REGS consists of hard regs 0..3, 12..15, the
array will contain 8 elements 0..3, 12..15. The array size is defined
by ira_class_hard_regs_num[cl]. So the index is order number of hard
reg in the class (starting from 0) but not hard register number itself.
Also the pressure classes never intersect so you can stop the inner loop
when you find class to which hard reg belongs to.
Thanks for spotting this. Indeed, this is a bug, but it still happened to correctly calculate numbers of call-used register for ARM (where I debugged the implementation).
I believe you should rewrite the code and get performance results again
to get an approval.
Sebastian, could you run the geekbench again to make sure you see same performance numbers?
I guess no need to this as the call-used register numbers were correctly
calculated before for aarch64. It means we will have the same results
and the results as I read were good and promising to implement this
heuristic in GCC.
You also missed the changelog.
The changelog was in the separate file. Also attached here with the fixed patch. Bootstrapped on x86_64-linux-gnu. Bootstrap and regtest on arm-linux-gnueabihf is in progress.
Richard proposed 2-loops solution instead of 3. So I'd definitely
The patch is ok with Richard's 2-loops proposal. Thanks, Maxim. As I
wrote the heuristic is an interesting one.