This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Account for prologue spills in reg_pressure scheduling


On Oct 22, 2014, at 4:24 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:

> On 10/20/2014 02:57 AM, Maxim Kuvyrkov wrote:
>> Hi,
>> 
>> This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number of available registers.
>> 
>> At the moment the scheduler does not account for spills in the prologues and restores in the epilogue, which occur from use of call-used registers.  The current state is, essentially, optimized for case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue.  However, on the opposite end, we have a case when the function is just a single non-cyclic basic block, which executes just as often as prologue / epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself.  In such a case the scheduler should throttle-down on the number of available registers and try to not go beyond call-clobbered registers.
>> 
>> The patch uses basic block frequencies to balance the cost of using call-used registers for intermediate cases between the two above extremes.
>> 
>> The motivation for this patch was a floating-point testcase on arm-linux-gnueabihf (ARM is one of the few targets that use register pressure scheduling by default).
>> 
>> A "thanks" goes to Richard good discussion of the problem and suggestions on the approach to fix it.
>> 
>> The patch was bootstrapped on x86_64-linux-gnu (which doesn't really exercises the patch), and cross-tested on arm-linux-gnueabihf and aarch64-linux-gnu.
>> 
>> OK to apply?
>> 
> It is a pretty interesting idea for heuristic, Maxim.
> 
> But I don't understand the following loop:
> 
> +      for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
> +	if (call_used_regs[i])
> +	  for (c = 0; c < ira_pressure_classes_num; ++c)
> +	    {
> +	      enum reg_class cl = ira_pressure_classes[c];
> +	      if (ira_class_hard_regs[cl][i])
> +		++call_used_regs_num[cl];
> 
> 
> ira_class_hard_regs[cl] is array containing hard registers belonging to
> class CL.  So if GENERAL_REGS consists of hard regs 0..3, 12..15,  the
> array will contain 8 elements 0..3, 12..15.  The array size is defined
> by ira_class_hard_regs_num[cl].  So the index is order number of hard
> reg in the class (starting from 0) but not hard register number itself. 
> Also the pressure classes never intersect so you can stop the inner loop
> when you find class to which hard reg belongs to.

Thanks for spotting this.  Indeed, this is a bug, but it still happened to correctly calculate numbers of call-used register for ARM (where I debugged the implementation).

> 
> I believe you should rewrite the code and get performance results again
> to get an approval.

Sebastian, could you run the geekbench again to make sure you see same performance numbers?

>   You also missed the changelog.
> 

The changelog was in the separate file.  Also attached here with the fixed patch.  Bootstrapped on x86_64-linux-gnu.  Bootstrap and regtest on arm-linux-gnueabihf is in progress.

--
Maxim Kuvyrkov
www.linaro.org


Attachment: 0001-sched_class_reg_num.ChangeLog
Description: Binary data

Attachment: 0001-sched_class_reg_num.patch
Description: Binary data


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]